How to Connect Google BigQuery

Getting your data into Google BigQuery is an easy task with Dataddo. If your BigQuery account is already connected to Dataddo and you only need instructions on connecting it to a Data Flow, you can skip Step 1 and start from Step 2 right away. Step 3 explains how to configure a Google BigQuery Data Flow.

Step 1 - How to create a Google BigQuery Data Destination

  1. Sign in to your Dataddo account and click on Destinations at the top of the page. 
  2. Click on Create Destination in the top right corner.
  3. From the list of destinations, choose Google BigQuery. You can type the name of the connector into the search bar to find it faster.
    Data destinations - selection
  4. Choose your account and click on Next.
    Google BigQuery - step 4a

    NOTE: If you have not yet authorized your Google Account, click on Authorize. You will be redirected to the sign-in page. Once signed in, continue with the steps in Dataddo.

  5. Fill out all the fields on the sign-in page to give Dataddo all the information necessary to connect your data, and click on Save.

    Google BigQuery - step 4b

 

Step 2 - How to Create a Google BigQuery Data Flow

  1. Click on Flows at the top of the page. 
  2. Click on Create Flow in the top right corner.
  3. Click on Add Source to connect it to the destination. You can type the connector's name into the search bar to find it faster.
    My Flows - New Flow
  4. Once you select your data source, click on Add Destination.
    Google BigQuery - Flow 4

  5. From the list of destinations, select the destination. You can also type the name into the search bar.
  6. Configure the destination by filling out the necessary fields.
    Google BigQuery - Flow 6a
    NOTE: Remember to schedule the flow's synchronization slightly after your source's synchronization so the data has time to load.
  7. You can name your flow by typing it into the field at the top.
  8. Check the Data Preview to verify your configuration.
    Flow - preview
  9. Click on Save flow.

Step 3 - How to Configure a Google BigQuery Data Flow

Automatic Configuration

Dataddo will try to create the table automatically when the write operation is triggered for the first time. Date and time values are stored as TIMESTAMP, integers as INTEGER, floating-point numbers as FLOAT, and strings as STRING. If the operation fails, please proceed to the Troubleshooting section or continue with manual configuration.
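The type mapping described above can be sketched in Python. This is an illustration only, not Dataddo's actual implementation; the sample column names and values are hypothetical:

```python
from datetime import date, datetime

def infer_bigquery_type(value):
    """Map a sample Python value to a BigQuery column type,
    mirroring the mapping described above."""
    if isinstance(value, (datetime, date)):
        return "TIMESTAMP"   # date and time values
    if isinstance(value, int):
        return "INTEGER"     # whole numbers
    if isinstance(value, float):
        return "FLOAT"       # floating-point numbers
    return "STRING"          # everything else falls back to text

# Hypothetical sample row from a source
row = {
    "created_at": datetime(2023, 1, 1),
    "clicks": 42,
    "cost": 3.14,
    "campaign": "spring",
}
schema = {column: infer_bigquery_type(value) for column, value in row.items()}
# schema == {"created_at": "TIMESTAMP", "clicks": "INTEGER",
#            "cost": "FLOAT", "campaign": "STRING"}
```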

Manual Configuration 

  1. Click on the three dots next to the flow and choose Config.
  2. A window pops up with the configuration details. Follow the instructions to finish the configuration.
    Google BigQuery - configuration 2

Congratulations, your new Data Destination & Flow are ready!


Need assistance?

Feel free to contact us or create a ticket and we will help you with the setup. To speed up the process of resolving your ticket, make sure you provide us with sufficient information.

  

Troubleshooting

The table was not automatically created

If the flow entered a broken state right after creation, the table failed to be created. Please click on the three dots next to the flow and choose Display Log to find the error description. In most cases the problem is one of the following:

  • Insufficient permissions to create the table. Make sure that the authorized BigQuery user has at least a WRITER role.
  • The table already exists. Delete the existing table and restart the flow.

Flow is broken after changing the source

To maintain data consistency, Dataddo does not propagate changes made at the flow level to downstream database destinations (i.e., table schemas are never updated without your knowledge). If your flow broke after you changed the source, the updated schema most likely no longer matches the table that was already created. Click on the three dots next to the flow and choose Display Log to find the error description. If the data collected in the BigQuery table can be deleted, delete the entire table and reset the flow; Dataddo will attempt to create a new table. If the data cannot be deleted, try adding the missing columns manually.
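To see which columns the changed source added, you can diff the new source schema against the existing table's columns. A minimal sketch, where the column names, the table name, and the chosen FLOAT64 type are all hypothetical:

```python
# Columns of the BigQuery table that was already created (hypothetical)
existing_table_columns = {"date", "campaign", "clicks"}
# Columns the source produces after the change (hypothetical)
new_source_columns = {"date", "campaign", "clicks", "impressions", "cost"}

# Columns you would need to add to the table manually
missing = sorted(new_source_columns - existing_table_columns)
# missing == ["cost", "impressions"]

# Example DDL you could run in BigQuery for each missing column
# (pick the type that matches your actual data, FLOAT64 is just a placeholder)
statements = [
    f"ALTER TABLE my_dataset.my_table ADD COLUMN {column} FLOAT64;"
    for column in missing
]
```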

Experiencing data duplicates

BigQuery is primarily an append-only database, so the recommended approach is the INSERT write strategy. However, this approach can result in duplicates in your data. To avoid them, consider the other write strategies:

  • TRUNCATE INSERT. This strategy removes all contents of the BigQuery table prior to data insertion.
  • UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, you must set a unique key consisting of one or more columns.
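The difference between the three strategies can be illustrated with a toy in-memory model. This is a sketch with hypothetical sample rows, not how Dataddo actually writes to BigQuery:

```python
def insert(table, rows):
    table.extend(rows)            # append-only: re-running a load duplicates rows

def truncate_insert(table, rows):
    table.clear()                 # wipe the table first...
    table.extend(rows)            # ...then insert the fresh batch

def upsert(table, rows, key):
    index = {tuple(r[c] for c in key): i for i, r in enumerate(table)}
    for row in rows:
        k = tuple(row[c] for c in key)
        if k in index:
            table[index[k]] = row     # key exists: update the row in place
        else:
            index[k] = len(table)
            table.append(row)         # new key: insert a new row

batch = [{"date": "2023-01-01", "campaign": "a", "clicks": 10}]

table = []
insert(table, batch)
insert(table, batch)                  # same batch loaded twice -> duplicates
assert len(table) == 2

table = [{"date": "2022-12-31", "campaign": "old", "clicks": 1}]
truncate_insert(table, batch)         # old contents gone, only the new batch
assert len(table) == 1

table = []
upsert(table, batch, ("date", "campaign"))
upsert(table, batch, ("date", "campaign"))   # re-run -> still a single row
assert len(table) == 1
```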

Flow is failing with invalid_grant message

Your BigQuery token has been revoked. Please re-authorize your account.

Flow with UPSERT strategy is failing with invalid_configuration message

The combination of columns you have chosen does not produce a unique index. Edit the flow and add more columns to the index.
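Before reconfiguring the key, you can sanity-check which column combination is actually unique in your data. A minimal sketch with hypothetical sample rows:

```python
def is_unique_key(rows, columns):
    """Return True if the given column combination identifies every row uniquely."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False          # the same key appears twice -> not unique
        seen.add(key)
    return True

# Hypothetical sample data
rows = [
    {"date": "2023-01-01", "campaign": "a", "clicks": 10},
    {"date": "2023-01-01", "campaign": "b", "clicks": 7},
    {"date": "2023-01-01", "campaign": "a", "clicks": 12},
]

is_unique_key(rows, ("date",))                        # False: dates repeat
is_unique_key(rows, ("date", "campaign"))             # False: ("2023-01-01", "a") repeats
is_unique_key(rows, ("date", "campaign", "clicks"))   # True: adding a column fixes it
```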

