Getting your data to Google BigQuery is an easy task with Dataddo. If you already have your BigQuery account connected to Dataddo and only need instructions on how to connect it to a Data Flow, you can skip Step 1 and start from Step 2 right away. Step 3 explains how to configure a Google BigQuery Data Flow.
Step 1 - How to create a Google BigQuery Data Destination
- Sign in to your Dataddo account and click on Destinations at the top of the page.
- Click on Create Destination in the top right corner.
- From the list of destinations, choose Google BigQuery. You can type the name of the connector into the search bar to find it faster.
- Choose your account and click on Next.
NOTE: If you have not authorized your Google Account, click on Authorize. You will be redirected to the sign-in page. Once signed in, continue with the steps in Dataddo.
- Fill out all the fields to give Dataddo the necessary information to connect your data, and click on Save.
Step 2 - How to Create a Google BigQuery Data Flow
- Click on Flows at the top of the page.
- Click on Create Flow in the top right corner.
- Click on Add Source to connect it to the destination. You can type the connector's name into the search bar to find it faster.
- Once you select your data source, click on Add Destination.
- From the list of destinations, select the destination. You can also type the name into the search bar.
- Configure the destination by filling out the necessary fields.
NOTE: Remember to set the flow's synchronization time slightly after the synchronization of your source so the data has time to load.
- Name your flow by typing the name into the field at the top.
- Check the Data Preview to verify your configuration.
- Click on Save flow.
Step 3 - How to Configure a Google BigQuery Data Flow
Dataddo will try to create the table automatically when the write operation is triggered for the first time. Date and time values will be stored as TIMESTAMP, integers as INTEGER, floating-point numbers as FLOAT, and strings as STRING. If the operation fails, please proceed to the Troubleshooting section or continue with a manual configuration.
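The automatic type mapping can be illustrated with a short Python sketch. This is an illustration of the rules described above, not Dataddo's actual code:

```python
from datetime import datetime

# Illustrative mapping of sample Python values to BigQuery column types,
# following the rules described above (not Dataddo's actual implementation).
def infer_bigquery_type(value):
    if isinstance(value, datetime):
        return "TIMESTAMP"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, int):
        return "INTEGER"
    return "STRING"

# A hypothetical source row and the schema Dataddo would derive from it
row = {
    "created_at": datetime(2023, 1, 1),
    "clicks": 42,
    "ctr": 0.07,
    "campaign": "spring_sale",
}
schema = {name: infer_bigquery_type(value) for name, value in row.items()}
```

Here `schema` ends up mapping `created_at` to TIMESTAMP, `clicks` to INTEGER, `ctr` to FLOAT, and `campaign` to STRING.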
- Click on the three dots next to the flow and choose Config.
- A window with the configuration details will pop up. Follow the instructions to finish the configuration.
Congratulations, your new Data Destination & Flow are ready!
The table was not automatically created
If the flow enters a broken state right after creation, the table failed to be created. Please click on the three dots next to the flow and choose Display Log to see the error description. In most cases, the problem is one of the following:
- Insufficient permissions to create the table. Make sure that the authorized BigQuery user has at least a WRITER role.
- The table already exists. Delete the existing table and restart the flow.
Flow is broken after changing the source
To maintain data consistency, Dataddo does not propagate changes made at the flow level to downstream database destinations (i.e., table schemas are never updated without your knowledge). If your flow broke after you changed the source, the updated schema most likely no longer matches the table that was already created. Click on the three dots next to the flow and choose Display Log to see the error description. If the data collected in the BigQuery table can be deleted, delete the entire table and reset the flow; Dataddo will attempt to create a new table. If the data cannot be deleted, try manually adding the missing columns.
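Finding the columns to add manually amounts to a set difference between the new source schema and the existing table. A minimal sketch with hypothetical column names:

```python
# Hypothetical example: compare the columns of the already-created BigQuery
# table with the columns of the updated source to see what must be added
# to the table by hand.
existing_table_columns = {"date", "campaign", "clicks"}
new_source_columns = {"date", "campaign", "clicks", "impressions", "cost"}

# Columns present in the new source but missing from the existing table
missing_columns = sorted(new_source_columns - existing_table_columns)
```

In this example, `impressions` and `cost` would need to be added to the BigQuery table manually before the flow can run again.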
Experiencing data duplicates
BigQuery is primarily an append-only database, so the recommended approach is the INSERT write strategy. However, this approach can result in duplicates in your data. To avoid that, consider the other write strategies:
- TRUNCATE INSERT. This strategy removes all the contents in the BigQuery table prior to data insertion.
- UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns.
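The behavior of the three strategies can be sketched with plain Python lists. This is only a simulation of the semantics; Dataddo performs these operations against BigQuery itself:

```python
# Minimal simulation of the three write strategies (illustrative only).
def insert(table, rows):
    # INSERT: append-only; re-running the same batch duplicates rows.
    return table + rows

def truncate_insert(table, rows):
    # TRUNCATE INSERT: wipe the table, then insert the new batch.
    return list(rows)

def upsert(table, rows, key):
    # UPSERT: update rows that match the unique key, insert the rest.
    merged = {tuple(r[k] for k in key): r for r in table}
    for r in rows:
        merged[tuple(r[k] for k in key)] = r
    return list(merged.values())

batch = [{"date": "2023-01-01", "clicks": 10}]
table = insert([], batch)
duplicated = insert(table, batch)  # same batch loaded twice -> 2 rows
updated = upsert(table, [{"date": "2023-01-01", "clicks": 12}], key=("date",))
```

Re-running INSERT with the same batch leaves two identical rows, while TRUNCATE INSERT and UPSERT (with `date` as the unique key) keep exactly one row per key.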
Flow is failing with invalid_grant message
Your BigQuery token has been revoked. Please re-authorize your account.
Flow with UPSERT strategy is failing with invalid_configuration message
The combination of columns you have chosen does not produce a unique index. Edit the flow and add more columns to the index.
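Whether a combination of columns forms a unique key is easy to check against a sample of your data. A small sketch with hypothetical rows and column names:

```python
# Hypothetical helper: check whether a combination of columns uniquely
# identifies every row, as the UPSERT strategy requires.
def is_unique_key(rows, columns):
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)

rows = [
    {"date": "2023-01-01", "campaign": "a", "clicks": 10},
    {"date": "2023-01-01", "campaign": "b", "clicks": 7},
]
```

Here `date` alone is not unique (both rows share it), but the pair `(date, campaign)` is, so adding `campaign` to the index would resolve the error.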