- 4 Minutes to read
How to Connect Google BigQuery
- 4 Minutes to read
- You already have a BigQuery account with an existing project and dataset.
- You need to have the following IAM project permissions:
Connecting to BigQuery using OAuth 2.0
Authorization via OAuth 2.0 is a default option. Simply navigate to Connected Services, Add new Service, choose Google BigQuery and follow the authorization process.
Connecting to BigQuery using Service Account
If you are trying to connect Google Big Query with a service account insted of your Google Account, before creating a new destination, you need to first authorize thee service account.Please follow these steps:
- Go to Certificates & Tokens in Dataddo
- Add your certificate (the type should be RSA Private Key)
- Go to Connected Services and add a new service called GBQ (Service account)
- Select the certificate and label the service
- Go to Destinations and Create Destination, and select Google Big Query from the list. (continue as per the steps below)
Creating a Google BigQuery Data Destination
- Click on Destinations, Create Destination and choose Google BigQuery.
- Choose your account and click on Next.
If you have not authorized your Google Account, click on Authorize. By default you will be redirected to OAuth 2.0 authorization process. In case you need to add a Google Service Account please follow Connecting to BigQuery using Service Account.
- You will be asked to select Project and Dataset fields. The values in selectboxes will be automatically populated from your account. After successful flow setup, Dataddo will automatically attempt to create table in selected project and dataset.
Creating a Google BigQuery Data Flow
- Click on Flows followed by Create Flow.
- Choose source that you want to use in the flow.
- Once you selected data source, click on Add Destination.
- From the list of destinations, find a desired BigQuery destination.
- You will be asked to fill in Table name and Write Mode. Dataddo will automatically attempt to provision the table using this name after the flow is saved. To find more details about write modes please follow this link.
By default the writing operation to BigQuery will be triggered after the extraction operation is finished. You can set a custom schedule if needed.
- You can Name your flow by typing the name into the top.
- You can see a Data Preview to make sure that your configuration is correct.
- Click on Save flow.
Checking a Google BigQuery table configuration
Dataddo will try to automatically create a table when the writing operation is triggered for the first time. Date and time values will be stored as TIMESTAMP type, integers as INTEGER type, floating numerics as FLOAT, and string as a STRING type. If the operation fails, please proceed to Troubleshoot section or continue with a Manual configuration.
Click on the three dots next to the flow and choose Config.
The window pops up with the configuration details. Follow the instructions to finish the configuration.
Congratulations, your new Data Destination & Flow are ready!
The table was not automatically created
If the flow after its creation turned into a broken state, the table failed to be created. Please click on the three dots next to the flow and choose Display Log to look for the error description. In most cases the problem is one of these:
- Insufficient permissions to create the table. Make sure that the authorized BigQuery user has at least a WRITER role.
- The table already exists. Delete the already existing table and restart the flow.
Flow is broken after changing the source
Dataddo in order to maintain data consistency does not propagate changes done at the flow level to the downstream database destinations (i.e. table schemas are not automatically updated without your knowledge). In case your flow got broken after changing the source, most likely the updated schema does not match the table that was already created. Click on the three dots next to the flow and choose Display Log to look for the error description. In case the data collected in the BigQuery table can be deleted, delete the entire table and reset the flow, Dataddo will attempt to create a new table. In case the data cannot be deleted, try manually adding missing columns.
Experiencing data duplicates
BigQuery is primarily an append-only database, therefore the recommended approach is to use the INSERT write strategy. However, this approach can result in duplicities in your data. In order to avoid that, consider other writing strategies:
- TRUNCATE INSERT. This strategy removes all the contents in the BigQuery table prior to data insertion.
- UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns.
Flow is failing with invalid_grant message
Your BigQuery token has been revoked. Please re-authorize your account.
Flow with UPSERT strategy is failing with invalid_configuration message
The combination of columns that you have chosen does not produce a unique index. Edit a flow and include more columns to the index.