Google BigQuery
Google BigQuery is a cloud-based data warehouse and analytics platform provided by Google Cloud. It uses a columnar storage format, which allows for fast query performance and efficient data compression, making it well-suited for handling large-scale analytical workloads on vast amounts of data.
Prerequisites
- You have created a BigQuery project and a dataset.
- You have the following IAM project permissions (a quick access check is sketched after this list):
- roles/bigquery.dataEditor
- roles/bigquery.jobUser
- roles/bigquery.readSessionUser
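To quickly verify access before configuring anything in Dataddo, you can run a trivial query and read the dataset's metadata with the account in question. Below is a minimal sketch using the google-cloud-bigquery Python client; the project and dataset IDs are placeholders, and it assumes your local credentials point at the account you want to test.
# Minimal access check, assuming the google-cloud-bigquery package is installed
# and Application Default Credentials use the account you want to test.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Running a query requires the BigQuery Job User role on the project.
print(list(client.query("SELECT 1 AS ok").result()))

# Reading dataset metadata requires read access, e.g. via BigQuery Data Editor.
print(client.get_dataset("my-project.my_dataset").dataset_id)  # placeholder dataset ID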
Authorize the Connection to Google BigQuery
In Google Cloud Console
Dataddo supports authorization via both a standard User account and a Service account. Each option has its pros and cons; for larger and more complex deployments, we generally recommend using a Service Account.
Create a User Account for Authorization
- In your Google Cloud Console, select your project and navigate to IAM & Admin, then IAM.
- Click on + Add to add a new user and set the user's email address.
- Assign the following roles to the user:
- BigQuery Data Editor
- BigQuery Job User
- BigQuery Read Session User
- Save your configuration.
Create a Service Account for Authorization
- In your Google Cloud Console, select your project and navigate to IAM & Admin, then Service accounts.
- At the top of the page click on + Create Service Account.
- Name your service account and in the Service account permissions (optional) section, assign the following roles:
- BigQuery Data Editor
- BigQuery Job User
- Save your Service Account.
- Click on the three dots next to your newly created Service Account, choose Manage keys, then Add key.
- Choose JSON as the key type. This will download a JSON key file to your computer, which you will later upload to Dataddo.
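If you want to sanity-check the downloaded key before uploading it to Dataddo, the sketch below authenticates with it and lists the project's datasets. It uses the google-cloud-bigquery Python package; the key file name is a placeholder.
# Check of the downloaded service account key, assuming the
# google-cloud-bigquery package (which includes google-auth) is installed.
from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder file name - use the path of the key you just downloaded.
credentials = service_account.Credentials.from_service_account_file("my-key.json")
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# Listing datasets is a simple read call that confirms the key is usable.
for dataset in client.list_datasets():
    print(dataset.dataset_id)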
In Dataddo
Authorization via User Account
Authorization via a standard User Account is the default option and follows the standard OAuth 2.0 flow.
In the Authorizers tab, click on Authorize New Service and select Google BigQuery. Follow the prompts to finish the authorization process.
Authorization via Service Account
Make sure you have created a Service Account for authorization and downloaded the key in JSON format.
- On the Security page, navigate to the Certificates tab and click on Add Certificate.
- Name your certificate, select Google Service Account Key as the certificate type, and upload the JSON key file you obtained when creating the Service Account.
- In the Authorizers tab, click on Authorize New Service and select Google BigQuery (Service account).
- Select the newly added certificate.
- Click on Save.
Create a New BigQuery Destination
- Under the Destinations tab, click on the Create Destination button and select Google BigQuery from the list.
- Select your account from the drop-down menu.
- Name your destination and click on Save to create your destination.
If your account is not listed, click on Add new Account in the drop-down menu during authorizer selection and follow the on-screen prompts. You can also go to the Authorizers tab and click on Add New Service.
Create a Flow to BigQuery
- Navigate to Flows and click on Create Flow.
- Click on Connect Your Data to add your sources.
- Click on Connect Your Data Destination to add the destination.
- Choose the Write mode and fill in the other required information.
- Check the Data Preview to see if your configuration is correct.
- Name your flow, and click on Create Flow to finish the setup.
Table Naming Convention
When naming your table, please make sure the table name does NOT contain (a quick check is sketched below):
- Whitespaces
- Dashes
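As a purely illustrative example, the small Python helper below mirrors this naming rule so you can check a table name before creating the flow; the function is hypothetical and not part of Dataddo or BigQuery.
# Hypothetical helper that mirrors the naming rule above - not part of Dataddo.
import re

def is_valid_table_name(name: str) -> bool:
    # Reject any whitespace or dash in the table name.
    return re.search(r"[\s-]", name) is None

print(is_valid_table_name("daily_orders"))  # True
print(is_valid_table_name("daily-orders"))  # False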
Table Partitioning
Dataddo supports table partitioning. If you use, for example,
table_{{1d1|Ymd}}
Dataddo will create a table every day, e.g. table_20xx0101, table_20xx0102 etc.
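Once the daily tables exist, you can query across them in BigQuery with a wildcard table and the _TABLE_SUFFIX pseudo-column. The sketch below uses the google-cloud-bigquery Python client; the project, dataset, and date range are placeholders.
# Query across the daily tables created by the template above, assuming
# google-cloud-bigquery is installed; project and dataset IDs are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# _TABLE_SUFFIX matches the date part appended to the table_ prefix.
sql = """
    SELECT COUNT(*) AS row_count
    FROM `my-project.my_dataset.table_*`
    WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
"""
for row in client.query(sql).result():
    print(row.row_count)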
Troubleshooting
Invalid Grant Error
invalid_grant
This issue may be caused by:
- Revoked or expired token: Please reauthorize your account.
- Insufficient permissions: Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to the User account or Service account that you are using.
Invalid Configuration Error
invalid_configuration
This issue is caused by insufficient row-level uniqueness of your composite key. To solve the issue, please follow these steps:
- In the Flows tab, click on your flow.
- Edit the composite key to include more columns for distinct row identification when using upsert or delete write modes.
Failed to Prepare Table Error
Erroneous communication with Google Big Query API: failed to prepare table: Failed to create new table
This issue may be caused by:
- Table name already in use: Please edit the flow and choose a different table name.
- Insufficient permissions: Your BigQuery account does not have sufficient permissions. Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to your User account or Service account.
Failed to Convert Value GBQ Data Type Error
Failed to convert value GBQ data type: Column '...' schema is set to '...'
This issue is caused by a data type mismatch between the table in BigQuery and the sources attached to the flow. To solve the issue, please follow these steps (a sketch for inspecting the table schema follows the list):
- Check the error message to identify which column is affected and change the data type.
- Manually load data in the source.
- Restart the flow.
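To compare the data types, you can inspect the current table schema directly in BigQuery. The sketch below uses the google-cloud-bigquery Python client; the table ID is a placeholder.
# Print each column's name and type for the target table, assuming
# google-cloud-bigquery is installed; the table ID is a placeholder.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")

for field in table.schema:
    print(field.name, field.field_type)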
Column Not Found in Schema Error
Column not found in schema: Column ... not found in schema
This issue may be caused by:
- Updated schema due to a column in the source being renamed:
- Dataddo generates flow schemas based on the attached sources. If you changed a column name in the source connected to the flow, it will result in a mismatch between the expected target table schema and the current one.
- Either revert the column name in the source to its previous value or adjust the column name in the target table to match the new name.
- A changed source within the flow that led to a schema discrepancy: For the sake of data integrity, changes in the source schema are not passed down to downstream database destinations. This means that table schemas won't automatically update without your explicit consent.
- If you can delete data in the BigQuery table: Delete the table in its entirety and reset the flow. Upon the next execution, Dataddo will attempt to establish a new table.
- If you cannot delete data in the BigQuery table: Consider manually adding any absent columns directly in Google BigQuery.
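If you choose to add the missing columns manually, the sketch below appends a column to the table schema via the google-cloud-bigquery Python client; the table ID, column name, and data type are placeholders, so adjust them to match your flow.
# Append a missing column to an existing table, assuming google-cloud-bigquery
# is installed; table ID, column name, and type are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")

new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("missing_column", "STRING"))  # placeholder

table.schema = new_schema
client.update_table(table, ["schema"])  # only the schema property is updated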