Google BigQuery
Google BigQuery is a cloud-based data warehouse and analytics platform provided by Google Cloud. It uses a columnar storage format, which allows for fast query performance and efficient data compression, making it well-suited for large-scale analytical workloads.
Prerequisites
- You have created a BigQuery project and a dataset.
- You have authorized one of the following Google BigQuery accounts:
- User account
- Service account (Recommended for larger and more complex use cases.)
- Your User or Service account in Google BigQuery has the following IAM project permissions:
  - roles/bigquery.dataEditor
  - roles/bigquery.jobUser
  - roles/bigquery.readSessionUser
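To verify that these permissions are in place before connecting Dataddo, you can run a quick check with the google-cloud-bigquery Python client. This is an optional sketch only; the project and dataset names are placeholders for your own values.

```python
# Optional sketch: verify BigQuery permissions with the google-cloud-bigquery
# client. "my-project" and "my_dataset" are placeholders for your own values.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Reading dataset metadata confirms basic access (BigQuery Data Editor).
dataset = client.get_dataset("my-project.my_dataset")
print(f"Dataset found: {dataset.full_dataset_id}")

# A dry-run query confirms the account can create jobs (BigQuery Job User)
# without incurring any cost.
job_config = bigquery.QueryJobConfig(dry_run=True)
client.query("SELECT 1", job_config=job_config)
print("Job creation permitted.")
```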
Authorize Connection to Google BigQuery
To authorize the connection to Google BigQuery, choose one of the following methods.
| Method | Advantage |
|---|---|
| User account | Simple, standard authorization method using your Google account. |
| Service account | Suitable for larger and more complex deployments. |
If you don't have the required permissions to authenticate your service, assign the authorizer role to a team member with these permissions.
User Account Authorization
In Google Cloud Console
To create a user account, follow these steps:
- In your Google Cloud Console, select your project and navigate to IAM & Admin > IAM.
- Click on + Add to add a new user and set the user's email address.
- Assign the following roles to the user:
  - BigQuery Data Editor
  - BigQuery Job User
  - BigQuery Read Session User
- Save your configuration.
In Dataddo
Authorization via a user account is the default option and follows the standard OAuth 2.0 mechanism.
- On the Authorizers tab, click on Authorize New Service.
- Select Google BigQuery.
- Follow the on-screen prompts to finish the authorization process.
Continue with creating a new Google BigQuery data destination.
Service Account Authorization
In general, for larger and more complex deployments, using a Service Account is recommended.
In Google Cloud Console
To create a service account, follow these steps:
- In your Google Cloud Console, select your project and navigate to IAM & Admin > Service Accounts.
- At the top of the page click on + Create Service Account.
- Name your service account and, in the Service account permissions (optional) section, assign the following roles:
  - BigQuery Data Editor
  - BigQuery Job User
- Save your Service Account.
- Click on the three dots next to your newly created Service Account, choose Manage keys, then Add key.
- Choose JSON as the key type. This will download a JSON key file to your computer, which you will later upload to Dataddo.
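Before uploading the key to Dataddo, you can optionally sanity-check it with the BigQuery Python client. This is a sketch only; the key file name is a placeholder for the file you just downloaded.

```python
# Optional sketch: confirm the downloaded service account key works.
# "service-account-key.json" is a placeholder for your downloaded file.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("service-account-key.json")

# Listing datasets verifies both the key and the assigned roles.
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```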
In Dataddo
To authorize your service account in Dataddo, make sure you have the JSON key file you downloaded earlier.
- On the Security page, navigate to the Certificates tab and click on Add Certificate.
- Name your certificate, select Google Service Account Key as the certificate type, and upload the JSON key file you obtained when creating the Service Account.
- On the Authorizers tab, click on Authorize New Service and select Google BigQuery (Service account).
- Select the newly added certificate.
- Click on Save.
When creating the authorizer, make sure you select Google BigQuery (Service account).
Create a New BigQuery Destination
- On the Destinations page, click on the Create Destination button and select Google BigQuery from the list.
- Select your authorizer from the drop-down menu.
- Name your destination and click on Save.
If you have not yet authorized your account, click on Add new Account in the drop-down menu during authorizer selection and follow the on-screen prompts. You can also go to the Authorizers tab and click on Add New Service.
Create a Flow to BigQuery
- Navigate to Flows and click on Create Flow.
- Click on Connect Your Data to add your source(s).
- Click on Connect Your Data Destination to add the destination.
- Choose the write mode and fill in the other required information.
- Check the Data Preview to see if your configuration is correct.
- Name your flow and click on Create Flow to finish the setup.
Table Naming Convention
When naming your table, please make sure the table name does NOT contain:
- Whitespaces
- Dashes
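If you generate table names programmatically, a small helper like the hypothetical sketch below (not part of Dataddo) can normalize names before you enter them in the flow configuration.

```python
import re

def sanitize_table_name(name: str) -> str:
    """Replace whitespace and dashes with underscores so the name is accepted."""
    return re.sub(r"[\s-]+", "_", name.strip())

print(sanitize_table_name("daily sales - eu"))  # daily_sales_eu
```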
Table Partitioning
Dataddo supports table partitioning, which splits large tables into smaller, more manageable partitions based on certain criteria, e.g. date.
During flow creation, add date range patterns to your table name:
xyz_{{1d1|Ymd}}
Using this table name, Dataddo will create a new table every day, e.g. xyz_20xx0101, xyz_20xx0102, etc.
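The date placeholder itself is resolved by Dataddo, but the sketch below illustrates the resulting naming, assuming the Ymd format stands for a four-digit year, month, and day (the Python equivalent of %Y%m%d).

```python
# Illustration only: how the Ymd date format maps to concrete table names.
from datetime import date, timedelta

base = "xyz"
start = date(2024, 1, 1)  # example start date
for offset in range(3):
    day = start + timedelta(days=offset)
    print(f"{base}_{day.strftime('%Y%m%d')}")
# xyz_20240101
# xyz_20240102
# xyz_20240103
```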
Troubleshooting
Invalid Grant Error
ERROR CODE
invalid_grant
This issue may be caused by:
- Revoked or expired token: Please reauthorize your account.
- Insufficient permissions: Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to the User account or Service account that you are using.
Invalid Configuration Error
ERROR CODE
invalid_configuration
This issue is caused by insufficient row-level uniqueness of your composite key. To solve the issue, please follow these steps:
- In the Flows tab, click on your flow.
- Edit the composite key to include more columns for distinct row identification when using upsert or delete write modes.
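To see whether a candidate composite key actually identifies rows uniquely, you can run a duplicate check directly in BigQuery. The sketch below uses placeholder table and column names.

```python
# Sketch: count duplicate combinations of a candidate composite key.
# The table and the key columns (date, campaign_id) are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
    SELECT date, campaign_id, COUNT(*) AS duplicates
    FROM `my-project.my_dataset.my_table`
    GROUP BY date, campaign_id
    HAVING COUNT(*) > 1
"""
rows = list(client.query(sql).result())
print(f"{len(rows)} duplicate key combinations found")
```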
Failed to Prepare Table Error
ERROR CODE
Erroneous communication with Google Big Query API: failed to prepare table: Failed to create new table
This issue may be caused by:
- Table name already in use: Please edit the flow and choose a different table name.
- Insufficient permissions: Your BigQuery account does not have sufficient permissions. Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to your User account or Service account.
Failed to Convert Value GBQ Data Type Error
ERROR CODE
Failed to convert value GBQ data type: Column '...' schema is set to '...'
This issue is caused by a data type mismatch between the table in BigQuery and the sources attached to the flow. To solve the issue, please follow these steps:
- Check the error message to identify which column is affected and change the data type.
- Manually load data in the source.
- Restart the flow.
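To see the column types currently set on the BigQuery side, you can inspect the table schema with the Python client, as in the placeholder sketch below.

```python
# Sketch: list the target table's column names and types.
# The table reference is a placeholder.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.my_dataset.my_table")

for field in table.schema:
    print(field.name, field.field_type)
```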
Column Not Found in Schema Error
ERROR CODE
Column not found in schema: Column ... not found in schema
This issue may be caused by:
- Updated schema due to a column in the source being renamed:
- Dataddo generates flow schemas based on the attached sources. If you changed a column name in the source connected to the flow, it will result in a mismatch between the expected target table schema and the current one.
- Either revert the column name in the source to its previous value or adjust the column name in the target table to match the new name.
- Changed source within the flow that led to a schema discrepancy: For the sake of data integrity, changes in the source schema are not passed down to downstream database destinations. This means that table schemas won't automatically update without your explicit consent.
- If you can delete data in the BigQuery table: Delete the table in its entirety and reset the flow. Upon the next execution, Dataddo will attempt to establish a new table.
- If you cannot delete data in the BigQuery table: Consider manually adding any absent columns directly in Google BigQuery.
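If you choose to add the absent columns manually, this can be done in the BigQuery console or, as in the hedged sketch below, with the Python client. The table and column names are placeholders.

```python
# Sketch: append a missing NULLABLE column to an existing table without
# deleting data. Names are placeholders for your own table and column.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.my_dataset.my_table")

new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("new_column", "STRING", mode="NULLABLE"))

table.schema = new_schema
client.update_table(table, ["schema"])
```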
Billing Not Enabled
ERROR CODE
Error 403: Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. Datasets must have a default expiration time and default partition expiration time of less than 60 days while in sandbox mode.
This issue is caused by a billing issue in your Google Cloud project. Please enable billing for the project in your Google Cloud Console.
Frequent OAuth Reauthorization Required
Google accounts authorized using OAuth 2.0 may require frequent reauthorization. If not anticipated, this may cause your flows to break unexpectedly.
To avoid this issue, authorize your Google account using a service account instead.