Google BigQuery

Google BigQuery is a cloud-based data warehouse and analytics platform provided by Google Cloud. It uses a columnar storage format, which allows for fast query performance and efficient data compression, making it well-suited for handling large-scale analytical workloads on vast amounts of data.

Prerequisites

You have created a BigQuery project and a dataset.
You authorized one of the following Google BigQuery accounts:
- User account
- Service account (Recommended for larger and more complex use cases.)
Your User or Service account in Google BigQuery has the following IAM project permissions:
- roles/bigquery.dataEditor
- roles/bigquery.jobUser
- roles/bigquery.readSessionUser

Authorize Connection to Google BigQuery

To authorize the connection to Google BigQuery, choose one of the following methods.

Method	Advantage
User account	Simple standard authorization method using your Google account.
Service account	Suitable for more complex deployments.

DATADDO TIP

If you don't have the required permissions to authenticate your service, assign the authorizer role to a team member with these permissions.

User Account Authorization

In Google Cloud Console

To create a user account, follow these steps:

In your Google Cloud Console, select your project and navigate to IAM & Admin and IAM.
Click on + Add to add a new user and set the user's email address.
Assign the following roles to the user:
1. BigQuery Data Editor
2. BigQuery Job User
3. BigQuery Read Session User
Save your configuration.

In Dataddo

Authorization via standard user account is the default option and will follow the standard OAuth 2.0 mechanism.

On the Authorizers tab, click on Authorize New Service.
Select Google BigQuery.
Follow the on-screen prompts to finish the authorization process.

Continue with creating a new Google BigQuery data destination.

Service Account Authorization

In general, for larger and more complex deployments, using a Service Account is recommended.

In Google Cloud Console

To create service account, follow these steps:

In your Google Cloud Console, select your project and navigate to IAM & Admin and Service accounts.
At the top of the page click on + Create Service Account.
Name your service account and in the Service account permissions (optional) section, assign the following roles:
1. BigQuery Data Editor
2. BigQuery Job User
Save your Service Account.
Click on three dots next to your newly created Service Account, choose Manage keys, then Add key.
Choose JSON as key type. This will download the JSON file to your computer which you will later need to upload to Dataddo.

In Dataddo

To authorize your service account, make sure you also have your JSON key.

On the Security page, navigate to the Certificates tab and click on Add Certificate.
Name your certificate, select Google Service Account Key as certificate type and upload the file you have obtained when creating a Service Account for authorization.
On the Authorizers tab, click on Authorize New Service and select Google BigQuery (Service account).
Select the newly added certificate.
Click on Save.

When creating the authorizer, make sure you select Google BigQuery (Service account).

Create a New BigQuery Destination

On the Destinations page, click on the Create Destination button and select the destination from the list.
Select your authorizer from the drop-down menu.
Name your destination and click on Save.

Need to authorize another connection?

Click on Add new Account in drop-down menu during authorizer selection and follow the on-screen prompts. You can also go to the Authorizers tab and click on Add New Service.

Create a Flow to BigQuery

Navigate to Flows and click on Create Flow.
Click on Connect Your Data to add your source(s).
Click on Connect Your Data Destination to add the destination.
Choose the write mode and fill in the other required information.
Check the Data Preview to see if your configuration is correct.
Name your flow and click on Create Flow to finish the setup.

Table Naming Convention

When naming your table, please make sure the table name does NOT contain:

Whitespaces
Dashes

Table Partitioning

Dataddo supports table partitioning, which splits large datasets into smaller, manageable partitions, based on certain criteria, e.g. date.

During flow creation, add date range patterns to your table name:

xyz_{{1d1|Ymd}}

Using this table name, Dataddo will create a new table named xyz every day, e.g. xyz_20xx0101, xyz_20xx0102 etc.

Troubleshooting

Billing Not Enabled

ERROR CODE

Error 403: Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. Datasets must have a default expiration time and default partition expiration time of less than 60 days while in sandbox mode.

This issue is caused by Google BigQuery billing issue. Please enable your BigQuery billing in your Google Cloud Console.

Column Not Found in Schema Error

ERROR CODE
Column not found in schema: Column ... not found in schema

This issue may be caused by:

Updated schema due to a column in the source being renamed:
- Dataddo generates flow schemas based on the attached sources. If you changed a column name in the source connected to the flow, it will result in a mismatch between the expected target table schema and the current one.
- Either revert the column name in the source to its previous value or adjust the column name in the target table to match the new name.
Changed source within flow that lead to a schema discrepancy: For the sake of data integrity, changes in the source schema are not passed down to downstream database destinations. This means that table schemas won't automatically update without your explicit consent.
1. If you can delete data in the BigQuery table: Delete the table in its entirety and reset the flow. Upon the next execution, Dataddo will attempt to establish a new table.
2. If you cannot delete data in the BigQuery table: Consider manually adding any absent columns directly in Google BigQuery.

Invalid Configuration Error

ERROR CODE
invalid_configuration

This issue is caused by insufficient row-level uniqueness of you composite key. To solve the issue, please follow these steps:

In the Flows tab, click on your flow.
Edit the composite key to include more columns for distinct row identification when using upsert or delete write modes.

Invalid Grant Error

ERROR CODE
invalid_grant

This issue may be caused by:

A revoked or expired token (invalid_grant - reauth related error): Please reauthorize your account and restart your sources.
Insufficient permissions: Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to the User account or Service account that you are using.
Session Length Policies not Aligned with Workflows: Session length policies determine how often users are prompted to re-authenticate. Follow these steps to resolve the issue:
1. (Optional) Review the session length policy by refering to Google's official documentation to prevent this issue from happening again. If needed, you can exempt trusted apps from the reauhtentification policies.
2. Reauthorize reauthorize your account and restart your sources.

DATADDO TIP

We highly recommend using a GBQ service account.

Failed to Convert Value GBQ Data Type Error

ERROR CODE
Failed to convert value GBQ data type: Column '...' schema is set to '...'

This issue is caused by data type mismatch between the table in BigQuery and sources attached to the flow. To solve the issue, please follow these steps:

Check the error message to identify which column is affected and change the data type.
Manually load data in the source.
Restart the flow.

Failed to Prepare Table Error

ERROR CODE
Erroneous communication with Google Big Query API: failed to prepare table: Failed to create new table

This issue may be caused by:

Table name already in use: Please edit the flow and choose a different table name.
Insufficient permissions: Your BigQuery account does not have the sufficient permissions. Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to your User account or Service account.

Frequent OAuth Reauthorization Required

Google accounts authorized using OAuth 2.0 may require frequent reauthorization. If not anticipated, this may cause for your flows to break unexpectedly.

To solve this issue, please authenticate your Google account using a service account.

Google BigQuery

Prerequisites

Authorize Connection to Google BigQuery

User Account Authorization

In Google Cloud Console

In Dataddo

Service Account Authorization

In Google Cloud Console

In Dataddo

Create a New BigQuery Destination

Create a Flow to BigQuery

Table Naming Convention

Table Partitioning

Troubleshooting

Billing Not Enabled

Column Not Found in Schema Error

Invalid Configuration Error

Invalid Grant Error

Failed to Convert Value GBQ Data Type Error

Failed to Prepare Table Error

Frequent OAuth Reauthorization Required

Related Articles