Google BigQuery

Google BigQuery is a cloud-based data warehouse and analytics platform provided by Google Cloud. It uses a columnar storage format, which allows for fast query performance and efficient data compression, making it well-suited for handling large-scale analytical workloads on vast amounts of data.

Prerequisites

  • You have created a BigQuery project and a dataset.
  • You have created a user with the following IAM project permissions:
    • roles/bigquery.dataEditor
    • roles/bigquery.jobUser
    • roles/bigquery.readSessionUser

Authorize Connection to Google BigQuery

In Google Cloud Console

Dataddo supports authorization via both a standard User Account and a Service Account. Each option has its pros and cons.

For larger and more complex deployments, we recommend using a Service Account.

Create a User Account for Authorization

  1. In your Google Cloud Console, select your project and navigate to IAM & Admin > IAM.
  2. Click on + Add to add a new user and set the user's email address.
  3. Assign the following roles to the user:
    1. BigQuery Data Editor
    2. BigQuery Job User
    3. BigQuery Read Session User
  4. Save your configuration.

Create a Service Account for Authorization

  1. In your Google Cloud Console, select your project and navigate to IAM & Admin > Service Accounts.
  2. At the top of the page click on + Create Service Account.
  3. Name your service account and in the Service account permissions (optional) section, assign the following roles:
    1. BigQuery Data Editor
    2. BigQuery Job User
  4. Save your Service Account.
  5. Click on the three dots next to your newly created Service Account, choose Manage keys, then Add key.
  6. Choose JSON as the key type. This will download a JSON key file to your computer, which you will later upload to Dataddo.
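
Before uploading the key to Dataddo, you can optionally verify that it authenticates against BigQuery. The following is a minimal sketch using the google-cloud-bigquery Python client; the key file name is a placeholder for your own downloaded key.

from google.cloud import bigquery

# Load the downloaded key file (the path is a placeholder).
client = bigquery.Client.from_service_account_json("service-account-key.json")

# Listing datasets is a cheap way to confirm that the key authenticates
# and that the Service Account can see the project.
for dataset in client.list_datasets():
    print(dataset.dataset_id)

If the listing succeeds, the key is valid and the account can read the project's dataset metadata.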

In Dataddo

Authorization via User Account

Authorization via a standard User Account is the default option and follows the standard OAuth 2.0 flow.

In the Authorizers tab, click on Authorize New Service and select Google BigQuery. Follow the prompts to finish the authorization process.

Authorization via Service Account

Make sure you have created a Service Account for authorization and downloaded its key in JSON format.

  1. On the Security page, navigate to the Certificates tab and click on Add Certificate.
  2. Name your certificate, select Google Service Account Key as the certificate type, and upload the JSON key file you obtained when creating the Service Account.
  3. In the Authorizers tab, click on Authorize New Service and select Google BigQuery (Service account).
  4. Select the newly added certificate.
  5. Click on Save.

Create a New BigQuery Destination

  1. On the Destinations page, click on the Create Destination button and select the destination from the list.
  2. Select your account from the drop-down menu.
  3. Name your destination and click on Save to create your destination.
Need to authorize another connection?

Click on Add new Account in the drop-down menu during authorizer selection and follow the on-screen prompts. Alternatively, go to the Authorizers tab and click on Add New Service.

Create a Flow to BigQuery

  1. Navigate to Flows and click on Create Flow.
  2. Click on Connect Your Data to add your sources.
  3. Click on Connect Your Data Destination to add the destination.
  4. Choose the write mode and fill in the other required information.
  5. Check the Data Preview to see if your configuration is correct.
  6. Name your flow, and click on Create Flow to finish the setup.

Table Naming Convention

When naming your table, please make sure the table name does NOT contain:

  • Whitespace characters
  • Dashes
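
If you generate table names programmatically, you can validate them against this convention before creating the flow. A minimal Python sketch (the function name is illustrative):

import re

def is_valid_table_name(name: str) -> bool:
    # Reject names containing whitespace or dashes, per the
    # naming convention above.
    return re.search(r"[\s-]", name) is None

print(is_valid_table_name("daily_sales"))   # True
print(is_valid_table_name("daily-sales"))   # False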

Table Partitioning

Dataddo supports table partitioning, which splits large datasets into smaller, more manageable partitions based on criteria such as date.

During flow creation, add date range patterns to your table name:

xyz_{{1d1|Ymd}}

Using this table name, Dataddo will create a new date-suffixed table every day, e.g. xyz_20xx0101, xyz_20xx0102, etc.
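
Once the daily tables exist, you can query across them in BigQuery using a wildcard table and the _TABLE_SUFFIX pseudo-column. A minimal sketch using the Python client; the project, dataset, and date range below are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# A wildcard table spans all date-suffixed tables; _TABLE_SUFFIX
# narrows the scan to the desired date range (dates are placeholders).
sql = """
SELECT COUNT(*) AS row_count
FROM `my-project.my_dataset.xyz_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
"""
for row in client.query(sql).result():
    print(row.row_count)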


Troubleshooting

Invalid Grant Error

ERROR CODE
invalid_grant

This issue may be caused by:

  1. Revoked or expired token: Please reauthorize your account.
  2. Insufficient permissions: Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to the User Account or Service Account that you are using.

Invalid Configuration Error

ERROR CODE
invalid_configuration

This issue is caused by insufficient row-level uniqueness of your composite key. To solve the issue, please follow these steps:

  1. In the Flows tab, click on your flow.
  2. Edit the composite key to include more columns for distinct row identification when using upsert or delete write modes.
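
To see which key combinations currently collide, you can run a grouped count directly in BigQuery. A minimal sketch with placeholder project, table, and key column names:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Count rows per composite-key combination; any group returned here
# is not uniquely identified by the current key (col_a and col_b are
# placeholders for your key columns).
sql = """
SELECT col_a, col_b, COUNT(*) AS n
FROM `my-project.my_dataset.my_table`
GROUP BY col_a, col_b
HAVING COUNT(*) > 1
ORDER BY n DESC
"""
for row in client.query(sql).result():
    print(row.col_a, row.col_b, row.n)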

Failed to Prepare Table Error

ERROR CODE
Erroneous communication with Google Big Query API: failed to prepare table: Failed to create new table

This issue may be caused by:

  1. Table name already in use: Please edit the flow and choose a different table name.
  2. Insufficient permissions: Your BigQuery account does not have sufficient permissions. Make sure that both BigQuery Data Editor and BigQuery Job User roles are assigned to your User Account or Service Account.

Failed to Convert Value GBQ Data Type Error

ERROR CODE
Failed to convert value GBQ data type: Column '...' schema is set to '...'

This issue is caused by a data type mismatch between the table in BigQuery and the sources attached to the flow. To solve the issue, please follow these steps:

  1. Check the error message to identify which column is affected and change the data type.
  2. Manually load data in the source.
  3. Restart the flow.
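
To compare the target table's schema against your source, you can list the column names and types with the Python client. A minimal sketch with a placeholder table ID:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Print each column's name and type so you can spot the mismatch
# reported in the error message.
table = client.get_table("my-project.my_dataset.my_table")
for field in table.schema:
    print(field.name, field.field_type)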

Column Not Found in Schema Error

ERROR CODE
Column not found in schema: Column ... not found in schema

This issue may be caused by:

  1. Updated schema due to a column in the source being renamed:
    • Dataddo generates flow schemas based on the attached sources. If you changed a column name in the source connected to the flow, it will result in a mismatch between the expected target table schema and the current one.
    • Either revert the column name in the source to its previous value or adjust the column name in the target table to match the new name.
  2. Changed source within the flow that led to a schema discrepancy: For the sake of data integrity, changes in the source schema are not passed down to downstream database destinations. This means that table schemas won't automatically update without your explicit consent.
    1. If you can delete data in the BigQuery table: Delete the table in its entirety and reset the flow. Upon the next execution, Dataddo will attempt to create a new table.
    2. If you cannot delete data in the BigQuery table: Consider manually adding any absent columns directly in Google BigQuery, as sketched below.
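
Adding a missing column can also be done with the Python client. A minimal sketch, assuming placeholder table ID, column name, and type:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Fetch the current schema, append the absent column (its name and
# type below are placeholders), and push the updated schema back.
table = client.get_table("my-project.my_dataset.my_table")
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("missing_column", "STRING", mode="NULLABLE"))
table.schema = new_schema
client.update_table(table, ["schema"])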

Billing Not Enabled

ERROR CODE
Error 403: Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. Datasets must have a default expiration time and default partition expiration time of less than 60 days while in sandbox mode.

This issue is caused by a billing issue with your Google Cloud project. Please enable billing for the project in your Google Cloud Console.

