Google BigQuery

Prerequisites

  • You have a BigQuery account with an existing project and dataset.
  • You have the following IAM roles on the project (a quick way to verify them is sketched after this list):
    • roles/bigquery.dataEditor
    • roles/bigquery.jobUser
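
To double-check these prerequisites before configuring Dataddo, you can run a quick check with the google-cloud-bigquery Python client. This is a minimal sketch, assuming the hypothetical project my-project and dataset my_dataset:

```python
from google.cloud import bigquery

# Hypothetical identifiers; replace with your own project and dataset.
PROJECT = "my-project"
DATASET = "my_dataset"

client = bigquery.Client(project=PROJECT)

# roles/bigquery.dataEditor: this fails if the dataset is missing or unreadable.
dataset = client.get_dataset(f"{PROJECT}.{DATASET}")
print(f"Dataset {dataset.dataset_id} is reachable")

# roles/bigquery.jobUser: a dry-run query fails if you cannot create jobs.
job_config = bigquery.QueryJobConfig(dry_run=True)
client.query("SELECT 1", job_config=job_config)
print("Job creation permitted")
```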

Account Authorization

Connecting to BigQuery Using OAuth 2.0

Authorization via OAuth 2.0 is the default option. Simply navigate to Authorizers, click on Authorize New Service, and choose Google BigQuery. Follow the prompts to finish the authorization process.

Connecting to BigQuery Using Service Account

If you want to connect to Google BigQuery via a service account instead of your Google Account, you need to authorize the service account before creating a new destination. Please follow these steps:

  1. Go to Certificates & Tokens in Dataddo to add your certificate (the certificate type should be RSA Private Key).
  2. Go to Authorizers and add a new service called GBQ (Service account).
  3. Select the newly added certificate and name your service.
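
Before adding the certificate to Dataddo, you may want to verify that the service account key itself has BigQuery access. A minimal sketch, assuming a downloaded JSON key file at the hypothetical path sa-key.json (the JSON key embeds the RSA private key that Dataddo asks for):

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# "sa-key.json" is a hypothetical path to your service account key file.
creds = service_account.Credentials.from_service_account_file("sa-key.json")
client = bigquery.Client(credentials=creds, project=creds.project_id)

# Listing datasets confirms the key is valid and has BigQuery access.
for ds in client.list_datasets():
    print(ds.dataset_id)
```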

Creating a Google BigQuery Data Destination

  1. Go to Destinations, click on Create Destination, and choose BigQuery.
  2. Choose your account and click on Next.

If you have not authorized your Google Account, click on Authorize. You will be automatically redirected to the OAuth 2.0 authorization process. If you need to add a Google Service Account, please follow Connecting to BigQuery Using Service Account.

  3. Finally, select your project and dataset.

Once you create your data flow, Dataddo will automatically create a table in your selected project and dataset.

Creating a Google BigQuery Data Flow

  1. Go to Flows and click on Create Flow.
  2. Click on Connect Your Data to add your data source.
  3. Click on Connect Your Data Destination and select your Google BigQuery destination.
  4. You will be asked to fill in the table name and choose a Write Mode.


By default, the writing operation to BigQuery will be triggered after the extraction is finished. If needed, you can set a custom schedule.

  5. Check if your configuration is correct by taking a look at the Data Preview tab.
  6. Finally, click on Create Flow.

Checking Your Google BigQuery Table Configuration

Automatic Configuration

When the writing operation is triggered for the first time, Dataddo will automatically create a new table with the name you've chosen.

Date and time values will be stored as the TIMESTAMP data type, integers as INTEGER, floating numerics as FLOAT, and strings as STRING.
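
For illustration, this automatic schema corresponds to what you would define yourself with the BigQuery Python client. A minimal sketch with hypothetical table and column names:

```python
from google.cloud import bigquery

# Hypothetical columns mirroring Dataddo's automatic type mapping.
schema = [
    bigquery.SchemaField("created_at", "TIMESTAMP"),  # date and time values
    bigquery.SchemaField("clicks", "INTEGER"),        # integers
    bigquery.SchemaField("cost", "FLOAT"),            # floating numerics
    bigquery.SchemaField("campaign", "STRING"),       # strings
]

client = bigquery.Client()
table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
client.create_table(table)
```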

If the operation fails, please proceed to the Troubleshooting section or continue with manual configuration.

Manual Configuration

  1. Click on the three dots next to your data flow and choose Config.
  2. A window with the configuration details will pop up. Follow the instructions to finish the configuration.

Congratulations, your new data destination & data flow are ready!


Troubleshooting

The Table Was Not Automatically Created

If your flow entered a broken state right after it was created, the table could not be created. Please click on the three dots next to your flow and choose Show Logs to look for the error description. In most cases, the problem is one of the following:

  • Insufficient permissions to create the table. Make sure that the authorized BigQuery user has at least a WRITER role.
  • The table already exists. Delete the existing table and restart the flow (see the sketch below).
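
If you prefer to script the cleanup, this is a minimal sketch using the BigQuery Python client, assuming the hypothetical table my-project.my_dataset.my_table:

```python
from google.cloud import bigquery

client = bigquery.Client()

# not_found_ok avoids an error if the table was already removed.
client.delete_table("my-project.my_dataset.my_table", not_found_ok=True)
```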

Flow Broke After Changing the Source

In order to maintain data consistency, changes made at the flow level are not automatically propagated to downstream database destinations. For example, table schemas are not automatically updated without your knowledge.

If your flow broke after you've changed your source, it is likely that the new table schema does not match the one in your destination. For more detail, click on the three dots next to the broken flow, and select Show Logs.

Depending on your situation:

  1. If the data collected in the BigQuery table can be deleted, delete the entire table and restart the flow. Dataddo will attempt to create a new table.
  2. If the data cannot be deleted, try manually adding the missing columns (see the sketch below).
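
Missing columns can be added in the Google Cloud console or with the BigQuery Python client. A minimal sketch, assuming the hypothetical table my-project.my_dataset.my_table and a hypothetical missing column new_metric:

```python
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")

# BigQuery only allows adding NULLABLE (or REPEATED) columns to a schema.
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("new_metric", "FLOAT", mode="NULLABLE"))

table.schema = new_schema
client.update_table(table, ["schema"])
```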

Data Duplicates

BigQuery is primarily an append-only database, so the recommended approach is to use the INSERT write mode.

INSERT, however, may result in duplicates in your data. To avoid that, you can also use:

  • TRUNCATE INSERT. This write mode removes all the content in the BigQuery table prior to data insertion.
  • UPSERT. This write mode inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns (see the sketch below).
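
Dataddo performs these write modes internally; for intuition only, they roughly correspond to the following BigQuery-side operations. A minimal sketch with hypothetical table and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# TRUNCATE INSERT: replace all existing rows on each load. This config
# would be passed to a client.load_table_from_* call.
job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")

# UPSERT: behaves like a MERGE keyed on the unique column(s),
# here a hypothetical "id".
merge_sql = """
MERGE `my-project.my_dataset.target` AS t
USING `my-project.my_dataset.staging` AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value)
"""
client.query(merge_sql).result()
```

In other words, TRUNCATE INSERT maps to a truncating write disposition, while UPSERT behaves like a MERGE keyed on the unique columns you configure in the flow.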

Common Error Messages

invalid_grant

Your BigQuery token has been revoked. Please re-authorize your account.

invalid_configuration

When using the UPSERT write mode, the combination of columns that you have chosen does not produce a unique index. Edit the flow and include more columns in the index.

Erroneous communication with Google Big Query API: failed to prepare table: Failed to create new table

The BigQuery table name is already in use. Please edit the flow and choose a different table name.

Failed to convert value GBQ data type

The affected column's data type is incompatible with the data type in your destination. Please use one of these two options:

  • Change the data type of the affected column by navigating to your source, selecting Edit, and going to the Data Types tab. For example, change the column's data type to DATE.
  • Change the data type in the BigQuery table, for example to INTEGER.

Need assistance?

Feel free to contact us and we will help you with the setup. To speed up the process of resolving your issue, make sure you provide us with sufficient information.

