How to Create a Data Flow

Getting your data from a source to a destination is a simple task in Dataddo: create a flow that connects your source to your destination.

Some destination connectors have specific steps during configuration. Search the name of the destination in the search bar above for more information.

Create a New Data Flow

  1. Click on Flows at the top of the page.

  2. Click on Create Flow in the top right corner.

  3. Click on Add Source and select a source that has already been created or create a new source.

  4. Once you select your data source, click on Add Destination. Select a destination or create a new one.

  5. Configure the destination by filling out the necessary fields.

  6. Name your flow by typing the name into the field at the top of the page.

  7. Check the Data Preview to ensure your flow is configured correctly.

  8. Click on Save flow.

How to Configure the Data Flow

Automatic Configuration for Databases

Dataddo will try to create the table automatically when the write operation is triggered for the first time. Date and time values are stored as TIMESTAMP, integers as INTEGER, floating-point numbers as FLOAT, and strings as STRING. If the operation fails, proceed to the Troubleshooting section below or continue with manual configuration.
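
For illustration, a table auto-created from a source containing a date, an integer metric, a floating-point metric, and a text dimension might look like the following minimal sketch. The table and column names are hypothetical, and the exact DDL varies by database (for example, STRING is BigQuery's text type, while most other databases use TEXT or VARCHAR):

    CREATE TABLE dataddo_flow_output (
        event_date  TIMESTAMP,  -- date and time values
        sessions    INTEGER,    -- integer values
        bounce_rate FLOAT,      -- floating-point values
        campaign    STRING      -- string values
    );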

Manual Configuration

  1. Click on the three dots next to the flow and choose Config.

  2. Follow the instructions in the pop-up window to finish the configuration.
    If you selected a dashboarding app, the system will generate the connection parameters. Click on the dashboarding app section to view them.

If you selected a data warehouse or storage destination, you will see instructions for setting up your table. After that, your data flow will be live.

Editing a Flow

If you make changes to your flow that would affect the database schema (renaming a field, adding or deleting columns, or changing a field's data type), you first need to go to your database and delete the previously created table, as sketched below. Then save your changes and restart the flow so the table is recreated.
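
A minimal sketch of that cleanup step, assuming the flow writes to a hypothetical table named analytics_flow:

    -- Delete the previously created table so Dataddo can recreate it
    -- with the updated schema on the next write.
    DROP TABLE IF EXISTS analytics_flow;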

Troubleshooting

The Table Was Not Automatically Created

Applies to SQL databases: MySQL, Azure SQL, Universal SQL Server, Vertica, Snowflake, CockroachDB, AWS Redshift, AWS RDS (MySQL), AWS RDS (SQL Server), AWS RDS (PostgreSQL), BigQuery, Google Cloud SQL (MySQL), Google Cloud SQL (PgSQL), Universal PostgreSQL, Universal MySQL.

If the flow is in a broken state after creation, the table could not be created. Click on the three dots next to the flow and select Display Log to find the error description. In most cases, the problem is one of the following:

  • Insufficient permissions to create the table. Make sure that the authorized user has at least a WRITER role (see the sketch after this list).
  • The table already exists. Delete the existing table and restart the flow.
  • The flow is misconfigured. Review the flow configuration for mistakes.
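
As an illustration of the permissions point above, in a PostgreSQL-compatible database you could grant the connecting user the privileges needed to create and write tables roughly like this. The user name dataddo_user and the schema are hypothetical, and the exact equivalent of a WRITER role differs per database:

    -- Let the connecting user create tables in the target schema
    GRANT USAGE, CREATE ON SCHEMA public TO dataddo_user;
    -- Let the user write to existing tables in that schema
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO dataddo_user;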

Flow Is Broken after Changing the Source

Applies to SQL databases

In order to maintain data consistency, Dataddo does not propagate changes done at the flow level to the downstream database destinations (i.e. table schemas are not automatically updated without your knowledge).

If your flow breaks after changing the source, the updated schema most likely does not match the table that was already created. Click on the three dots next to the flow and choose Display Log to look for the error description. Delete the existing table in your database and reset the flow. Dataddo will attempt to create a new table. If the table cannot be deleted, manually add the missing columns to your existing table.
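
If deleting the table is not an option, here is a sketch of adding the missing columns by hand. The table and column names are hypothetical; take the actual field names and types from the error description in Display Log:

    -- Add the columns the updated source now sends
    ALTER TABLE analytics_flow ADD COLUMN device_type VARCHAR(255);
    ALTER TABLE analytics_flow ADD COLUMN country VARCHAR(255);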

Experiencing Data Duplicates

For destinations that are primarily append-only, the recommended approach is to use the INSERT write strategy. However, this approach can result in duplicates in your data. To avoid that, consider one of the other write strategies:

  • TRUNCATE INSERT. This strategy removes all contents of the destination table prior to data insertion.
  • UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key spanning one or more columns (see the sketch after this list).
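
To illustrate what the UPSERT strategy relies on, here is a PostgreSQL-flavored sketch of the equivalent behavior. The table, columns, and constraint name are hypothetical, and Dataddo issues the actual statements for you:

    -- The unique key chosen in the flow must identify each row unambiguously
    ALTER TABLE analytics_flow
      ADD CONSTRAINT flow_unique_key UNIQUE (event_date, campaign);

    -- UPSERT inserts new rows and updates rows whose key already exists
    INSERT INTO analytics_flow (event_date, campaign, sessions)
    VALUES ('2024-01-01', 'spring_sale', 120)
    ON CONFLICT (event_date, campaign)
    DO UPDATE SET sessions = EXCLUDED.sessions;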

Flow with UPSERT Strategy Is Failing with invalid_configuration Message

The combination of columns you have chosen does not produce a unique index. Edit the flow and add more columns to the index until the combination is unique.
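
A quick way to check whether a candidate column combination is actually unique, sketched against the same hypothetical table, is to look for key values that occur more than once:

    -- Any rows returned mean the chosen key is not unique
    -- and more columns must be added to the index.
    SELECT event_date, campaign, COUNT(*) AS occurrences
    FROM analytics_flow
    GROUP BY event_date, campaign
    HAVING COUNT(*) > 1;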


Need assistance?

Feel free to contact us and we will help you with the setup. To speed up the resolution of your issue, make sure you provide us with sufficient information.

