How to Create a Data Flow

Getting your data from a source to a destination is an easy task with Dataddo. To create a flow, you first need to create a source and a destination. Once both are connected, you can create a flow.

Some connectors require specific steps during configuration. Search for the name of the destination in the search bar above for more information.

Create a New Data Flow

  1. Click on Flows at the top of the page.

  2. Click on Create Flow in the top right corner.

  3. Click on Add Source to connect it to the destination. You can type the connector's name into the search bar to find it faster.

  4. Once you select your data source, click on Add Destination.

  5. From the list, select your destination. You can also type the name into the search bar.

  6. Configure the destination by filling out the necessary fields.

  7. Name your flow by typing the name into the field at the top.

  8. Check the Data Preview to verify your configuration.

  9. Click on Save flow.

How to Configure the Data Flow

Automatic Configuration for Databases

Dataddo will try to create a table automatically when the write operation is triggered for the first time. Date and time values are stored as TIMESTAMP, integers as INTEGER, floating-point numbers as FLOAT, and strings as STRING. If the operation fails, proceed to the Troubleshooting section or continue with the manual configuration.
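
For illustration, here is a rough PostgreSQL-style sketch of a table as it might look after automatic creation. The table and column names are hypothetical, and the exact type names vary by database (for example, STRING may map to TEXT or VARCHAR):

```sql
-- Hypothetical example of an auto-created table following the
-- type mapping above (names and columns are illustrative only).
CREATE TABLE campaign_performance (
    reported_at TIMESTAMP,  -- date and time values
    impressions INTEGER,    -- integer values
    ctr         FLOAT,      -- floating-point values
    campaign    TEXT        -- string values (STRING/VARCHAR in some databases)
);
```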

Manual Configuration

  1. Click on the three dots next to the flow and choose Config.

  2. A window will pop up with the configuration details. Follow the instructions to finish the configuration.
    If you selected a dashboarding app, the system will generate the parameters to connect with. Click on the Dashboarding App section to see them.

If you chose a data warehouse or storage, you will see instructions for setting up your table. After that, your data flow will be live.

Editing a Flow

If you make changes to your flow that affect the database schema (renaming a field, adding or deleting columns, or changing a field's data type), you need to go to your database and delete the previously created table. Then save the changes and refresh the flow.
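
As a minimal sketch, assuming a SQL destination and a hypothetical table name, the reset could look like this:

```sql
-- Hypothetical example: drop the previously created table so Dataddo
-- can recreate it with the new schema on the next write operation.
DROP TABLE IF EXISTS campaign_performance;
```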

Troubleshooting

The Table Was Not Automatically Created

Applies to SQL databases: MySQL, Azure SQL, Universal SQL Server, Vertica, Snowflake, CockroachDB, AWS Redshift, AWS RDS (MySQL), AWS RDS (SQL Server), AWS RDS (PostgreSQL), BigQuery, Google Cloud SQL (MySQL), Google Cloud SQL (PgSQL), Universal PostgreSQL, Universal MySQL.

If the flow enters a broken state right after creation, the table failed to be created. Click on the three dots next to the flow and choose Display Log to look for the error description. In most cases, the problem is one of the following:

  • Insufficient permissions to create the table. Make sure that the authorized user has at least a WRITER role (see the sketch after this list).
  • The table already exists. Delete the existing table and restart the flow.
  • The flow configuration is incorrect. Review and fix the configuration.
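
As a rough illustration, assuming a PostgreSQL-style database with a hypothetical schema analytics and user dataddo_user, granting sufficient privileges might look like this (exact statements vary by database):

```sql
-- Hypothetical example: let the user Dataddo authenticates with
-- create tables and write data in the target schema.
GRANT USAGE, CREATE ON SCHEMA analytics TO dataddo_user;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA analytics TO dataddo_user;
```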

Flow Is Broken after Changing the Source

Applies to SQL databases

To maintain data consistency, Dataddo does not propagate changes made at the flow level to downstream database destinations (i.e., table schemas are not updated automatically without your knowledge).

If your flow broke after changing the source, most likely the updated schema no longer matches the table that was already created. Click on the three dots next to the flow and choose Display Log to look for the error description. If the data collected in the database table can be deleted, delete the entire table and reset the flow; Dataddo will attempt to create a new table. If the data cannot be deleted, try manually adding the missing columns.
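
As a sketch, assuming a SQL destination and hypothetical names, adding a missing column could look like this:

```sql
-- Hypothetical example: add a column that the updated source schema
-- contains but the existing destination table does not.
ALTER TABLE campaign_performance ADD COLUMN device_type TEXT;
```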

Experiencing Data Duplicates

For destinations that are primarily append-only, the recommended approach is the INSERT write strategy. However, this approach can result in duplicates in your data. To avoid that, consider the other write strategies below (illustrated in the sketch after this list):

  • TRUNCATE INSERT. This strategy removes all contents of the destination table prior to data insertion.
  • UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, you must set a unique key spanning one or more columns.
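
To illustrate the difference, here is a rough PostgreSQL-style sketch of both strategies. The table, columns, and unique key are hypothetical, and the actual statements Dataddo issues may differ:

```sql
-- TRUNCATE INSERT: clear the destination table, then insert the new batch.
TRUNCATE TABLE campaign_performance;
INSERT INTO campaign_performance (reported_at, campaign, impressions)
VALUES ('2024-01-01', 'spring_sale', 1200);

-- UPSERT: the unique key must be backed by a unique constraint or index
-- (here, the hypothetical combination of reported_at and campaign).
CREATE UNIQUE INDEX IF NOT EXISTS ux_campaign_day
    ON campaign_performance (reported_at, campaign);

-- Insert new rows; update rows whose key already exists.
INSERT INTO campaign_performance (reported_at, campaign, impressions)
VALUES ('2024-01-01', 'spring_sale', 1250)
ON CONFLICT (reported_at, campaign)
DO UPDATE SET impressions = EXCLUDED.impressions;
```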

Flow with UPSERT Strategy Is Failing with invalid_configuration Message

The combination of columns you have chosen does not produce a unique index. Edit the flow and add more columns to the index.
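
To verify whether your chosen columns actually form a unique key, you can run a check like this rough sketch (table and columns are hypothetical); any rows returned are key combinations that occur more than once:

```sql
-- Hypothetical check: list key combinations that appear more than once
-- and therefore cannot serve as a unique index for UPSERT.
SELECT reported_at, campaign, COUNT(*) AS occurrences
FROM campaign_performance
GROUP BY reported_at, campaign
HAVING COUNT(*) > 1;
```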


Need assistance?

Feel free to contact us and we will help you with the setup. To speed up the resolution of your issue, make sure you provide us with sufficient information.

