How to Create a Data Flow
- 4 Minutes to read
- You already have a Dataddo account. If you do not have one yet, sign up for a 14-day trial.
- You have already added a source.
- You have already added a destination.
Creating a New Data Flow to a Destination
Since Dataddo decouples data extraction from data ingestion, you can create M:N associations between Sources and Destinations using Flows. Follow the steps below to create a data pipeline, then check Finalizing the connection in case the destination requires additional configuration to make the flow work.
- Click on Flows at the top of the page and proceed with Create Flow in the top right corner.
- Click on Add Source and choose the source you'd like to include in the flow. You can type the source or connector's name into the search bar to find it faster. You can add more sources or blend two sources in the same flow. Read more about Data Blending here.
- Once you select your data source, click on Add Destination, followed by selecting the destination of your choice.
- You can name your flow by typing the name into the field at the top.
- Check the Data Preview to verify your configuration.
- Click on Save flow.
Finalizing the connection
In general, Dataddo does its best to remove barriers to creating connections. Most configuration steps on the destination side (e.g. creating tables and schemas, setting data types) are therefore automated.
Once a flow to a dashboarding application is created, you will be presented with a step-by-step guide on how to finalize the connection; please follow the steps carefully to set everything up. The window with configuration information can also be triggered manually by clicking on the three dots next to the flow and choosing Config. Further details can be found on the destination-specific pages.
Once a flow to a database, data warehouse, or data lake is created, all the provisioning (e.g. creating tables and schemas, setting data types, creating permissions for the tables) is automated and will be triggered during the first data load. After all the operations succeed, you will see a notification in the notification bar. You can check the progress, or see further details, by clicking on the three dots next to the flow and choosing Show Logs.
Once a flow to an application is created, provisioning is likewise automated and triggered during the first data load. Progress and details are again available under Show Logs, and a notification will appear in the notification bar once everything succeeds.
Database Table Was Not Automatically Created
If a flow to a database or data warehouse is in a broken state right after creation, the most common cause is that the database table failed to be created. Click on the three dots next to the flow and select Show Logs to look for the error description. In most cases the problem is one of these:
- Insufficient permissions to create the table. Make sure that the user you have provided when you added the destination has permissions for creating the table.
- The table already exists. Delete the existing table and restart the flow by clicking on Manual data insert.
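The "table already exists" case can be illustrated with a minimal sketch. SQLite is used here purely for portability; your actual destination database and its error messages will differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (id INTEGER, value REAL)")

# A second CREATE TABLE for the same name fails, which is what the
# automated provisioning runs into when the table already exists.
try:
    conn.execute("CREATE TABLE metrics (id INTEGER, value REAL)")
except sqlite3.OperationalError as e:
    print(e)  # table metrics already exists

# The fix: drop the existing table so provisioning can recreate it,
# then restart the flow (Manual data insert in the Dataddo UI).
conn.execute("DROP TABLE metrics")
conn.execute("CREATE TABLE metrics (id INTEGER, value REAL)")
```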
Flow Is Broken after Changing the Source
In order to maintain data consistency, Dataddo does not propagate changes done at the flow level to the downstream database destinations (i.e. table schemas are not automatically updated without your knowledge).
If your flow breaks after changing the source, the updated schema most likely does not match the table that was already created. Click on the three dots next to the flow and choose Show Logs to look for the error description. Delete the existing table in your database and reset the flow; Dataddo will attempt to create a new table. If the table cannot be deleted, manually add the missing columns to your existing table.
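The schema-mismatch failure and the manual fix can be sketched as follows (using SQLite for illustration; the table and column names are hypothetical, and your destination's SQL dialect may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table provisioned from the original source schema.
conn.execute("CREATE TABLE metrics (id INTEGER, clicks INTEGER)")

# After the source changes, the flow delivers an extra column that the
# existing table does not have, so the load fails.
try:
    conn.execute("INSERT INTO metrics (id, clicks, cost) VALUES (1, 10, 2.5)")
except sqlite3.OperationalError as e:
    print(e)  # table metrics has no column named cost

# Manual fix when the table cannot simply be dropped: add the missing
# column, after which the load succeeds.
conn.execute("ALTER TABLE metrics ADD COLUMN cost REAL")
conn.execute("INSERT INTO metrics (id, clicks, cost) VALUES (1, 10, 2.5)")
```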
Experiencing Data Duplicates
For destinations that are primarily append-only, the recommended approach is to use the INSERT write strategy. However, this approach can result in duplicates in your data. To avoid that, consider other write strategies:
- TRUNCATE INSERT. This strategy removes all the contents of the destination table prior to data insertion.
- UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns.
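The difference between the strategies can be sketched in SQL terms. This is an illustration only (SQLite syntax, hypothetical table); the actual statements Dataddo issues depend on the destination:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The unique key (here: day) is what UPSERT needs to match existing rows.
conn.execute("CREATE TABLE daily (day TEXT, clicks INTEGER, UNIQUE(day))")

rows = [("2024-01-01", 10), ("2024-01-02", 20)]

# TRUNCATE INSERT: delete everything first, then insert, so re-running
# a load never duplicates rows. A plain INSERT would skip the DELETE
# and append the same rows again on every run.
conn.execute("DELETE FROM daily")
conn.executemany("INSERT INTO daily VALUES (?, ?)", rows)

# UPSERT: insert new rows and update existing ones, keyed on the unique key.
conn.executemany(
    "INSERT INTO daily VALUES (?, ?) "
    "ON CONFLICT(day) DO UPDATE SET clicks = excluded.clicks",
    [("2024-01-02", 25), ("2024-01-03", 30)],
)
print(conn.execute("SELECT * FROM daily ORDER BY day").fetchall())
# [('2024-01-01', 10), ('2024-01-02', 25), ('2024-01-03', 30)]
```

Note that the existing row for 2024-01-02 was updated rather than duplicated, while 2024-01-03 was inserted as a new row.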
Flow with UPSERT Strategy Is Failing with invalid_configuration Message
The combination of columns that you have chosen does not produce a unique index. Edit the flow and include more columns in the index.
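The uniqueness requirement can be checked against a sample of your data with a small sketch. The helper below is hypothetical (not part of any Dataddo API) and simply mirrors the condition that a valid UPSERT key must satisfy:

```python
# Hypothetical helper: returns True if the chosen columns form a unique
# key over the sample rows. A flow whose key fails this check is the
# kind that errors with invalid_configuration.
def is_unique_key(rows, key_columns):
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True

rows = [
    {"date": "2024-01-01", "campaign": "a", "clicks": 10},
    {"date": "2024-01-01", "campaign": "b", "clicks": 7},
    {"date": "2024-01-02", "campaign": "a", "clicks": 12},
]

print(is_unique_key(rows, ["date"]))              # False: dates repeat
print(is_unique_key(rows, ["date", "campaign"]))  # True: the pair is unique
```

In this example, `date` alone is not enough because two campaigns report on the same day; adding `campaign` to the key makes the combination unique.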