Data Flows Overview
A data flow is the connection between a data source or sources and a data destination.
Key considerations for data flows include the following:
One Dataset, One Flow
- For fixed-schema connectors like HubSpot Analytics, each dataset must have its own flow. This means that if you need to extract multiple datasets, such as contacts and deals, you’ll need a separate flow for each.
- If multiple accounts share the same schema (i.e., identical attributes and metrics), their data can be combined into a single flow using data union. For example, data from multiple Google Analytics accounts can be consolidated and extracted together.
Creating a Data Flow
Dataddo decouples the data extraction and data ingestion processes, allowing more architectural flexibility. When you create data flows, you create M:N relationships between sources and destinations: one source can feed multiple flows, and one destination can receive data from multiple flows.
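For illustration, here is a minimal, hypothetical in-memory model of this M:N relationship in Python. The flow, source, and destination names are made up, and this is not Dataddo's actual API or configuration format.

```python
# Hypothetical, simplified model of flows as M:N links between sources and
# destinations -- for illustration only, not Dataddo's data model.
from dataclasses import dataclass

@dataclass
class Flow:
    name: str
    sources: list[str]       # one or more sources (e.g. a data union)
    destinations: list[str]  # one or more destinations

flows = [
    # Two accounts with the same schema unioned into a single flow
    Flow("ga_union_to_bigquery", ["ga_account_a", "ga_account_b"], ["bigquery"]),
    # The same source can also feed a second flow to a different destination
    Flow("ga_account_a_to_dashboard", ["ga_account_a"], ["looker_studio"]),
]

# Each source can appear in many flows, and each destination can receive many flows.
for flow in flows:
    print(flow.name, ":", flow.sources, "->", flow.destinations)
```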
The flexibility provided by data flows facilitates a broad spectrum of data architecture patterns and implementations, such as:
- Straightforward data integration into dashboards and BI tools
- Batch data ingestion through ETL/ELT processes into data warehouses
- Moving large datasets into data lakes
- Replicating databases
- Data activation via reverse ETL
Types of Data Flow
Batch Data Flow
This is the standard and default flow type.
Streaming Data Flow
When a flow is configured to use streaming, the transfer of larger amounts of data is usually considerably faster. Additionally, with a streaming flow, you have the guarantee that no data will be cached during the process. However, this also means that features requiring caching, such as flow-level data transformations, cannot be used.
Data Flow Features
To further enhance flexibility and facilitate data management, Dataddo offers multiple features that can be configured at the flow level. These features can be divided into the following categories:
Data Transformation Capabilities
Dataddo's multi-layered approach to transformations ensures data is consistently analytics-ready from its origin to its final destination.
Dataddo offers data transformations at the extraction level (applied automatically) and at the flow level. This structured method significantly reduces the need for extensive transformations in the data warehouse or other destinations.
Data Quality Measures
To make sure data quality is prioritized, Dataddo allows you to configure specific business rules that either halt data writing operations or send notifications when discrepancies are detected.
This proactive approach ensures that only high-quality data is funneled into downstream systems, be it a data warehouse or a BI dashboard, safeguarding the reliability of the insights derived from it.
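As a rough illustration of what such a rule might check, here is a minimal Python sketch that flags columns whose NULL rate exceeds a threshold. The threshold, column names, and sample rows are assumptions, and this is not Dataddo's actual rule syntax or engine.

```python
# Hypothetical illustration of a flow-level quality rule -- not Dataddo's rule engine.
rows = [
    {"email": "a@example.com", "revenue": 120.0},
    {"email": None, "revenue": 80.0},
    {"email": "c@example.com", "revenue": None},
]

MAX_NULL_RATE = 0.2  # assumed threshold: flag a column if more than 20% of values are NULL

def null_rate(rows, column):
    return sum(1 for r in rows if r[column] is None) / len(rows)

for column in ("email", "revenue"):
    rate = null_rate(rows, column)
    if rate > MAX_NULL_RATE:
        # In the product, a violated rule would halt the write or send a notification.
        print(f"Rule violated: {column} is NULL in {rate:.0%} of rows")
```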
Data Backfilling
In cases of interruptions or changes, some data might be missing. To address such gaps, Dataddo offers data backfilling.
Data backfilling allows you to retrieve and load missing data or historical data, no matter the destination.
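Before requesting a backfill, you may want to know which days are actually missing in the destination. The sketch below, using made-up dates and daily-keyed data, shows one way to compute that gap in Python; it is not part of Dataddo's product.

```python
# A minimal sketch for identifying gaps before a backfill, assuming daily data
# keyed by date. The date range and loaded days are illustrative.
from datetime import date, timedelta

expected_start, expected_end = date(2024, 1, 1), date(2024, 1, 10)
loaded_days = {date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)}  # days already in the destination

day = expected_start
missing = []
while day <= expected_end:
    if day not in loaded_days:
        missing.append(day)
    day += timedelta(days=1)

print("Days to backfill:", missing)
```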
Protecting Schema Integrity
When your data source schema changes, the schema in your destination is NOT automatically altered to prevent unintended "domino" effects.
If a schema change in the source is detected, you will have to manually rebuild the source and then connect it to the relevant flow. This measure ensures that all changes are deliberate, preserving the integrity of your data across all downstream systems.
Other Data Flow Operations
Edit a Data Flow
On the Flows page, click on your data flow to see your flow details.
Generally, you will be able to:
- Change the data flow name
- Configure data quality rules
- Change, add, or delete data sources
If your destination is a data warehouse, you will be able to:
- Change the write mode
- Change sync frequency
- Configure database table details
Make sure the write mode is the same for your flow and your data warehouse. Changing the write mode after the flow has been created might have unwanted effects.
Only proceed if you are aware of how this will affect your data.
Delete a Data Flow
There are two ways to delete a data flow.
- Delete the data flow directly without deleting the connected data source(s):
  - On the Flows page, click on the trash can icon next to your flow.
  - Type DELETE to confirm and click on Delete.
- Delete a data source, which will delete all connected flows:
  - On the Sources page, click on the trash can icon next to your source.
  - A warning listing all connected flows will pop up for confirmation.
  - Type DELETE to confirm and click on Delete.
Troubleshooting
Database Table Was Not Automatically Created
If a flow to a database or data warehouse is in a broken state right after creation, the most common cause is that the database table wasn't created. Click on the three dots next to the flow and select Show Logs to look for the error description. In most cases, the problem is one of the following (a diagnostic sketch follows this list):
- Insufficient permissions: Make sure that the service account you authorized to access the destination has write permissions (i.e., permission to create a table).
- The table already exists: Delete the existing table and restart the flow by clicking on Manual data insert.
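If your destination happens to be PostgreSQL-compatible, a check along these lines can confirm both conditions. The connection string, schema, and table names are placeholders, and this assumes the psycopg2 driver; it is not specific to Dataddo.

```python
# Diagnostic sketch for a Postgres-style destination -- adapt the placeholders
# (connection string, schema, table) to your own warehouse.
import psycopg2

conn = psycopg2.connect("host=your-host dbname=your-db user=dataddo_service password=your-password")
cur = conn.cursor()

# Can the service account create tables in the target schema?
cur.execute("SELECT has_schema_privilege(current_user, %s, 'CREATE')", ("analytics",))
print("CREATE privilege on schema:", cur.fetchone()[0])

# Does the target table already exist?
cur.execute("SELECT to_regclass(%s) IS NOT NULL", ("analytics.contacts",))
print("Table already exists:", cur.fetchone()[0])

cur.close()
conn.close()
```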
Flow Is Broken after Changing the Source
In order to maintain data consistency, Dataddo does not propagate changes made at the flow level to downstream database destinations (i.e., table schemas are not automatically updated without your knowledge).
If your flow breaks after you changed your source schema, the updated schema most likely does not match the table that was already created in your destination.
- Click on the three dots next to the affected flow and choose Show Logs to look for the error description.
- Delete the existing table in your database and reset the flow. Dataddo will attempt to create a new table.
- If the table cannot be deleted, manually add the missing columns to the existing table.
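For that last step, here is a minimal sketch, assuming hypothetical source and destination column lists, of how you might work out the missing columns and the ALTER TABLE statements to run manually; the table name is illustrative.

```python
# Compare the new source schema against the existing destination table and
# print the ALTER statements you would need to run manually. Column lists
# and the table name are made up for illustration.
source_columns = {"id": "BIGINT", "email": "TEXT", "signup_date": "DATE", "plan": "TEXT"}
destination_columns = {"id", "email", "signup_date"}  # columns already in the warehouse table

for column, column_type in source_columns.items():
    if column not in destination_columns:
        print(f"ALTER TABLE analytics.contacts ADD COLUMN {column} {column_type};")
```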
Experiencing Data Duplicates
For destinations that are primarily append-only, the recommended approach is to use the insert write mode. However, this approach can result in duplicates in your data. To avoid that, consider other write strategies (see the sketch after this list):
- Truncate insert: This write mode removes all the contents of your table prior to data insertion.
- Upsert: This write mode inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns.
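To make the difference concrete, here is a small in-memory Python illustration of how the write modes behave on a sample batch. It mimics the semantics only; it is not how Dataddo actually writes to a warehouse, and the sample rows are made up.

```python
# Simplified, in-memory illustration of the three write modes.
table = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
incoming = [{"id": 2, "plan": "enterprise"}, {"id": 3, "plan": "free"}]

# insert: append everything; row id=2 now appears twice (the duplicate problem)
insert_result = table + incoming

# truncate insert: wipe the table first, then insert the new batch
truncate_insert_result = list(incoming)

# upsert: update rows that match the unique key (here: "id"), insert the rest
by_key = {row["id"]: row for row in table}
by_key.update({row["id"]: row for row in incoming})
upsert_result = list(by_key.values())

print(insert_result, truncate_insert_result, upsert_result, sep="\n")
```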
For more information on write modes, refer to this article.
Flow With UPSERT Write Mode Is Failing with invalid_configuration Message
The combination of columns that you have chosen does not produce a unique index. Edit the flow and include more columns in the index.
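If you are unsure which columns to add, a quick check like the following, run on a small sample of your rows, shows whether a candidate key combination is unique. The column names and data here are illustrative.

```python
# Sanity-check whether a candidate key uniquely identifies rows in a sample.
from collections import Counter

rows = [
    {"account_id": 1, "date": "2024-01-01", "campaign": "a", "clicks": 10},
    {"account_id": 1, "date": "2024-01-01", "campaign": "b", "clicks": 7},
]

candidate_key = ("account_id", "date")  # not unique here: both rows share these values
counts = Counter(tuple(row[c] for c in candidate_key) for row in rows)
duplicates = {key: n for key, n in counts.items() if n > 1}
print("Duplicate key values:", duplicates or "none -- the key is unique")

# Adding "campaign" to the candidate key makes each combination unique for this sample.
```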