Data Flows Overview

A data flow is the connection between one or more data sources and a data destination.

Key considerations for data flows include the following:

One Dataset, One Flow

  • For fixed-schema connectors like HubSpot Analytics, each dataset must have its own flow. This means that if you need to extract multiple datasets, such as contacts and deals, you’ll need a separate flow for each.

Multi-Account Extraction

  • If multiple accounts share the same schema (i.e., identical attributes and metrics), their data can be combined into a single flow using data union. For example, data from multiple Google Analytics accounts can be consolidated and extracted together.

Creating a Data Flow

Dataddo decouples the data extraction and data ingestion processes, allowing for greater architectural flexibility. When you create data flows, you create M:N relationships between sources and destinations.

The flexibility provided by data flows facilitates a broad spectrum of data architecture patterns and implementations, such as:

  1. Straightforward data integration into dashboards and BI tools
  2. Batch data ingestion through ETL/ELT processes into data warehouses
  3. Moving large datasets into data lakes
  4. Replicating databases
  5. Data activation via reverse ETL

Types of Data Flow

Batch Data Flow

The standard and default flow type.

Streaming Data Flow

When a flow is configured to use streaming, the transfer of larger amounts of data is usually considerably faster. Additionally, a streaming flow guarantees that no data is cached during the process. However, this also means that features requiring caching, such as flow-level data transformations, cannot be used.

Data Flow Features

To further enhance flexibility and facilitate data management, Dataddo offers multiple features that can be configured at the flow level. These features can be divided into the following categories:

  1. Data transformation capabilities
  2. Data quality measures
  3. Data backfilling

Data Transformation Capabilities

Dataddo's multi-layered approach to transformations ensures data is consistently analytics-ready from its origin to final destination.

Dataddo offers data transformations at the extraction level (applied automatically) and at the flow level. This structured approach significantly reduces the need for extensive transformations in the data warehouse or other destinations.

Data Quality Measures

To prioritize data quality, Dataddo allows you to configure specific business rules that either halt data writing operations or send notifications when discrepancies are detected.

This proactive approach ensures that only high-quality data is funneled into downstream systems, be it a data warehouse or a BI dashboard, thus safeguarding the reliability of the insights derived from it.

Data Backfilling

In cases of interruptions or changes, some data might be missing. To address such gaps, Dataddo offers data backfilling.

Data backfilling allows you to retrieve and load missing data or historical data, no matter the destination.

Protecting Schema Integrity

When your data source schema changes, the schema in your destination is NOT automatically altered. This prevents unintended "domino" effects.

If a schema change in the source is detected, you will have to manually rebuild the source and then connect it to the relevant flow. This measure ensures that all changes are deliberate, preserving the integrity of your data across all downstream systems.

Other Data Flow Operations

Edit a Data Flow

On the Flows page, click on your data flow to see your flow details.

If your destination is a data warehouse, you will generally be able to:

  • Change the write mode
  • Change sync frequency
  • Configure database table details

Changing the write mode can cause data discrepancies!

Make sure the write mode in your flow matches the one used in your data warehouse. Changing the write mode after the flow has been created might have unwanted effects.

Only proceed if you are aware of how this will affect your data.

Delete a Data Flow

There are two ways to delete a data flow.

  1. Delete the data flow directly without deleting the connected data source(s).
    1. On the Flows page, click on the trash can icon next to your flow.
    2. Type DELETE to confirm and click on Delete.
  2. Delete a data source, which will also delete all connected flows.
    1. On the Sources page, click on the trash can icon next to your source.
    2. A warning with all connected flows listed will pop up for confirmation.
    3. Type DELETE to confirm and click on Delete.

Troubleshooting

Database Table Was Not Automatically Created

If a flow to a database or data warehouse is in a broken state right after creation, the most common cause is that the database table wasn't created. Click on the three dots next to the flow and select Show Logs to look for the error description. In most cases the problem is one of the following:

  • Insufficient permissions: Make sure that the service account you authorized to access the destination has write permissions (i.e., permission to create a table); see the sketch after this list.
  • The table already exists: Delete the existing table and restart the flow by clicking on Manual data insert.
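
For SQL-based destinations, the sketch below illustrates one way to verify and grant table-creation rights. It assumes a PostgreSQL destination accessed with psycopg2; the connection string, role name dataddo_writer, and schema analytics are hypothetical placeholders, and other warehouses have their own permission models.

```python
# Minimal sketch, assuming a PostgreSQL destination: check whether the
# service role Dataddo authenticates with may create tables in the target
# schema, and grant the privilege if it is missing. All names below
# (dataddo_writer, analytics, the DSN) are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect("postgresql://admin@db-host:5432/warehouse")  # admin connection
with conn, conn.cursor() as cur:
    # Can the role create objects (tables) in the schema?
    cur.execute("SELECT has_schema_privilege('dataddo_writer', 'analytics', 'CREATE');")
    if not cur.fetchone()[0]:
        # Allow the role to use the schema and create tables in it.
        cur.execute("GRANT USAGE, CREATE ON SCHEMA analytics TO dataddo_writer;")
conn.close()
```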

Flow Is Broken after Changing the Source

In order to maintain data consistency, Dataddo does not propagate changes made at the flow level to downstream database destinations (i.e. table schemas are not automatically updated without your knowledge).

If your flow breaks after you changed your source schema, the updated schema most likely does not match the table that was already created in your destination.

  1. Click on the three dots next to the affected flow and choose Show Logs to look for the error description.
  2. Delete the existing table in your database and reset the flow. Dataddo will attempt to create a new table.
  3. If the table cannot be deleted, manually add the missing columns to the existing table (see the sketch below).
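
As a rough illustration of step 3, the following sketch (using SQLite so it runs locally) compares the columns the destination table has with the columns the updated source delivers and adds whatever is missing. The contacts table and the lifecycle_stage column are hypothetical; on a real warehouse you would run the equivalent ALTER TABLE statements against that system.

```python
# Sketch only: align an existing destination table with an updated source
# schema by adding missing columns. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Destination table as created from the original source schema.
cur.execute("CREATE TABLE contacts (id TEXT, email TEXT)")

# Columns currently present in the destination table.
existing = {row[1] for row in cur.execute("PRAGMA table_info(contacts)")}

# Columns the updated source now delivers (hypothetical example).
expected = {"id": "TEXT", "email": "TEXT", "lifecycle_stage": "TEXT"}

# Add whatever is missing so the table matches the new source schema.
for name, sql_type in expected.items():
    if name not in existing:
        cur.execute(f"ALTER TABLE contacts ADD COLUMN {name} {sql_type}")

print([row[1] for row in cur.execute("PRAGMA table_info(contacts)")])
# -> ['id', 'email', 'lifecycle_stage']
conn.commit()
conn.close()
```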

Experiencing Data Duplicates

For destinations that are primarily append-only, the recommended approach is to use the insert write mode. However, this approach can result in duplicates in your data. To avoid them, consider other write strategies (compared in the sketch below):

  • Truncate insert: This write mode removes all the contents in your table prior to data insertion.
  • Upsert: This write mode inserts new rows and updates existing ones. To perform this correctly, you need to set a unique key spanning one or more columns.

For more information on write modes, refer to this article.
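
To make the difference between these write strategies concrete, here is a minimal sketch of all three against an SQL destination, using SQLite so it runs locally. The deals table and its columns are hypothetical, and real warehouses use their own syntax for truncation and upserts.

```python
# Sketch comparing insert, truncate insert, and upsert on a hypothetical
# deals table (SQLite used only so the example is self-contained).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE deals (id TEXT, amount REAL)")
batch = [("d1", 100.0), ("d2", 250.0)]

# insert: appends rows as-is; re-running the same batch duplicates data.
cur.executemany("INSERT INTO deals VALUES (?, ?)", batch)
cur.executemany("INSERT INTO deals VALUES (?, ?)", batch)  # duplicates now exist

# truncate insert: wipe the table first, then load the fresh batch.
cur.execute("DELETE FROM deals")
cur.executemany("INSERT INTO deals VALUES (?, ?)", batch)

# upsert: requires a unique key; inserts new rows and updates existing ones.
cur.execute("CREATE UNIQUE INDEX deals_key ON deals (id)")
cur.executemany(
    "INSERT INTO deals VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
    [("d2", 300.0), ("d3", 75.0)],
)

print(cur.execute("SELECT * FROM deals ORDER BY id").fetchall())
# -> [('d1', 100.0), ('d2', 300.0), ('d3', 75.0)]
conn.close()
```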

Flow With UPSERT Write Mode Is Failing with invalid_configuration Message

The combination of columns that you have chosen does not produce a unique index. Edit the flow and add more columns to the index; a quick way to check a candidate key is sketched below.
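
One way to test a candidate key is to count rows per combination of the chosen columns; any group with more than one row means the key is not unique. The sketch below uses SQLite and hypothetical table and column names (ad_performance, date, campaign_id, ad_group_id).

```python
# Sketch: check whether (date, campaign_id) uniquely identifies each row.
# Table, columns, and sample data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad_performance (date TEXT, campaign_id TEXT, ad_group_id TEXT, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO ad_performance VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "c1", "g1", 10),
        ("2024-01-01", "c1", "g2", 4),  # same date + campaign, different ad group
        ("2024-01-02", "c1", "g1", 7),
    ],
)

# Any row returned here means (date, campaign_id) is NOT a valid unique key.
duplicates = conn.execute(
    "SELECT date, campaign_id, COUNT(*) FROM ad_performance "
    "GROUP BY date, campaign_id HAVING COUNT(*) > 1"
).fetchall()

if duplicates:
    print("Not unique; extend the key (e.g. add ad_group_id):", duplicates)
else:
    print("(date, campaign_id) uniquely identifies each row.")
conn.close()
```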

