Data Flows Overview

In Dataddo, data flows are the linchpin, orchestrating seamless data integration across a wide spectrum of services, systems, and applications. These flows channel your data to destinations such as dashboards and BI tools, data warehouses, data lakes, and databases, ensuring you have timely access to your crucial information.

Data Architecture Flexibility

Dataddo's flexible flows cater to a broad spectrum of data architecture patterns and implementations. Whether you're aiming for straightforward data integration into dashboards and BI tools, batch data ingestion through ETL/ELT processes into data warehouses, moving large datasets into data lakes, replicating databases, or even data activation via reverse ETL, Dataddo has you covered.

Protecting Schema Integrity

In Dataddo, safeguarding your data's consistency and reliability is paramount. To prevent unintended "domino" effects from source schema changes, we do not automatically alter the schema in your destination when the source schema shifts. If a schema change in the source is detected, users must manually rebuild the source and then associate it with the relevant flow. This measure ensures that changes are deliberate, preserving the integrity of your data across all downstream systems.
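The behavior described above can be illustrated with a minimal sketch. The function and field names below are hypothetical (Dataddo's internal mechanism is not public); the point is that a detected difference blocks automatic propagation rather than altering the destination:

```python
# Minimal sketch of schema-change detection. Field names and the
# schema_changed() helper are illustrative, not Dataddo's actual API.
def schema_changed(stored: dict, extracted: dict) -> bool:
    """Return True if field names or types differ between snapshots."""
    return stored != extracted

stored = {"id": "integer", "email": "string"}
extracted = {"id": "integer", "email": "string", "signup_date": "date"}

if schema_changed(stored, extracted):
    # The destination table is NOT altered automatically. The user must
    # rebuild the source and reattach it to the relevant flow.
    print("Schema change detected - manual rebuild required")
```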

Data Transformation Capabilities

Dataddo's multi-layered approach to transformations ensures data is consistently analytics-ready from its origin to final destination. Beginning with automated extraction-level transformations, Dataddo tackles key tasks like data flattening and harmonization. Through flow-level transformations, such as Data Union and Data Blending, we further refine and consolidate data. The final stage, destination-level transformations, depends on the capabilities of the chosen platform. This structured method significantly reduces the need for extensive transformations at the data warehouse or other destinations.
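As an example of the extraction-level step, data flattening turns nested records into flat columns. The sketch below shows the general idea with a generic helper (this is not Dataddo's implementation, just an illustration of the technique):

```python
# Illustrative flattening of a nested record into flat, analytics-ready columns.
def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into a single level of columns."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

raw = {"id": 1, "user": {"name": "Ada", "geo": {"country": "CZ"}}}
print(flatten(raw))
# {'id': 1, 'user_name': 'Ada', 'user_geo_country': 'CZ'}
```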

Data Quality Measures

At Dataddo, we prioritize data quality and understand its significance in driving accurate analytics and decisions. To this end, we've introduced the Data Quality Firewall, a sophisticated feature engineered to ensure data integrity. This firewall goes beyond basic validation by identifying null values, zero entries, and anomalies within datasets. Users have the flexibility to set specific business rules, enabling them to either halt data writing operations or receive notifications when discrepancies are detected. This proactive approach ensures that only high-quality data is funneled into downstream systems, be it a data warehouse or a BI dashboard, thus safeguarding the reliability of insights derived.
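Conceptually, the firewall evaluates business rules against each batch before writing. The sketch below is a simplified, hypothetical model of that check (column names and rules are made up), showing how a violation can halt the write:

```python
# Simplified model of rule-based data validation. The rules and column
# names are examples, not the Data Quality Firewall's real configuration.
def check_rows(rows, rules):
    """Return (row_index, column) pairs for every rule violation in the batch."""
    violations = []
    for i, row in enumerate(rows):
        for column, rule in rules.items():
            if not rule(row.get(column)):
                violations.append((i, column))
    return violations

rules = {
    "revenue": lambda v: v is not None and v != 0,   # reject nulls and zero entries
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}
batch = [{"revenue": 120.5, "country": "CZ"},
         {"revenue": 0, "country": "CZ"}]

violations = check_rows(batch, rules)
if violations:
    # Depending on the configured rule, either halt the write or send a notification.
    print(f"Blocked write: {violations}")
```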

Data Backfilling

There might be instances where, due to interruptions or changes, you might miss out on certain chunks of data. Dataddo's backfilling capability is designed to address such gaps. Instead of allowing inconsistencies in your datasets, this capability lets you retrieve and seamlessly integrate the missing historical data into your flow, whether it's directed to a data warehouse or a dashboarding application.
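Finding the gaps is the first half of a backfill. A minimal sketch of that step, assuming daily data chunks (the helper below is illustrative, not a Dataddo API):

```python
from datetime import date, timedelta

# Illustrative gap detection for daily batches: any day in the requested
# range that has no loaded data is a candidate for backfilling.
def missing_days(loaded: set, start: date, end: date) -> list:
    """Days in [start, end] with no loaded data, i.e. the backfill targets."""
    day, gaps = start, []
    while day <= end:
        if day not in loaded:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

loaded = {date(2024, 1, 1), date(2024, 1, 3)}
print(missing_days(loaded, date(2024, 1, 1), date(2024, 1, 4)))
# [datetime.date(2024, 1, 2), datetime.date(2024, 1, 4)]
```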

Create a Flow

  1. Navigate to Flows and click on Create Flow.
  2. Click on Connect Your Data to add your sources.
  3. Click on Connect Your Data Destination to add the destination.
  4. Choose the write mode and fill in the other required information.
  5. Check the Data Preview to see if your configuration is correct.
  6. Name your flow, and click on Create Flow to finish the setup.

Edit a Flow

You can always change the flow name, the connected sources, and the scheduling. If the destination is a data warehouse, you can additionally change the write mode and the database name.

Navigate to Flows, click on the three dots next to your flow and select Edit. There, you can change the name of the flow, the source, scheduling, and for data warehouses, you can edit the write mode.

Keep in mind that any changes made to the structure of the table will break the flow. For example, the flow will break if you change the data type or field names of your data source, which may also result in errors in your dashboarding app or data warehouse. You can fix this issue by dropping the table in your data warehouse and force running the flow.
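The drop-and-recreate fix can be sketched as follows. This uses SQLite purely for illustration (the table and column names are made up); in practice you would run the equivalent `DROP TABLE` statement in your own warehouse and then force run the flow:

```python
import sqlite3

# Illustration only: SQLite stands in for your warehouse, and the
# "analytics_events" table and its columns are invented examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analytics_events (id INTEGER, name TEXT)")

# The source schema changed, so the old table no longer matches: drop it...
conn.execute("DROP TABLE IF EXISTS analytics_events")

# ...and force run the flow, which recreates the table with the new schema.
conn.execute(
    "CREATE TABLE analytics_events (id INTEGER, full_name TEXT, signup_date TEXT)"
)
cols = [row[1] for row in conn.execute("PRAGMA table_info(analytics_events)")]
print(cols)  # ['id', 'full_name', 'signup_date']
```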

Change Write Mode

When editing your flow, you can change the write mode for your destination. However, changing the write mode can cause data discrepancies.

Only proceed if you are sure how this will affect your data.

Delete a Flow

There are two main approaches to deleting a flow in Dataddo:

  1. Delete the flow directly (preferred). Head to Flows and click the trash can icon beside the flow you want to remove. Type DELETE into the provided box to confirm. Keep in mind that this method won't erase any sources linked to the flow.
  2. Delete the source tied to the flow. Navigate to Sources and select the trash can icon next to the appropriate source. A warning will pop up, alerting you that the flow associated with that source will be automatically deleted. To finalize this action, type DELETE into the box.


Database Table Was Not Automatically Created

If a flow to a database or data warehouse is in a broken state right after creation, the most common cause is that the database table failed to be created. Click on the three dots next to the flow and select Show Logs to look for the error description. In most cases, the problem is one of the following:

  • Insufficient permissions to create the table. Make sure that the user you provided when adding the destination has permission to create tables.
  • The table already exists. Delete the already existing table and restart the flow by clicking on Manual data insert.

Flow Is Broken after Changing the Source

In order to maintain data consistency, Dataddo does not propagate changes done at the flow level to the downstream database destinations (i.e. table schemas are not automatically updated without your knowledge).

If your flow breaks after changing the source, the updated schema most likely does not match the table that was already created. Click on the three dots next to the flow and choose Show Logs to look for the error description. Delete the existing table in your database and reset the flow. Dataddo will attempt to create a new table. If the table cannot be deleted, manually add the missing columns to the existing table.
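When dropping the table is not an option, the manual fix is an `ALTER TABLE ... ADD COLUMN` statement for each missing column. The sketch below uses SQLite and made-up table and column names; the syntax is near-identical in most warehouses:

```python
import sqlite3

# Illustration only: the "orders" table and "currency" column are examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# The updated source now also emits a "currency" field, so the existing
# table is missing that column: add it so the schemas match again.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")
cols = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
print(cols)  # ['id', 'amount', 'currency']
```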

Experiencing Data Duplicates

For destinations that are primarily append-only, the recommended approach is to use the INSERT write mode. However, this approach can result in duplicates in your data. To avoid them, consider other write strategies:

  • TRUNCATE INSERT. This strategy removes all the contents of the destination table prior to data insertion.
  • UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns.
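The difference between the strategies can be sketched with plain SQL. This uses SQLite for illustration (the `metrics` table and its unique key `day` are made-up examples); `DELETE` stands in for `TRUNCATE`, and the `ON CONFLICT` clause plays the role of an upsert:

```python
import sqlite3

# Illustration only: SQLite stands in for the destination; table and
# column names are examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (day TEXT PRIMARY KEY, clicks INTEGER)")
conn.execute("INSERT INTO metrics VALUES ('2024-01-01', 10)")

# INSERT is append-only: re-running the same batch would duplicate rows.
# TRUNCATE INSERT wipes the table first, then loads the batch.
conn.execute("DELETE FROM metrics")  # DELETE stands in for TRUNCATE here
conn.execute("INSERT INTO metrics VALUES ('2024-01-01', 10)")

# UPSERT inserts new rows and updates existing ones; it needs a unique
# key, here the "day" column.
conn.execute(
    "INSERT INTO metrics VALUES ('2024-01-01', 15) "
    "ON CONFLICT(day) DO UPDATE SET clicks = excluded.clicks"
)
print(conn.execute("SELECT * FROM metrics").fetchall())  # [('2024-01-01', 15)]
```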

For more information on write modes, please visit this page.

Flow With UPSERT Write Mode Is Failing with invalid_configuration Message

The combination of columns you have chosen does not produce a unique index. Edit the flow and add more columns to the index until every row is uniquely identified.
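You can verify a candidate key against a sample of your data before editing the flow. A minimal sketch (the rows and column names are examples):

```python
# Check whether a column combination uniquely identifies every row in a sample.
def is_unique_key(rows, columns):
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

rows = [{"day": "2024-01-01", "campaign": "a", "clicks": 3},
        {"day": "2024-01-01", "campaign": "b", "clicks": 5}]

print(is_unique_key(rows, ["day"]))              # False: duplicates on "day" alone
print(is_unique_key(rows, ["day", "campaign"]))  # True: this combination works
```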
