- 6 Minutes to read
Data Flows Overview
In Dataddo, data flows orchestrate data integration across a wide range of services, systems, and applications. Flows deliver your data to various destinations so that you have timely access to the information you need. The types of flow destinations include:
- BI tools and dashboarding apps such as Tableau, Looker Studio, or Power BI.
- On-Prem or Hosted Databases such as MySQL, Postgres, or SQL Server.
- Cloud Data-Warehouses, including Redshift, Snowflake, Azure SQL Database or Google BigQuery.
- Object Storages or Data Lakes such as AWS S3, Azure Blob Storage, and Google Cloud Storage.
- Business Applications including Salesforce, ExactOnline, Klaviyo, to name a few.
- Any platform specifically designed to receive and manage incoming data.
Data Architecture Flexibility
Dataddo's flexible flows cater to a broad spectrum of data architecture patterns and implementations. Whether you're aiming for straightforward data integration into dashboards and BI tools, batch data ingestion through ETL/ELT processes into data warehouses, moving large datasets into data lakes, replicating databases, or even data activation via reverse ETL, Dataddo has you covered.
Protecting Schema Integrity
In Dataddo, safeguarding your data's consistency and reliability is paramount. To prevent unintended "domino" effects from source schema changes, we do not automatically alter the schema in your destination when the source schema shifts. If a schema change in the source is detected, users must manually rebuild the source and then associate it with the relevant flow. This measure ensures that changes are deliberate, preserving the integrity of your data across all downstream systems.
Data Transformation Capabilities
Dataddo's multi-layered approach to transformations ensures data is consistently analytics-ready from its origin to final destination. Beginning with automated extraction-level transformations, Dataddo tackles key tasks like data flattening and harmonization. Through flow-level transformations, such as Data Union and Data Blending, we further refine and consolidate data. The final stage, destination-level transformations, depends on the capabilities of the chosen platform. This structured method significantly reduces the need for extensive transformations at the data warehouse or other destinations.
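As a rough illustration of the two flow-level transformations named above, the sketch below assumes that Data Union stacks rows from sources sharing the same columns and that Data Blending joins sources on a common key, much like a SQL left join. The function names `union` and `blend` and the sample rows are hypothetical and do not reflect Dataddo's internal implementation.

```python
def union(*sources):
    """Stack rows from several sources that share the same columns."""
    rows = []
    for source in sources:
        rows.extend(source)
    return rows


def blend(left, right, key):
    """Left-join `right` onto `left` using the shared `key` column."""
    lookup = {row[key]: row for row in right}
    # Rows in `left` without a match in `right` are kept unchanged.
    return [{**row, **lookup.get(row[key], {})} for row in left]


# Hypothetical sample data from two ad platforms and a cost report.
platform_a = [{"date": "2024-01-01", "clicks": 10}]
platform_b = [{"date": "2024-01-01", "clicks": 7}]
costs = [{"date": "2024-01-01", "cost": 3.5}]

stacked = union(platform_a, platform_b)
joined = blend(platform_a, costs, "date")
```

Here `stacked` holds two rows with identical columns, while `joined` holds a single row widened with the `cost` column.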
Data Quality Measures
At Dataddo, we prioritize data quality and understand its significance in driving accurate analytics and decisions. To this end, we've introduced the Data Quality Firewall, a sophisticated feature engineered to ensure data integrity. This firewall goes beyond basic validation by identifying null values, zero entries, and anomalies within datasets. Users have the flexibility to set specific business rules, enabling them to either halt data writing operations or receive notifications when discrepancies are detected. This proactive approach ensures that only high-quality data is funneled into downstream systems, be it a data warehouse or a BI dashboard, thus safeguarding the reliability of insights derived.
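The kind of rule described above can be pictured with a small sketch. It is not the Data Quality Firewall itself: the function names, the issue labels, and the choice to halt on nulls but only notify on zeros are all assumptions made for illustration.

```python
def find_quality_issues(rows, checked_columns):
    """Return (row_index, column, issue) tuples for null and zero values."""
    issues = []
    for index, row in enumerate(rows):
        for column in checked_columns:
            value = row.get(column)
            if value is None:
                issues.append((index, column, "null"))
            elif (
                isinstance(value, (int, float))
                and not isinstance(value, bool)
                and value == 0
            ):
                issues.append((index, column, "zero"))
    return issues


def decide_action(issues, blocking=("null",)):
    """Return 'halt' on a blocking issue, 'notify' on any other issue, else 'write'."""
    if any(issue in blocking for _, _, issue in issues):
        return "halt"
    if issues:
        return "notify"
    return "write"


# Hypothetical extracted rows; the second row has a zero and a null.
rows = [
    {"clicks": 10, "cost": 1.5},
    {"clicks": 0, "cost": None},
]
issues = find_quality_issues(rows, ["clicks", "cost"])
action = decide_action(issues)
```

With the sample rows, `action` is `"halt"` because a null is present. Passing `blocking=()` would downgrade the same batch to `"notify"`.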
Interruptions or configuration changes can leave gaps in your data. Dataddo's backfilling capability addresses such gaps: instead of leaving your datasets inconsistent, it lets you retrieve the missing historical data and load it through your flow, whether the flow writes to a data warehouse or a dashboarding application.
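Before requesting a backfill, it helps to know which date ranges are actually missing. The sketch below assumes daily data and a list of dates already present in the destination. The helper names are hypothetical and the logic is independent of how Dataddo performs the backfill.

```python
from datetime import date, timedelta


def missing_dates(loaded_dates, start, end):
    """Return the sorted dates in [start, end] that are absent from loaded_dates."""
    loaded = set(loaded_dates)
    total_days = (end - start).days + 1
    candidates = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in candidates if day not in loaded]


def group_into_ranges(sorted_dates):
    """Collapse sorted dates into (first_day, last_day) ranges of consecutive days."""
    ranges = []
    for day in sorted_dates:
        if ranges and day - ranges[-1][1] == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], day)
        else:
            ranges.append((day, day))
    return ranges


# Hypothetical dates already present in the destination.
loaded = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
gaps = group_into_ranges(missing_dates(loaded, date(2024, 1, 1), date(2024, 1, 6)))
```

For this sample, `gaps` contains two ranges, January 3 to 4 and January 6 alone, which are the periods a backfill would need to cover.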
Create a Flow
- Navigate to Flows and click on Create Flow.
- Click on Connect Your Data to add your sources.
- Click on Connect Your Data Destination to add the destination.
- Choose the Write mode and fill in the other required information.
- Check the Data Preview to see if your configuration is correct.
- Name your flow, and click on Create Flow to finish the setup.
Edit a Flow
In most cases, you can change the flow name, the connected sources, and the scheduling. If the destination is a data warehouse, you can additionally change the write mode and the database name.
Keep in mind that changing data types or field names may result in errors in your dashboarding app or warehouse.
For example, the flow will break if you change the data type or field names of your data source. You can fix this issue by dropping the table in your data warehouse and force running the flow.
Change Write Mode
When editing your flow, you can change the write mode for your destination. However, changing the write mode can cause data discrepancies.
Only proceed if you are sure how this will affect your data.
Delete a Flow
To delete a flow in Dataddo, there are two main approaches. The preferred method is to directly delete the flow. Head to Flows and click the trash can icon beside the flow you want to remove. Enter DELETE in the provided box to confirm. Keep in mind that using this method won't erase any sources linked to the flow. Alternatively, you can opt to delete the source tied to the flow. Navigate to Sources and select the trash can icon next to the appropriate source. A warning will pop up, alerting you about the flow associated with that source, which will be automatically deleted. To finalize this action, type DELETE into the box.
Database Table Was Not Automatically Created
If a flow to a database or data warehouse is in a broken state right after creation, the most common cause is that the database table failed to be created. Click on the three dots next to the flow and select Show Logs to look for the error description. In most cases the problem is one of these:
- Insufficient permissions to create the table. Make sure that the user you provided when you added the destination has permission to create tables.
- The table already exists. Delete the existing table and restart the flow by clicking on Manual data insert.
Flow Is Broken after Changing the Source
In order to maintain data consistency, Dataddo does not propagate changes done at the flow level to the downstream database destinations (i.e. table schemas are not automatically updated without your knowledge).
If your flow breaks after changing the source, the updated schema most likely does not match the table that was already created. Click on the three dots next to the flow and choose Show Logs to look for the error description. Delete the existing table in your database and reset the flow. Dataddo will attempt to create a new table. If the table cannot be deleted, manually add the missing columns to the existing table.
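If you choose to add the missing columns by hand, comparing the existing table columns with the updated source schema tells you which statements to run. The sketch below only builds the SQL text. The table name, column names, and types are hypothetical, and the `ALTER TABLE ... ADD COLUMN` form is an assumption that fits MySQL, PostgreSQL, and similar systems (SQL Server, for example, omits the `COLUMN` keyword).

```python
def missing_column_statements(table, existing_columns, expected_columns):
    """Build ALTER TABLE statements for columns present in the source but not in the table.

    expected_columns maps column name -> SQL type as your database spells it.
    Identifiers are not quoted or escaped here; only use trusted names.
    """
    existing = {name.lower() for name in existing_columns}
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};"
        for name, sql_type in expected_columns.items()
        if name.lower() not in existing
    ]


# Hypothetical table and updated source schema.
statements = missing_column_statements(
    "ad_stats",
    existing_columns=["date", "clicks"],
    expected_columns={"date": "DATE", "clicks": "INTEGER", "cost": "NUMERIC"},
)
```

Review the generated statements before running them, since type names and quoting rules differ between databases.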
Experiencing Data Duplicates
For destinations that are primarily append-only, the recommended approach is to use the INSERT write mode. However, this approach can result in duplicates in your data. To avoid that, consider other writing strategies:
- TRUNCATE INSERT. This strategy removes all the contents of the destination table prior to data insertion.
- UPSERT. This strategy inserts new rows and updates existing ones. To perform this correctly, it is necessary to set a unique key representing one or multiple columns.
For more information on write modes, please visit this page.
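The difference between the three write modes is easiest to see when two overlapping loads are written to the same table. The sketch below uses an in-memory SQLite database purely for illustration. The table, the column names, and the `write_rows` helper are hypothetical and do not show how Dataddo writes to your destination. The upsert syntax used here requires SQLite 3.24 or later.

```python
import sqlite3

FIRST_LOAD = [("2024-01-01", 10), ("2024-01-02", 20)]
SECOND_LOAD = [("2024-01-02", 25), ("2024-01-03", 30)]  # overlaps on 2024-01-02


def write_rows(conn, rows, mode):
    """Write rows to the stats table using one of three illustrative write modes."""
    if mode == "truncate_insert":
        # Remove everything already in the table, then insert the new batch.
        conn.execute("DELETE FROM stats")
        conn.executemany("INSERT INTO stats (day, clicks) VALUES (?, ?)", rows)
    elif mode == "insert":
        # Plain append: overlapping loads produce duplicate days.
        conn.executemany("INSERT INTO stats (day, clicks) VALUES (?, ?)", rows)
    elif mode == "upsert":
        # Insert new keys and update rows whose unique key already exists.
        conn.executemany(
            "INSERT INTO stats (day, clicks) VALUES (?, ?) "
            "ON CONFLICT(day) DO UPDATE SET clicks = excluded.clicks",
            rows,
        )
    else:
        raise ValueError(f"unknown write mode: {mode}")
    conn.commit()


def demo(mode):
    """Run both loads with the given mode and return the final table contents."""
    conn = sqlite3.connect(":memory:")
    # Upsert needs a unique key; the other modes use a table without one.
    key = " PRIMARY KEY" if mode == "upsert" else ""
    conn.execute(f"CREATE TABLE stats (day TEXT{key}, clicks INTEGER)")
    for batch in (FIRST_LOAD, SECOND_LOAD):
        write_rows(conn, batch, mode)
    result = conn.execute("SELECT day, clicks FROM stats ORDER BY day, clicks").fetchall()
    conn.close()
    return result
```

Calling `demo("insert")` leaves two rows for 2024-01-02, `demo("truncate_insert")` keeps only the second load, and `demo("upsert")` keeps one row per day with the latest value for the overlapping day.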
Flow With UPSERT Write Mode Is Failing with invalid_configuration Message
The combination of columns that you have chosen does not produce a unique index. Edit the flow and add more columns to the index.
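To find out which columns to add, you can check a sample of your data for key values that occur more than once. The sketch below is a minimal check under the assumption that rows are available as dictionaries. The helper name and the sample columns are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return key tuples that appear more than once, with their counts."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return {key: count for key, count in counts.items() if count > 1}


# Hypothetical sample: two campaigns report on the same date.
rows = [
    {"date": "2024-01-01", "campaign": "a", "clicks": 1},
    {"date": "2024-01-01", "campaign": "b", "clicks": 2},
]

by_date = duplicate_keys(rows, ["date"])
by_date_and_campaign = duplicate_keys(rows, ["date", "campaign"])
```

In this sample, `date` alone is not unique, so `by_date` is non-empty, while adding `campaign` to the key makes every row distinct and `by_date_and_campaign` is empty.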