Data Backfilling via Dataddo API
Data backfilling is the process of filling gaps in your data by loading it outside of scheduled times. It is typically done through several manual data loads, each covering a different time period, to ensure continuity in data availability.
Prerequisites
To manually load data via Dataddo API, you will need the following:
- Access token: Obtain by authorizing the connection. The token will be used in subsequent requests.
- Source ID: Obtain by calling the GET /sources API endpoint. The source ID will be in the field called id.
- Flow ID: Obtain by calling the GET /flows/by-source/{sourceId} endpoint, using the source ID extracted above. The flow ID will be in the field called id. A request sketch for these lookups follows this list.
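For illustration, here is a minimal Python sketch of these lookups. The base URL, Bearer-token header, and the shape of the JSON responses are assumptions made for the example; consult the Dataddo API reference for the exact values and adjust the selection logic to your own source.

```python
import requests

# Assumptions for this sketch: base URL, Bearer-token header, and the shape
# of the JSON responses. Check the Dataddo API reference for your account.
BASE_URL = "https://api.dataddo.com/v1.0"
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}

# Source ID: list sources and read the "id" field of the one to backfill.
sources = requests.get(f"{BASE_URL}/sources", headers=HEADERS).json()
source_id = sources[0]["id"]  # adjust the selection to your own source

# Flow ID: look up the flow attached to that source and read its "id" field.
flows = requests.get(f"{BASE_URL}/flows/by-source/{source_id}", headers=HEADERS).json()
flow_id = flows[0]["id"]

print(source_id, flow_id)
```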
Extract Data and Write Loop
To automate the data backfilling process, you will need to enqueue the source extraction, wait until the extraction is finalised, enqueue the flow write action, wait until the write is finalised, and repeat for each date range. A full loop sketch follows the steps below.
- Enqueue the extraction:
- Call the POST /sources/{id}/extraction/enqueue endpoint.
- Specify the date range you want to load in the payload of the body. For example, {"dateRangeExpression": "7d7"} will load data from 7 days ago. For more information, refer to the article on dynamic date range.
- Wait until the extraction finishes running:
- You can check the status of the extraction by calling GET /sources/{id}/status.
- The status should no longer be running but live. That means that the extraction was successfully finalised and the source is ready to extract more data.
- Enqueue the write action: Call POST /flows/{id}/write/enqueue. You don't need to include anything in the body for the data to be written to your destination.
- Wait until the flow finishes writing data: Call GET /flows/{id}/status to check the status of the write action.
- Repeat until all data is loaded: Don't forget to keep adjusting the dateRangeExpression from step 1 until you load all the data.
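The whole loop can be scripted. The sketch below is a minimal Python example under the same assumptions as above (illustrative base URL, Bearer-token header, and a status field in the status responses). The date range expressions other than the 7d7 example from step 1 are placeholders; replace them with the ranges you actually need.

```python
import time
import requests

BASE_URL = "https://api.dataddo.com/v1.0"   # illustrative base URL
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}

SOURCE_ID = "your-source-id"
FLOW_ID = "your-flow-id"

def wait_until_done(status_url: str, poll_seconds: int = 30) -> None:
    """Poll a status endpoint until the job is no longer running."""
    while True:
        status = requests.get(status_url, headers=HEADERS).json().get("status")
        if status != "running":
            return
        time.sleep(poll_seconds)

# One expression per historical window to backfill. "7d7" comes from the
# example in step 1; the rest are placeholders for your own ranges.
date_ranges = ["7d7"]

for expression in date_ranges:
    # 1. Enqueue the extraction for this date range.
    requests.post(
        f"{BASE_URL}/sources/{SOURCE_ID}/extraction/enqueue",
        headers=HEADERS,
        json={"dateRangeExpression": expression},
    )
    # 2. Wait until the source status switches back from "running" to "live".
    wait_until_done(f"{BASE_URL}/sources/{SOURCE_ID}/status")

    # 3. Enqueue the write action (no body needed).
    requests.post(f"{BASE_URL}/flows/{FLOW_ID}/write/enqueue", headers=HEADERS)
    # 4. Wait until the flow finishes writing to the destination.
    wait_until_done(f"{BASE_URL}/flows/{FLOW_ID}/status")
```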
Best Practices
- Always check the status endpoint first to avoid trying to run a source/flow that is already being processed. If you try to enqueue a source/flow that is already running, you will receive an error (see the sketch after this list).
- There is a limit of 100 loads per day for each source and each flow.
- Keep in mind the rate limits of the specific APIs from which you are extracting the data.
- Use upsert write mode in your flow to prevent any duplicates.
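As an illustration of the first point, a pre-enqueue status check could look like the sketch below, under the same assumptions about the base URL, headers, and response shape as in the earlier examples.

```python
import requests

BASE_URL = "https://api.dataddo.com/v1.0"   # illustrative base URL
HEADERS = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}
SOURCE_ID = "your-source-id"

# Only enqueue a new extraction if the source is not already running.
status = requests.get(
    f"{BASE_URL}/sources/{SOURCE_ID}/status", headers=HEADERS
).json().get("status")

if status != "running":
    requests.post(
        f"{BASE_URL}/sources/{SOURCE_ID}/extraction/enqueue",
        headers=HEADERS,
        json={"dateRangeExpression": "7d7"},
    )
else:
    print("Source is still running; try again later.")
```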