Batch Extraction

Batch extraction is the method used by most connectors to popular business systems and applications such as Salesforce, Google Analytics 4, Zendesk, or Mailchimp. Data is retrieved at preset intervals, so you need to set a specific extraction frequency. This approach is especially advantageous when handling large data volumes or when the data source does not require real-time updates, since it balances data freshness against system resource usage. The extraction frequency is initially configured during connector setup and can be fine-tuned later to match evolving business requirements.

Extraction Frequency

The extraction frequency refers to the regular intervals at which Dataddo retrieves data from your linked sources. This setting is primarily defined during the connector configuration, where Dataddo will suggest a recommended extraction frequency to ensure optimal data retrieval based on the specific characteristics of the source. However, it can be modified later to accommodate changing requirements and preferences.

Adjusting Extraction Frequency

Although the extraction frequency is initially set during connector configuration, you can access and change it later from the Sources page:

  1. Select the source you intend to modify and click on the Extraction Frequency settings.
  2. Here, you can opt for a predefined frequency (such as hourly or daily) or define a custom cron expression for finer control (see the example after these steps).
  3. Save the modifications to apply the new extraction frequency.
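
For example, to run an extraction every 6 hours you could use the cron expression 0 */6 * * *. If you want to verify how an expression maps to actual run times, the third-party croniter Python library can preview them (this snippet is illustrative only and is not part of Dataddo):

    from datetime import datetime

    from croniter import croniter  # pip install croniter

    # "0 */6 * * *" means: at minute 0 of every 6th hour (00:00, 06:00, 12:00, 18:00)
    schedule = croniter("0 */6 * * *", datetime(2024, 1, 1))
    for _ in range(3):
        print(schedule.get_next(datetime))  # next three scheduled extraction times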

Maximum Extraction Frequency

Dataddo supports extraction intervals as short as 1 minute. However, such a high frequency might be impractical or even impossible for some services, including platforms like Google Analytics 4 or Facebook Ads, which do not offer that level of data freshness. During the connector configuration process, Dataddo always recommends an optimal extraction frequency to ensure both data accuracy and adherence to service limitations.

Error Handling and Retrying

Dataddo places utmost importance on the reliability and robustness of its extraction processes. Recognizing that data extraction can sometimes face challenges, our system is designed with proactive error handling mechanisms.

  1. Automatic Retries. In the event of an error during extraction, Dataddo doesn't give up immediately. Instead, it automatically retries the extraction process three times, ensuring that transient issues do not hinder data extraction (see the sketch after this list).
  2. Error Classification. Errors during extraction can manifest in several ways, including scenarios where:
    1. The extraction process yields 0 rows, indicating no new data availability. If 0 rows is an acceptable outcome for you, enable the Allow Empty option.
    2. Authorization to the source fails, usually pointing to permissions or authentication issues. To fix this, try re-authorizing the source.
    3. The schema of the received data is inconsistent with the source's defined schema. Consider adjusting the schema to fix this error.
    4. Any other anomalies or issues originating from the source system.
  3. Permanently Broken State. If the extraction is still unsuccessful after three retries, Dataddo will designate the source as being in a permanently broken state. This status is a clear indicator that manual intervention might be necessary to resolve the underlying issue.
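
The retry behavior described above follows a common pattern; here is a minimal sketch of it in Python (illustrative only; the function names and the pause between retries are assumptions, not Dataddo's actual implementation):

    import time

    MAX_RETRIES = 3

    def extract_with_retries(extract, allow_empty=False):
        """Run an extraction callable, retrying up to three times before
        giving up and marking the source as broken."""
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                rows = extract()
                if not rows and not allow_empty:
                    raise ValueError("extraction yielded 0 rows")
                return rows
            except Exception as err:
                print(f"attempt {attempt}/{MAX_RETRIES} failed: {err}")
                if attempt < MAX_RETRIES:
                    time.sleep(2 ** attempt)  # brief pause before the next retry
        raise RuntimeError("source is in a broken state; manual intervention needed")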

Batch Size Limitations

During batch extraction, especially with API-based connectors like Salesforce, Hubspot, Google Analytics 4, and Netsuite, navigating the limitations of third-party APIs is often more challenging than handling the data volume itself. These limitations could encompass timeframe restrictions, API call caps, and data request size limits per call.

Dataddo proactively counters these challenges with innovative strategies that maximize data extraction within the optimal timeframe while adhering to API guidelines. These strategies, including parallel API calls, automatic pagination, and dynamic adjustment of request limits, facilitate a smooth, efficient, and comprehensive data retrieval process.
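
As an illustration of the automatic pagination strategy, the following sketch pages through a hypothetical REST endpoint until the last page is reached (the URL shape and parameter names are invented for the example and do not correspond to a real Dataddo or vendor API):

    import requests  # pip install requests

    def fetch_all_pages(base_url, page_size=100):
        """Illustrative automatic pagination: keep requesting pages until
        the API returns fewer rows than the requested page size."""
        rows, page = [], 1
        while True:
            resp = requests.get(base_url, params={"page": page, "per_page": page_size})
            resp.raise_for_status()
            batch = resp.json()  # assumes the endpoint returns a JSON list of rows
            rows.extend(batch)
            if len(batch) < page_size:  # a short page means we reached the end
                return rows
            page += 1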

Multi-Account Extraction (MAE)

Multi-Account Extraction (MAE) is a feature that enables simultaneous data extraction from multiple accounts or properties, such as several Facebook or Google Analytics accounts, in a single source setup. It is built for administrators managing multiple accounts, ensuring uniform data extraction settings across all of them.

Enabling MAE

To activate the MAE feature, get in touch with us and our technical team will assist you in setting it up. Note that administrator-level permissions are required for all the accounts you wish to include in the MAE setup. The attributes and metrics selected during source configuration will be applied uniformly across all chosen accounts. For data differentiation, select distinguishing attributes like account name/ID or page name/ID.

Connectors Supporting MAE

FAQ

Can I select only some accounts for multi-account extraction?

Yes, you can choose all of your accounts or only a selected few. In your request to enable MAE, please let us know which accounts you wish to extract data from.

How many accounts can I select for multi-account extraction? Are there any extra charges?

MAE is included in the plans at no extra charge. The number of accounts you can select depends on your plan:

Plan                  Number of Accounts
Free                  N/A
Data to Dashboards    Up to 10 accounts
Data Anywhere         Up to 30 accounts
Headless              Unlimited

However, depending on the amount of data your accounts hold, more than one MAE source might be necessary.

Troubleshooting

Missing rows

Missing rows in your data extraction are most likely the result of row limits imposed by the third-party API. While Dataddo implements strategies to smartly bypass various API limitations, these efforts have boundaries.

If you suspect data is missing, start by closely examining the extraction logs within Dataddo. A tell-tale sign of missing data is the presence of suspiciously round numbers, such as exactly 10,000 rows, which indicates that you may be hitting a row limit.

To address this, adapt your extraction strategy to yield smaller, more digestible data batches. This generally means reducing the extraction timeframe, thereby ensuring a more comprehensive and accurate data retrieval process.
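
For instance, splitting a month-long extraction window into one-day windows might look like the sketch below (illustrative only; Dataddo applies comparable strategies internally, and in practice you achieve the same effect by shortening the timeframe in the source settings):

    from datetime import date, timedelta

    def split_timeframe(start, end, days_per_batch=1):
        """Split [start, end) into smaller windows so that each
        extraction request stays under the API's row limit."""
        windows = []
        cursor = start
        while cursor < end:
            window_end = min(cursor + timedelta(days=days_per_batch), end)
            windows.append((cursor, window_end))
            cursor = window_end
        return windows

    # Example: a 31-day January window becomes 31 one-day batches.
    print(len(split_timeframe(date(2024, 1, 1), date(2024, 2, 1))))  # 31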

All requests to target systems failed Error

All requests to target systems failed

When attempting to extract your data, you may find that all requests have failed. To uncover the underlying issue, perform a test extraction using the following steps:

  1. Locate and click on the three dots adjacent to your source.
  2. From the dropdown menu, select the Debug option.
  3. If the extraction continues to fail, take note of the error message displayed and use it to search for a potential solution within this documentation.
  4. If the debug process completes successfully without any issues, restart the source to initiate the regular extraction process.

No data retrieved Error

No data retrieved. Enable extraction's 'allow empty' rule if you accept empty datasets

During the extraction process, no data was retrieved. By default, extractions that return 0 rows are flagged as errors. However, if 0 rows is a normal occurrence for your operations (for instance, if you don't receive orders or run ads every day), you can enable the Allow Empty function to avoid this error notification.

  1. Click on the three dots next to your Source.
  2. Select Edit and scroll down within the Basic Info tab.
  3. Check the Allow Empty option.

Failed to update source data Error

Failed to save data to storage: Failed to update source data: Server error: PUT https://storage.prod.dataddo.com/v1.0/replace resulted in a 500 Internal Server Error response:{"status":"Internal Server Error","error":"replace: Failed to cast values of column '...' (type:integer, id:...): failed to cast string value '...' to ..."}

This error indicates that the data type specified in the Schema is incorrect and the newly extracted data does not pass data type validation. It can occur either when you have changed a column's data type in the schema to an incorrect value, or when the source has begun sending data of a different data type. Inspect the error message to determine which column is causing the problem, then adjust its data type accordingly (a minimal illustration follows the steps below).

  1. Click on the three dots next to your Source.
  2. Select the Schema tab and change the affected column to the correct data type.
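
To see the kind of cast failure behind this error, consider a minimal illustration (the column name and value below are hypothetical; this only mirrors the validation described in the error message):

    # A column is typed as integer in the Schema, but the source now
    # sends a non-numeric string, so the cast fails during validation.
    row = {"order_id": "A-1042"}  # hypothetical column and value

    try:
        value = int(row["order_id"])  # fails: cannot cast "A-1042" to integer
    except ValueError as err:
        print(f"cast failed: {err}")  # fix: correct the column's type in the Schema tab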
