- 4 Minutes to read
- DarkLight
Transformation Pipeline
- 4 Minutes to read
- DarkLight
The Dataddo transformation pipeline is a framework for data transformation modeled on the concept of data processing pipelines. API responses from 3rd party services enter a multi-stage pipeline that transforms the response into a dataset ingestible by the Dataddo platform.
Using Universal Connectors in Dataddo, you can get your data from various sources or further adjust the transformation of your existing ones.
Examples
Depending on what kind of transformation of your data you are looking to do, your transformation pipeline will be different. Here are some examples:
Example 1
- The URL of the API endpoint is https://data.nasa.gov/resource/tfkf-kniw.json**.**
- As these data are freely accessible, we do not need any authorization.
- The HTTP Method needs to be set to GET as we want to fetch data.
- The HTTP Headers and Body can be left empty.
In this example, we fetch landslide data from NASA, setting the Transformation Pipeline so that we only extract the metrics relevant for us.
As for the Transformation Pipeline, let's say we want to collect country name, event date, event title, fatality count, injury count, landslide size, location description, and source name. Then the script should look something like this:
[
{
"$unwind": "$data"
},
{
"$project": {
"country_name": {
"$ifNull": [
"$data.country_name",
""
]
},
"event_date": {
"$ifNull": [
"$data.event_date",
""
]
},
"event_title": {
"$ifNull": [
"$data.event_title",
""
]
},
"fatality_count": {
"$ifNull": [
"$data.fatality_count",
0
]
},
"injury_count": {
"$ifNull": [
"$data.injury_count",
0
]
},
"landslide_size": {
"$ifNull": [
"$data.landslide_size",
""
]
},
"location_description": {
"$ifNull": [
"$data.location_description",
""
]
},
"source_name": {
"$ifNull": [
"$data.source_name",
""
]
}
}
}
]
Example 2
Using the same URL as in Example 1, but this time, we will fetch division and country data from NASA, setting the Transformation Pipeline as follows:
[
{
"$unwind": "$data"
},
{
"$project": {
"admin_division_name": {
"$ifNull": [
"$data.admin_division_name",
""
]
},
"admin_division_population": {
"$ifNull": [
"$data.admin_division_population",
""
]
},
"country_code": {
"$ifNull": [
"$data.country_code",
""
]
},
"country_name": {
"$ifNull": [
"$data.country_name",
""
]
},
"created_date": {
"$ifNull": [
"$data.created_date",
""
]
}
}
}
]
Example 3
This transformation aims to get data on the first name, absence, cost centers from Personio using the official URL which looks like the following:
https://api.personio.de/v1/company/employees?limit=200&offset=0
This time, we had to authorize the account to be able to pull the data.
[
{
"$unwind": "$data"
},
{
"$unwind": {
"path": "$data.attributes.absence_entitlement.value",
"preserveNullAndEmptyArrays": true
}
},
{
"$unwind": {
"path": "$data.attributes.cost_centers.value",
"preserveNullAndEmptyArrays": true
}
},
{
"$unset": "_id"
},
{
"$project": {
"email": {
"$ifNull": [
"$data.attributes.email.value",
""
]
},
"first_name": {
"$ifNull": [
"$data.attributes.first_name.value",
""
]
},
"absence_entitlemen_category": {
"$ifNull": [
"$data.attributes.absence_entitlement.value.attributes.category",
""
]
},
"absence_entitlement_entitlement": {
"$ifNull": [
"$data.attributes.absence_entitlement.value.attributes.entitlement",
0
]
},
"cost_centers_id": {
"$ifNull": [
"$data.attributes.cost_centers.value.attributes.id",
0
]
},
"cost_centers_name": {
"$ifNull": [
"$data.attributes.cost_centers.value.attributes.name",
""
]
}
}
}
]
Stages
The Dataddo transformation pipeline consists of multiple stages. Each stage transforms the API response as it passes through the pipeline. Pipeline stages can appear multiple times in the pipeline.
Stages | Description |
---|---|
$project | Reshapes API response in the stream. e.g. adding new fields or removing existing fields. |
$unwind | Deconstructs an array field from the API response to object for each element. Each object replaces the array with an element value. For each input, outputs N objects where N is the number of array elements and can be zero for an empty array. |
$match | Filters the stream to allow only matching objects to pass unmodified into the next pipeline stage. For each input object, outputs either one object (a match) or removes the object from a stream (no match). |
$group | Groups input objects by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input objects and outputs one object per each distinct group. The output objects only contain the identifier field and, if specified, accumulated fields. |
Expressions
Some pipeline stages take a pipeline expression as the operand. Pipeline expressions specify the transformation to apply to the input objects. Expressions have a standard JSON object structure and can contain other expressions.
Pipeline expressions can only operate on the current object in the pipeline and cannot refer to data from other objects: expression operations provide in-memory transformation of objects.
Expression operators
Arithmetic
Arithmetic expressions perform mathematic operations on numbers.
Operator | Description |
---|---|
$abs | Returns the absolute value of a number. |
$add | Adds numbers to return the sum. |
$subtract | Returns the result of subtracting the second value from the first. |
$ceil | Returns the smallest integer greater than or equal to the specified number. |
$floor | Returns the largest integer less than or equal to the specified number. |
Boolean
Boolean expressions evaluate their argument expressions as booleans and return a boolean as a result.
Operator | Description |
---|---|
$and | Returns true only when all its expressions evaluate to be true. Accepts any number of argument expressions. |
$not | Returns the boolean value that is the opposite of its argument expression. Accepts a single argument expression. |
$or | Returns true when any of its expressions evaluates to true. Accepts any number of argument expressions. |
Comparison
Operator | Description |
---|---|
$cmp | Returns 0 if the two values are equivalent, 1 if the first value is greater than the second, and -1 if the first value is less than the second. |
$eq | Returns true if the values are equivalent. |
$gt | Returns true if the first value is greater than the second. |
$gte | Returns true if the first value is greater than or equal to the second. |
$lt | Returns true if the first value is less than the second. |
$lte | Returns true if the first value is less than or equal to the second. |
$ne | Returns true if the values are not equivalent. |
Feel free to contact us. We are happy to help you with the data transformations pipeline setup.