Transformation Pipeline Practical Examples

On this page, you will find practical examples of transformation pipeline scripts.

Example 1: NASA Landslide Data Extraction

  1. API Endpoint URL: https://data.nasa.gov/resource/tfkf-kniw.json
  2. Authorization: Not required.
  3. HTTP Method: GET
  4. HTTP Headers and Body: Not required.

Task: Extract selected fields from NASA's landslide data: country name, event date, event title, fatality count, injury count, landslide size, location description, and source name.

Transformation script:

[
    {
        "$unwind": "$data"
    },
    {
        "$project": {
            "country_name": {
                "$ifNull": [
                    "$data.country_name",
                    ""
                ]
            },
            "event_date": {
                "$ifNull": [
                    "$data.event_date",
                    ""
                ]
            },
            "event_title": {
                "$ifNull": [
                    "$data.event_title",
                    ""
                ]
            },
            "fatality_count": {
                "$ifNull": [
                    "$data.fatality_count",
                    0
                ]
            },
            "injury_count": {
                "$ifNull": [
                    "$data.injury_count",
                    0
                ]
            },
            "landslide_size": {
                "$ifNull": [
                    "$data.landslide_size",
                    ""
                ]
            },
            "location_description": {
                "$ifNull": [
                    "$data.location_description",
                    ""
                ]
            },
            "source_name": {
                "$ifNull": [
                    "$data.source_name",
                    ""
                ]
            }
        }
    }
]
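
To see what this pipeline produces, the following minimal Python sketch emulates the two stages on a hand-written sample. It is an illustration only, not how Dataddo executes the script: the helper names (unwind, if_null) and the sample records are hypothetical, and the response is assumed to be wrapped under a data key, as the $data paths in the script imply. Only three of the eight fields are shown to keep it short.

```python
def unwind(doc, field):
    """Emulate a basic $unwind: one output document per element of doc[field]."""
    return [{**doc, field: item} for item in doc.get(field) or []]


def if_null(value, default):
    """Emulate $ifNull: use the default when the value is missing or null."""
    return default if value is None else value


# Hypothetical sample; real records carry many more fields.
response = {
    "data": [
        {"country_name": "Nepal", "event_title": "Roadside slide", "fatality_count": 3},
        {"event_title": "Hillside slide"},  # country and fatality count are missing
    ]
}

rows = []
for doc in unwind(response, "data"):
    record = doc["data"]  # after $unwind, "data" holds a single record
    rows.append({
        "country_name": if_null(record.get("country_name"), ""),
        "event_title": if_null(record.get("event_title"), ""),
        "fatality_count": if_null(record.get("fatality_count"), 0),
    })

for row in rows:
    print(row)
```

Each element of the array becomes one flat row, and the $ifNull defaults guarantee that every row has the same set of columns even when a record lacks a field.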

Example 2: NASA Division and Country Data Extraction

  1. API Endpoint URL: https://data.nasa.gov/resource/tfkf-kniw.json
  2. Authorization: Not required.
  3. HTTP Method: GET
  4. HTTP Headers and Body: Not required.

Task: Extract administrative division and country fields from NASA's landslide data: admin division name, admin division population, country code, country name, and created date.

Transformation script:

[
    {
        "$unwind": "$data"
    },
    {
        "$project": {
            "admin_division_name": {
                "$ifNull": [
                    "$data.admin_division_name",
                    ""
                ]
            },
            "admin_division_population": {
                "$ifNull": [
                    "$data.admin_division_population",
                    ""
                ]
            },
            "country_code": {
                "$ifNull": [
                    "$data.country_code",
                    ""
                ]
            },
            "country_name": {
                "$ifNull": [
                    "$data.country_name",
                    ""
                ]
            },
            "created_date": {
                "$ifNull": [
                    "$data.created_date",
                    ""
                ]
            }
        }
    }
]

Example 3: Personio Data Extraction

  1. API Endpoint URL: https://api.personio.de/v1/company/employees?limit=200&offset=0 (generate your URL using Personio's official documentation)
  2. Authorization: Required. Please authenticate your Personio account.
  3. HTTP Method: GET

Task: Extract the following data from Personio: email, first name, absence entitlement (category and entitlement), and cost centers (ID and name).

Transformation script:

[
    {
        "$unwind": "$data"
    },
    {
        "$unwind": {
            "path": "$data.attributes.absence_entitlement.value",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$unwind": {
            "path": "$data.attributes.cost_centers.value",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$unset": "_id"
    },
    {
        "$project": {
            "email": {
                "$ifNull": [
                    "$data.attributes.email.value",
                    ""
                ]
            },
            "first_name": {
                "$ifNull": [
                    "$data.attributes.first_name.value",
                    ""
                ]
            },
            "absence_entitlement_category": {
                "$ifNull": [
                    "$data.attributes.absence_entitlement.value.attributes.category",
                    ""
                ]
            },
            "absence_entitlement_entitlement": {
                "$ifNull": [
                    "$data.attributes.absence_entitlement.value.attributes.entitlement",
                    0
                ]
            },
            "cost_centers_id": {
                "$ifNull": [
                    "$data.attributes.cost_centers.value.attributes.id",
                    0
                ]
            },
            "cost_centers_name": {
                "$ifNull": [
                    "$data.attributes.cost_centers.value.attributes.name",
                    ""
                ]
            }
        }
    }
]
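
Because this script unwinds two independent arrays, each employee produces one row per combination of absence entitlement and cost center. The Python sketch below emulates that effect on hypothetical data. The helpers (get_path, set_path, unwind) and the sample records are assumptions made for illustration, the records are shown as they would look after the first $unwind on $data, and the emulation handles only the cases present in the sample.

```python
import copy


def get_path(doc, path):
    """Resolve a dotted path; return None if any step is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict):
            return None
        doc = doc.get(key)
    return doc


def set_path(doc, path, value):
    """Return a deep copy of doc with the value at the dotted path replaced."""
    result = copy.deepcopy(doc)
    node = result
    *parents, last = path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value
    return result


def unwind(docs, path, preserve=False):
    """Emulate $unwind with an optional preserveNullAndEmptyArrays flag."""
    out = []
    for doc in docs:
        value = get_path(doc, path)
        if isinstance(value, list) and value:
            out.extend(set_path(doc, path, item) for item in value)
        elif preserve:
            # Simplification: represent the missing element as None.
            out.append(set_path(doc, path, None))
    return out


# Hypothetical employees, already unwound from the top-level data array.
employees = [
    {"data": {"attributes": {
        "first_name": {"value": "Ada"},
        "absence_entitlement": {"value": [
            {"attributes": {"category": "paid_vacation", "entitlement": 30}},
            {"attributes": {"category": "sick_leave", "entitlement": 10}},
        ]},
        "cost_centers": {"value": [
            {"attributes": {"id": 1, "name": "R&D"}},
            {"attributes": {"id": 2, "name": "Ops"}},
        ]},
    }}},
    {"data": {"attributes": {
        "first_name": {"value": "Bob"},
        "absence_entitlement": {"value": [
            {"attributes": {"category": "paid_vacation", "entitlement": 25}},
        ]},
        "cost_centers": {"value": []},  # no cost center assigned
    }}},
]

rows = unwind(employees, "data.attributes.absence_entitlement.value", preserve=True)
rows = unwind(rows, "data.attributes.cost_centers.value", preserve=True)

table = [
    (
        get_path(r, "data.attributes.first_name.value"),
        get_path(r, "data.attributes.absence_entitlement.value.attributes.category"),
        get_path(r, "data.attributes.cost_centers.value.attributes.name") or "",
    )
    for r in rows
]
for line in table:
    print(line)
```

In this sample, the first employee yields four rows (two entitlements times two cost centers) and the second yields one row with an empty cost center name. Keep this multiplication in mind when summing values such as entitlement downstream, since the same entitlement appears once per cost center.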

Example 4: Weather Data Extraction from OpenWeatherMap

  1. API Endpoint URL: http://api.openweathermap.org/data/2.5/weather?q=London&appid=YOUR_API_KEY
  2. Authorization: Use your OpenWeatherMap API key in the API URL.
  3. HTTP Method: GET

Task: Extract temperature, pressure, humidity, and weather description from OpenWeatherMap data for the city of London.

Transformation script:

[
    {
        "$unwind": {
            "path": "$weather",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$project": {
            "temperature": {
                "$ifNull": [
                    "$main.temp",
                    ""
                ]
            },
            "pressure": {
                "$ifNull": [
                    "$main.pressure",
                    0
                ]
            },
            "humidity": {
                "$ifNull": [
                    "$main.humidity",
                    0
                ]
            },
            "weather_description": {
                "$ifNull": [
                    "$weather.description",
                    ""
                ]
            }
        }
    }
]

Adding "preserveNullAndEmptyArrays": true to the $unwind stage keeps the document in the output when the weather array is missing, null, or empty. Without this option, $unwind drops such documents, so no row would be produced.
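
The effect of this option can be illustrated with a small Python emulation. This is a sketch under assumptions: the functions unwind, project, and run are hypothetical stand-ins for the pipeline stages, and the sample responses are hand-written, not real OpenWeatherMap output.

```python
def unwind(doc, field, preserve_null_and_empty_arrays=False):
    """Emulate $unwind on a top-level field of a single document."""
    value = doc.get(field)
    if isinstance(value, list):
        items = value
    elif value is None:
        items = []
    else:
        items = [value]  # a non-array value is treated as a single element
    if items:
        return [{**doc, field: item} for item in items]
    if preserve_null_and_empty_arrays:
        # Simplification: emit the document without the array field.
        return [{k: v for k, v in doc.items() if k != field}]
    return []  # missing, null, or empty array: the document is dropped


def if_null(value, default):
    """Emulate $ifNull."""
    return default if value is None else value


def project(doc):
    """Emulate the $project stage with its $ifNull defaults."""
    main = doc.get("main") or {}
    weather = doc.get("weather") or {}
    return {
        "temperature": if_null(main.get("temp"), ""),
        "pressure": if_null(main.get("pressure"), 0),
        "humidity": if_null(main.get("humidity"), 0),
        "weather_description": if_null(weather.get("description"), ""),
    }


def run(doc, preserve):
    """Apply both stages to one response document."""
    return [project(d) for d in unwind(doc, "weather", preserve)]


# Hypothetical responses: one typical, one with an empty weather array.
typical = {
    "main": {"temp": 280.3, "pressure": 1012, "humidity": 81},
    "weather": [{"description": "light rain"}],
}
no_weather = {**typical, "weather": []}

print(run(typical, preserve=True))      # one row, description filled in
print(run(no_weather, preserve=True))   # one row, description defaults to ""
print(run(no_weather, preserve=False))  # no rows at all
```

With the option enabled, the temperature, pressure, and humidity values still reach the output even when no weather description is available.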

Example 5: GitHub Repository Data Extraction

  1. API Endpoint URL: https://api.github.com/repos/octocat/Hello-World
  2. Authorization: Use your GitHub personal access token in the HTTP header section.
  3. HTTP Method: GET
  4. HTTP Header:
    1. Key: Authorization
    2. Value: token YOUR_PERSONAL_ACCESS_TOKEN

Task: Extract repository name, owner, stars count, and forks count from GitHub for a given repository.

DATADDO TIP

Authorization is often not necessary for public GitHub repositories. However, it is still recommended for consistency, to access data that requires authentication, and to benefit from higher rate limits.

Transformation script:

[
    {
        "$project": {
            "repo_name": {
                "$ifNull": [
                    "$name",
                    ""
                ]
            },
            "owner": {
                "$ifNull": [
                    "$owner.login",
                    ""
                ]
            },
            "stars_count": {
                "$ifNull": [
                    "$stargazers_count",
                    0
                ]
            },
            "forks_count": {
                "$ifNull": [
                    "$forks_count",
                    0
                ]
            }
        }
    }
]
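
For testing the endpoint outside Dataddo, the request settings listed in this example can be reproduced with Python's standard library. This is a minimal sketch: build_github_request is a hypothetical helper, the token value is a placeholder, and the request is only constructed here, not sent.

```python
import urllib.request


def build_github_request(owner, repo, token=None):
    """Build a GET request for a repository, with an optional token header."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    headers = {}
    if token:
        # Same key/value pair as in the HTTP Header settings of this example.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_github_request(
    "octocat", "Hello-World", token="YOUR_PERSONAL_ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Leaving the token out yields an unauthenticated request, which is usually sufficient for public repositories.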

Example 6: Spotify Top Tracks Data Extraction

  1. API Endpoint URL: https://api.spotify.com/v1/artists/{artist_id}/top-tracks?country=US
  2. Authorization: Required. You need a Spotify access token.
  3. HTTP Method: GET
  4. HTTP Header:
    1. Key: Authorization
    2. Value: Bearer YOUR_SPOTIFY_ACCESS_TOKEN

Task: Retrieve the top 10 tracks of a specific artist, including the track name, album name, album release date, total number of tracks on the album, and popularity score.

Transformation script:

[
    {
        "$unwind": "$tracks"
    },
    {
        "$project": {
            "track_name": {
                "$ifNull": ["$tracks.name", ""]
            },
            "album_name": {
                "$ifNull": ["$tracks.album.name", ""]
            },
            "album_release_date": {
                "$ifNull": ["$tracks.album.release_date", ""]
            },
            "album_total_tracks": {
                "$ifNull": ["$tracks.album.total_tracks", 0]
            },
            "popularity_score": {
                "$ifNull": ["$tracks.popularity", 0]
            }
        }
    },
    {
        "$limit": 10
    }
]
