- 4 Minutes to read
- DarkLight
Transformation Pipeline Practical Examples
- 4 Minutes to read
- DarkLight
On this page, you will find practical examples of transformation pipeline scripts.
- Example 1: NASA Landslide Data Extraction: A process for extracting and transforming data on landslide occurrences from NASA's databases.
- Example 2: NASA Division and Country Data Extraction: A guide on segmenting NASA data by geographical divisions and countries.
- Example 3: Personio Data Extraction: A step-by-step walkthrough for retrieving and formatting employee data from Personio.
- Example 4: Weather Data Extraction from OpenWeatherMap: Steps to access and process weather information from OpenWeatherMap.
- Example 5: GitHub Repository Data Extraction: Script for extracting and organizing data from GitHub repositories.
- Example 6: Spotify Top Tracks Data Extraction: Approaches for extracting and structuring data on Spotify's top tracks.
Example 1: NASA Landslide Data Extraction
- API Endpoint URL:
https://data.nasa.gov/resource/tfkf-kniw.json
- Authorization: Not required.
- HTTP Method:
GET
- HTTP Headers and Body: Not required.
Task: Selectively extract metrics from NASA's landslide data. Data to be extracted is: country name, event date, event title, fatality count, injury count, landslide size, location description, and source name.
Transformation script:
[
{
"$unwind": "$data"
},
{
"$project": {
"country_name": {
"$ifNull": [
"$data.country_name",
""
]
},
"event_date": {
"$ifNull": [
"$data.event_date",
""
]
},
"event_title": {
"$ifNull": [
"$data.event_title",
""
]
},
"fatality_count": {
"$ifNull": [
"$data.fatality_count",
0
]
},
"injury_count": {
"$ifNull": [
"$data.injury_count",
0
]
},
"landslide_size": {
"$ifNull": [
"$data.landslide_size",
""
]
},
"location_description": {
"$ifNull": [
"$data.location_description",
""
]
},
"source_name": {
"$ifNull": [
"$data.source_name",
""
]
}
}
}
]
Example 2: NASA Division and Country Data Extraction
- API Endpoint URL:
https://data.nasa.gov/resource/tfkf-kniw.json
- Authorization: Not required.
- HTTP Method:
GET
- HTTP Headers and Body: Not required.
Task: Selectively extract metrics from NASA's landslide data. Data to be extracted is: division and country data from NASA.
Transformation script:
[
{
"$unwind": "$data"
},
{
"$project": {
"admin_division_name": {
"$ifNull": [
"$data.admin_division_name",
""
]
},
"admin_division_population": {
"$ifNull": [
"$data.admin_division_population",
""
]
},
"country_code": {
"$ifNull": [
"$data.country_code",
""
]
},
"country_name": {
"$ifNull": [
"$data.country_name",
""
]
},
"created_date": {
"$ifNull": [
"$data.created_date",
""
]
}
}
}
]
Example 3: Personio Data Extraction
- API Endpoint URL:
https://api.personio.de/v1/company/employees?limit=200&offset=0
(generate your URL using Personio's official documentation) - Authorization: Required, please authenticate your Personio account.
- HTTP Method:
GET
Task: Extract the following data from Personio: first name, absence, cost centers .
Transformation script:
[
{
"$unwind": "$data"
},
{
"$unwind": {
"path": "$data.attributes.absence_entitlement.value",
"preserveNullAndEmptyArrays": true
}
},
{
"$unwind": {
"path": "$data.attributes.cost_centers.value",
"preserveNullAndEmptyArrays": true
}
},
{
"$unset": "_id"
},
{
"$project": {
"email": {
"$ifNull": [
"$data.attributes.email.value",
""
]
},
"first_name": {
"$ifNull": [
"$data.attributes.first_name.value",
""
]
},
"absence_entitlemen_category": {
"$ifNull": [
"$data.attributes.absence_entitlement.value.attributes.category",
""
]
},
"absence_entitlement_entitlement": {
"$ifNull": [
"$data.attributes.absence_entitlement.value.attributes.entitlement",
0
]
},
"cost_centers_id": {
"$ifNull": [
"$data.attributes.cost_centers.value.attributes.id",
0
]
},
"cost_centers_name": {
"$ifNull": [
"$data.attributes.cost_centers.value.attributes.name",
""
]
}
}
}
]
Example 4: Weather Data Extraction from OpenWeatherMap
- API Endpoint URL:
http://api.openweathermap.org/data/2.5/weather?q=London&appid=YOUR_API_KEY
- Authorization: Use your OpenWeatherMap API key in the API URL.
- HTTP Method:
GET
Task: Extract temperature, pressure, humidity, and weather description from OpenWeatherMap data for the city of London.
Transformation script:
[
{
"$unwind": {
"path": "$weather",
"preserveNullAndEmptyArrays": true
}
},
{
"$project": {
"temperature": {
"$ifNull": [
"$main.temp",
""
]
},
"pressure": {
"$ifNull": [
"$main.pressure",
0
]
},
"humidity": {
"$ifNull": [
"$main.humidity",
0
]
},
"weather_description": {
"$ifNull": [
"$weather.description",
""
]
}
}
}
]
Adding the "preserveNullAndEmptyArrays": true
option to the $unwind
stage will ensure there won't be an error when there's only one item in the weather array.
Example 5: GitHub Repository Data Extraction
- API Endpoint URL:
https://api.github.com/repos/octocat/Hello-World
- Authorization: Use your GitHub personal access token in the HTTP header section.
- HTTP Method:
GET
- HTTP Header:
- Key:
Authorization
- Value:
token YOUR_PERSONAL_ACCESS_TOKEN
- Key:
Task: Extract repository name, owner, stars count, and forks count from GitHub for a given repository.
Authorization for GitHub is often not necessary for public repositories. However, it is still recommended for consistency, gain access to certain data which might require authentication, and to avoid higher rate limit.
Transformation script:
[
{
"$project": {
"repo_name": {
"$ifNull": [
"$name",
""
]
},
"owner": {
"$ifNull": [
"$owner.login",
""
]
},
"stars_count": {
"$ifNull": [
"$stargazers_count",
0
]
},
"forks_count": {
"$ifNull": [
"$forks_count",
0
]
}
}
}
]
Example 6: Spotify Top Tracks Data Extraction
- API Endpoint URL:
https://api.spotify.com/v1/artists/{artist_id}/top-tracks?country=US
- Authorization: Required. You'd need a Spotify access token.
- HTTP Method:
GET
- HTTP Header:
- Key:
Authorization
- Value:
Bearer YOUR_SPOTIFY_ACCESS_TOKEN
- Key:
Task: Retrieve the top 10 tracks of a specific artist including the album name, release date, total number of tracks on the album, and popularity score.
Transformation script:
[
{
"$unwind": "$tracks"
},
{
"$project": {
"track_name": {
"$ifNull": ["$tracks.name", ""]
},
"album_name": {
"$ifNull": ["$tracks.album.name", ""]
},
"album_release_date": {
"$ifNull": ["$tracks.album.release_date", ""]
},
"album_total_tracks": {
"$ifNull": ["$tracks.album.total_tracks", 0]
},
"popularity_score": {
"$ifNull": ["$tracks.popularity", 0]
}
}
},
{
"$limit": 10
}
]