JSON Schema Transformer
The JSON Transform component modifies and enriches incoming content within the pipeline. It applies mappings, operations, and conditions to customize and transform the pipeline data.
Mappings
Mappings rename JSON properties, remove unnecessary content, or add new data. Expression conditions control when a mapping is applied. For more details, see the Operators section.
Here is an example of a JSON Transform configuration with mappings:
{
"mappings": [
{
"source_path": "post.id",
"destination_path": "id",
"type": "string"
},
{
"source_path": "post.created_at",
"destination_path": "doc_date",
"type": "date"
},
{
"source_path": "post.comment",
"destination_path": "content.body",
"type": "string",
"condition": {
"path": "type",
"operator": "eq",
"value_type": "string",
"value": "comment"
}
}
]
}
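As a rough illustration of how such mappings might be evaluated, the Python sketch below copies values between dotted paths and honors an eq condition. The helper names (get_path, set_path, condition_holds, apply_mappings) are hypothetical, type conversion is left out, and only the eq operator is handled; this is a sketch under those assumptions, not the component's actual implementation.

```python
from typing import Any, Optional


def get_path(doc: dict, path: str) -> Any:
    """Return the value at a dotted path, or None if a segment is missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(doc: dict, path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate objects."""
    *parents, last = path.split(".")
    current = doc
    for key in parents:
        current = current.setdefault(key, {})
    current[last] = value


def condition_holds(doc: dict, condition: Optional[dict]) -> bool:
    """Evaluate a condition; only "eq" is sketched here (assumption)."""
    if condition is None:
        return True
    if condition["operator"] != "eq":
        raise NotImplementedError(condition["operator"])
    return get_path(doc, condition["path"]) == condition["value"]


def apply_mappings(doc: dict, mappings: list) -> dict:
    """Build a new document from the mappings whose condition holds."""
    result: dict = {}
    for mapping in mappings:
        if not condition_holds(doc, mapping.get("condition")):
            continue
        value = get_path(doc, mapping["source_path"])
        if value is not None:
            set_path(result, mapping["destination_path"], value)
    return result


doc = {"type": "comment", "post": {"id": "42", "comment": "Nice!"}}
mappings = [
    {"source_path": "post.id", "destination_path": "id"},
    {
        "source_path": "post.comment",
        "destination_path": "content.body",
        "condition": {"path": "type", "operator": "eq", "value": "comment"},
    },
]
result = apply_mappings(doc, mappings)
print(result)  # {'id': '42', 'content': {'body': 'Nice!'}}
```

In this sketch a mapping whose condition fails is simply skipped, so a document with "type": "post" would only receive the id field.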
Operations
Operations manipulate content. The available operations cover formatting, concatenation, mapping, content extraction, and more. All operations support conditions. The following configuration adds a format operation to a set of mappings:
{
"mappings": [
{
"source_path": "post.id",
"destination_path": "id",
"type": "string"
},
{
"source_path": "post.created_at",
"destination_path": "doc_date",
"type": "date"
},
{
"source_path": "post.comment",
"destination_path": "content.body",
"type": "string",
"condition": {
"path": "type",
"operator": "eq",
"value_type": "string",
"value": "comment"
}
}
],
"operations": [
{
"stage": "transformation",
"name": "format",
"destination_path": "source.link",
"parameters": {
"fields": [
"post.id"
],
"format": "https://example.com/comments/{0}"
},
"condition": {
"path": "type",
"operator": "eq",
"value_type": "string",
"value": "comment"
}
}
]
}
concat
Join two or more field values using a separator.
{
"operations": [
{
"name": "concat",
"stage": "transformation",
"destination_path": "author.full_name",
"parameters": {
"fields": [
"author.first_name",
"author.last_name"
],
"separator": " "
}
}
]
}
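The effect of this configuration can be pictured with a short Python sketch. The function names are hypothetical, and converting each value with str() is an assumption about how non-string fields would be handled.

```python
def get_path(doc: dict, path: str):
    """Resolve a dotted path such as "author.first_name"."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def concat(doc: dict, fields: list, separator: str) -> str:
    """Join the string form of each field value with the separator."""
    return separator.join(str(get_path(doc, field)) for field in fields)


doc = {"author": {"first_name": "Ada", "last_name": "Lovelace"}}
full_name = concat(doc, ["author.first_name", "author.last_name"], " ")
print(full_name)  # Ada Lovelace
```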
format
Build a string value from a template with placeholders, similar to string interpolation in programming languages.
- In fields, one or more JSON source paths can be defined, and the values will be used to replace the format placeholders.
- The format represents the template with the placeholders {0}, {1} ...
{
"operations": [
{
"stage": "transformation",
"name": "format",
"destination_path": "source.link",
"parameters": {
"fields": [
"post.id"
],
"format": "https://example.com/comments/{0}"
},
"condition": {
"path": "type",
"operator": "eq",
"value_type": "string",
"value": "comment"
}
}
]
}
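A minimal Python sketch of the placeholder substitution follows. It assumes the {0}, {1} placeholders behave like Python's positional format fields, which may differ from the component's own formatting rules; the function names are hypothetical.

```python
def get_path(doc: dict, path: str):
    """Resolve a dotted path such as "post.id"."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def format_fields(doc: dict, fields: list, template: str) -> str:
    """Replace {0}, {1}, ... with the values of the listed fields, in order."""
    values = [get_path(doc, field) for field in fields]
    return template.format(*values)


doc = {"type": "comment", "post": {"id": "42"}}
link = format_fields(doc, ["post.id"], "https://example.com/comments/{0}")
print(link)  # https://example.com/comments/42
```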
hash
Concatenate the field values and return their hash. This operation is useful when the content has no unique ID, or when you need to hide sensitive information but still need a hash for comparison.
- type: the hash algorithm, SHA-256 or MD5 (default)
- fields: the field values to hash
{
"operations": [
{
"name": "hash",
"destination_path": "id",
"parameters": {
"type": "SHA-256",
"fields": [
"post.created_at",
"post.comment"
]
}
}
]
}
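The idea can be sketched in Python with the standard hashlib module. How the component actually joins and encodes the field values is not stated here, so the sketch assumes they are concatenated without a separator, encoded as UTF-8, and returned as a hexadecimal digest; the function names are hypothetical.

```python
import hashlib


def get_path(doc: dict, path: str):
    """Resolve a dotted path such as "post.created_at"."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def hash_fields(doc: dict, fields: list, hash_type: str = "MD5") -> str:
    """Concatenate the field values and return a hex digest (assumed format)."""
    algorithms = {"MD5": hashlib.md5, "SHA-256": hashlib.sha256}
    joined = "".join(str(get_path(doc, field)) for field in fields)
    return algorithms[hash_type](joined.encode("utf-8")).hexdigest()


doc = {"post": {"created_at": "2024-01-01T00:00:00Z", "comment": "Nice!"}}
digest = hash_fields(doc, ["post.created_at", "post.comment"], "SHA-256")
print(len(digest))  # 64 hex characters for SHA-256
```

Because the same input always produces the same digest, two documents with identical field values receive the same id, which is what makes the hash usable for comparison.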
key_to_value
Map a JSON Object to a key-value-pair array.
{
"operations": [
{
"name": "key_to_value",
"destination_path": "attributes",
"parameters": {
"source": "attributes",
"value_name": "value",
"key_name": "key"
}
}
]
}
Input
{
"product_id": "12345",
"name": "Wireless Bluetooth Headphones",
"description": "High-quality wireless Bluetooth headphones with noise-cancellation, up to 30 hours of battery life, and built-in microphone.",
"attributes": {
"battery_life": "30 hours",
"has_microphone": true,
"colors": [
"Black",
"White",
"Blue"
],
"features": [
"Bluetooth 5.0",
"Noise-cancellation"
]
}
}
Output
{
"id": "12345",
"attributes": [
{
"key": "battery_life",
"value": "30 hours"
},
{
"key": "has_microphone",
"value": true
},
{
"key": "colors",
"value": [
"Black",
"White",
"Blue"
]
},
{
"key": "features",
"value": [
"Bluetooth 5.0",
"Noise-cancellation"
]
}
]
}
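The conversion of the attributes object can be sketched in a few lines of Python. The function name is hypothetical; key_name and value_name mirror the parameters in the configuration above.

```python
def key_to_value(obj: dict, key_name: str = "key", value_name: str = "value") -> list:
    """Turn each property of the object into a {key_name, value_name} pair."""
    return [{key_name: key, value_name: value} for key, value in obj.items()]


attributes = {"battery_life": "30 hours", "has_microphone": True}
pairs = key_to_value(attributes)
print(pairs)
# [{'key': 'battery_life', 'value': '30 hours'}, {'key': 'has_microphone', 'value': True}]
```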
i18n_language_shorthand
The Internationalization Language Shorthand converts a language string to an ISO 639 language code.
Examples
- two_letter_code: es-ES -> es
- three_letter_code: es-ES -> esp
{
"operations": [
{
"name": "i18n_language_shorthand",
"parameters": {
"format": "two_letter_code",
"field": "lang"
},
"destination_path": "enrichment.language"
}
]
}
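For the two_letter_code format, the conversion can be approximated in Python by taking the primary subtag of the language string. This sketch assumes the input starts with a two-letter language code followed by "-" or "_"; the three-letter format would need a lookup table and is not covered. The function name is hypothetical.

```python
def two_letter_code(language_tag: str) -> str:
    """Return the primary language subtag in lowercase, e.g. "es-ES" -> "es"."""
    primary = language_tag.replace("_", "-").split("-")[0]
    return primary.lower()


print(two_letter_code("es-ES"))  # es
```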
map
Map the source value to a new value based on regex conditions.
- alt: the default value used if none of the conditions match
- from: regex condition tested against the source value
- to: the value written to the destination when from matches
Configuration
{
"operations": [
{
"stage": "transformation",
"name": "map",
"destination_path": "new_category",
"parameters": {
"source": "category",
"alt": "other",
"map": [
{
"from": "^arts",
"to": "entertainment"
},
{
"from": "^culture",
"to": "entertainment"
},
{
"from": "^entertainment",
"to": "entertainment"
}
]
}
}
]
}
Input
[
{
"title": "The Mona Lisa by Leonardo da Vinci, the world's most famous portrait, could get a room of its own in the Louvre, the museum's president said on Saturday.",
"category": "art"
},
{
"title": "Reviving Traditions: Global Cultures Embrace Ancient Art Forms in Modern Times",
"category": "culture"
},
{
"title": "Streaming Wars Intensify: New Platforms and Big Releases Reshape Entertainment Landscape",
"category": "entertainment"
},
{
"title": "Electric Revolution: Automakers Race to Redefine the Future of Cars",
"category": "automobile"
}
]
Output
[
{
"title": "The Mona Lisa by Leonardo da Vinci, the world's most famous portrait, could get a room of its own in the Louvre, the museum's president said on Saturday.",
"category": "art",
"new_category": "other"
},
{
"title": "Reviving Traditions: Global Cultures Embrace Ancient Art Forms in Modern Times",
"category": "culture",
"new_category": "entertainment"
},
{
"title": "Streaming Wars Intensify: New Platforms and Big Releases Reshape Entertainment Landscape",
"category": "entertainment",
"new_category": "entertainment"
},
{
"title": "Electric Revolution: Automakers Race to Redefine the Future of Cars",
"category": "automobile",
"new_category": "other"
}
]
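The matching logic can be sketched in Python with the standard re module. The sketch assumes the rules are tried in order and the first matching from pattern wins, which is not confirmed for the component; the function name is hypothetical.

```python
import re


def map_value(value: str, rules: list, alt: str) -> str:
    """Return the "to" of the first rule whose "from" regex matches, else alt."""
    for rule in rules:
        if re.search(rule["from"], value):
            return rule["to"]
    return alt


rules = [
    {"from": "^arts", "to": "entertainment"},
    {"from": "^culture", "to": "entertainment"},
    {"from": "^entertainment", "to": "entertainment"},
]
mapped = {c: map_value(c, rules, "other") for c in ["art", "culture", "automobile"]}
print(mapped)
# {'art': 'other', 'culture': 'entertainment', 'automobile': 'other'}
```

Note that "art" falls through to the alt value because the pattern ^arts requires the trailing "s".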
extract
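Extract content that matches a regular expression. The format parameter controls what is written to the destination: each placeholder refers to a capture group of the match, so {1} writes the first group.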
Configuration
{
"operations": [
{
"stage": "transformation",
"name": "extract",
"destination_path": "content.author_related_post",
"parameters": {
"source_path": "source.link",
"regex": "@([^\/]+)",
"format": "{1}"
}
}
]
}
Input
{
"source": {
"link": "https://www.example.com/@mycat/comments/12345"
}
}
Output
{
"content.author_related_post": "mycat",
"source": {
"link": "https://www.example.com/@mycat/comments/12345"
}
}
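A Python sketch of the extraction follows, using the standard re module. It assumes that {n} in format refers to capture group n of the first match, with {0} standing for the whole match; the function name is hypothetical. The JSON string "@([^\/]+)" decodes to the pattern @([^/]+) used below.

```python
import re
from typing import Optional


def extract(text: str, regex: str, fmt: str) -> Optional[str]:
    """Fill {n} placeholders in fmt with capture groups of the first match."""
    match = re.search(regex, text)
    if match is None:
        return None
    # Each {n} is replaced by group n; {0} is the entire matched text.
    return re.sub(r"\{(\d+)\}", lambda m: match.group(int(m.group(1))), fmt)


link = "https://www.example.com/@mycat/comments/12345"
author = extract(link, r"@([^/]+)", "{1}")
print(author)  # mycat
```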