JSON Schema Transformer

The JSON Transform feature modifies and enriches incoming content within the pipeline. It applies mappings, operations, and conditions to customize and transform the pipeline data.

Mappings

Mappings rename JSON properties, remove unnecessary content, or add new data. Expression conditions control when a mapping is applied. For details on the available condition operators, see the Operators section.

Here is an example of a JSON Transform configuration with mappings:

{
  "mappings": [
    {
      "source_path": "post.id",
      "destination_path": "id",
      "type": "string"
    },
    {
      "source_path": "post.created_at",
      "destination_path": "doc_date",
      "type": "date"
    },
    {
      "source_path": "post.comment",
      "destination_path": "content.body",
      "type": "string",
      "condition": {
        "path": "type",
        "operator": "eq",
        "value_type": "string",
        "value": "comment"
      }
    }
  ]
}
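
To make the mapping semantics concrete, here is a minimal Python sketch of how mappings with an eq condition might be evaluated. The helper names (get_path, set_path, condition_holds, apply_mappings) are hypothetical, type conversion is omitted, and the sketch assumes that only mapped fields reach the output; the actual component may behave differently.

```python
from typing import Any, Optional

_MISSING = object()  # sentinel: distinguishes "absent" from a JSON null


def get_path(doc: Any, path: str) -> Any:
    """Read a dot-separated path such as 'post.id'; return _MISSING if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def set_path(doc: dict, path: str, value: Any) -> None:
    """Write a value at a dot-separated path, creating nested objects as needed."""
    *parents, leaf = path.split(".")
    current = doc
    for key in parents:
        current = current.setdefault(key, {})
    current[leaf] = value


def condition_holds(doc: dict, condition: Optional[dict]) -> bool:
    """Evaluate a condition; only the 'eq' operator is sketched here."""
    if condition is None:
        return True
    if condition["operator"] != "eq":
        raise NotImplementedError(condition["operator"])
    return get_path(doc, condition["path"]) == condition["value"]


def apply_mappings(doc: dict, mappings: list) -> dict:
    """Copy each mapped source value to its destination when its condition holds."""
    result: dict = {}
    for mapping in mappings:
        if not condition_holds(doc, mapping.get("condition")):
            continue
        value = get_path(doc, mapping["source_path"])
        if value is not _MISSING:
            set_path(result, mapping["destination_path"], value)
    return result


mappings = [
    {"source_path": "post.id", "destination_path": "id"},
    {
        "source_path": "post.comment",
        "destination_path": "content.body",
        "condition": {"path": "type", "operator": "eq", "value": "comment"},
    },
]

comment = {"type": "comment", "post": {"id": "42", "comment": "Nice article"}}
article = {"type": "article", "post": {"id": "43", "comment": "ignored"}}

print(apply_mappings(comment, mappings))  # {'id': '42', 'content': {'body': 'Nice article'}}
print(apply_mappings(article, mappings))  # {'id': '43'}
```

In this sketch the conditional mapping is skipped for the article document because its type field is not "comment".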

Operations

Operations are a key feature of the JSON Transform component and manipulate content. The available operations cover formatting, concatenation, mapping, content extraction, and more. All operations support conditions. The following example combines the mappings above with a format operation.

{
  "mappings": [
    {
      "source_path": "post.id",
      "destination_path": "id",
      "type": "string"
    },
    {
      "source_path": "post.created_at",
      "destination_path": "doc_date",
      "type": "date"
    },
    {
      "source_path": "post.comment",
      "destination_path": "content.body",
      "type": "string",
      "condition": {
        "path": "type",
        "operator": "eq",
        "value_type": "string",
        "value": "comment"
      }
    }
  ],
  "operations": [
    {
      "stage": "transformation",
      "name": "format",
      "destination_path": "source.link",
      "parameters": {
        "fields": [
          "post.id"
        ],
        "format": "https://example.com/comments/{0}"
      },
      "condition": {
        "path": "type",
        "operator": "eq",
        "value_type": "string",
        "value": "comment"
      }
    }
  ]
}

concat

Join two or more field values using a separator.

{
    "operations": [
        {
            "name": "concat",
            "stage": "transformation",
            "destination_path": "author.full_name",
            "parameters": {
                "fields": [
                    "author.first_name",
                    "author.last_name"
                ],
                "separator": " "
            }
        }
    ]
}
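
As an illustration, the following Python sketch shows one plausible reading of concat. The function names are hypothetical, and skipping missing fields is an assumption, not documented behavior.

```python
def get_path(doc, path):
    """Read a dot-separated path such as 'author.first_name'; None if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def concat(doc, fields, separator):
    """Join the values found at the given paths with the separator."""
    values = [get_path(doc, field) for field in fields]
    # Assumption: fields that are missing or null are skipped.
    return separator.join(str(value) for value in values if value is not None)


doc = {"author": {"first_name": "Ada", "last_name": "Lovelace"}}
full_name = concat(doc, ["author.first_name", "author.last_name"], " ")
print(full_name)  # Ada Lovelace
```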

format

Build a string value using a template with placeholders. The template works like string interpolation in programming languages.

  • fields: one or more JSON source paths whose values replace the placeholders, in order.
  • format: the template, with the placeholders {0}, {1}, ...
{
    "operations": [
        {
            "stage": "transformation",
            "name": "format",
            "destination_path": "source.link",
            "parameters": {
                "fields": [
                    "post.id"
                ],
                "format": "https://example.com/comments/{0}"
            },
            "condition": {
                "path": "type",
                "operator": "eq",
                "value_type": "string",
                "value": "comment"
            }
        }
    ]
}
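
The placeholder substitution can be approximated with Python's positional string formatting. This is only a sketch: the function names are hypothetical, and it assumes that {0} refers to the first entry in fields, {1} to the second, and so on.

```python
def get_path(doc, path):
    """Read a dot-separated path such as 'post.id'; None if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def format_operation(doc, fields, template):
    """Replace {0}, {1}, ... in the template with the values at the given paths."""
    values = [get_path(doc, field) for field in fields]
    return template.format(*values)


doc = {"type": "comment", "post": {"id": "12345"}}
link = format_operation(doc, ["post.id"], "https://example.com/comments/{0}")
print(link)  # https://example.com/comments/12345
```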

hash

Concatenate field values and return a hash of the result. This operation is useful when the content has no unique ID, or when you need to hide sensitive information but still need a value for comparison.

  • type: the hash algorithm, SHA-256 or MD5 (default)
  • fields: the JSON source paths whose values are hashed
{
    "operations": [
        {
            "name": "hash",
            "destination_path": "id",
            "parameters": {
                "type": "SHA-256",
                "fields": [
                    "post.created_at",
                    "post.comment"
                ]
            }
        }
    ]
}
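
The idea can be sketched in Python with the standard hashlib module. How the component joins the field values, which text encoding it uses, and how it renders the digest are not stated here, so the choices below (no separator, UTF-8, hexadecimal output) are assumptions, and the function names are hypothetical.

```python
import hashlib

# Maps the documented "type" values to hashlib constructors.
ALGORITHMS = {"MD5": hashlib.md5, "SHA-256": hashlib.sha256}


def get_path(doc, path):
    """Read a dot-separated path such as 'post.comment'; None if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def hash_operation(doc, fields, hash_type="MD5"):
    """Concatenate the field values and return a hexadecimal digest."""
    # Assumption: values are joined without a separator and encoded as UTF-8.
    joined = "".join(str(get_path(doc, field)) for field in fields)
    return ALGORITHMS[hash_type](joined.encode("utf-8")).hexdigest()


doc = {"post": {"created_at": "2024-05-01T10:00:00Z", "comment": "Great read"}}
digest = hash_operation(doc, ["post.created_at", "post.comment"], "SHA-256")
print(len(digest))  # 64 hexadecimal characters for SHA-256
```

Because the same input always yields the same digest, the result can serve as a stable identifier without exposing the original values.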

key_to_value

Convert a JSON object into an array of key-value pairs. The key_name and value_name parameters set the property names used in each pair.

{
  "operations": [
    {
      "name": "key_to_value",
      "destination_path": "attributes",
      "parameters": {
        "source": "attributes",
        "value_name": "value",
        "key_name": "key"
      }
    }
  ]
}

Input

{
  "product_id": "12345",
  "name": "Wireless Bluetooth Headphones",
  "description": "High-quality wireless Bluetooth headphones with noise-cancellation, up to 30 hours of battery life, and built-in microphone.",
  "attributes": {
    "battery_life": "30 hours",
    "has_microphone": true,
    "colors": [
      "Black",
      "White",
      "Blue"
    ],
    "features": [
      "Bluetooth 5.0",
      "Noise-cancellation"
    ]
  }
}

Output

{
  "id": "12345",
  "attributes": [
    {
      "key": "battery_life",
      "value": "30 hours"
    },
    {
      "key": "has_microphone",
      "value": true
    },
    {
      "key": "colors",
      "value": [
        "Black",
        "White",
        "Blue"
      ]
    },
    {
      "key": "features",
      "value": [
        "Bluetooth 5.0",
        "Noise-cancellation"
      ]
    }
  ]
}
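
The conversion of the attributes object can be sketched in a few lines of Python. The function name mirrors the operation name but is otherwise hypothetical, and preserving the order of the source keys is an assumption.

```python
def key_to_value(obj, key_name="key", value_name="value"):
    """Turn {"a": 1, "b": 2} into [{"key": "a", "value": 1}, {"key": "b", "value": 2}]."""
    return [{key_name: key, value_name: value} for key, value in obj.items()]


attributes = {
    "battery_life": "30 hours",
    "has_microphone": True,
    "colors": ["Black", "White", "Blue"],
}

pairs = key_to_value(attributes)
print(pairs[0])  # {'key': 'battery_life', 'value': '30 hours'}
```

Nested values such as the colors array are carried over unchanged in this sketch; only the top level of the object is flattened into pairs.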

i18n_language_shorthand

The Internationalization Language Shorthand operation converts a language string such as es-ES to an ISO 639 language code.

Examples

  • two_letter_code: es-ES -> es
  • three_letter_code: es-ES -> spa
{
    "operations": [
        {
            "name": "i18n_language_shorthand",
            "parameters": {
                "format": "two_letter_code",
                "field": "lang"
            },
            "destination_path": "enrichment.language"
        }
    ]
}
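
A rough Python sketch of the conversion follows. The function name and the small lookup table are illustrative assumptions; a real implementation would need the full ISO 639 code list, and the component may accept other input forms.

```python
# Partial, illustrative table of ISO 639-1 to three-letter ISO 639-2 codes.
THREE_LETTER = {"es": "spa", "en": "eng", "fr": "fra", "de": "deu"}


def language_shorthand(tag, fmt="two_letter_code"):
    """Reduce a language tag such as 'es-ES' to a two- or three-letter code."""
    # The primary language subtag is the part before the first '-' or '_'.
    primary = tag.replace("_", "-").split("-")[0].lower()
    if fmt == "two_letter_code":
        return primary
    if fmt == "three_letter_code":
        return THREE_LETTER[primary]
    raise ValueError(f"unsupported format: {fmt}")


print(language_shorthand("es-ES"))                       # es
print(language_shorthand("en-US", "three_letter_code"))  # eng
```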

map

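Map a source value to a new value based on a list of regex rules.

  • source: the JSON source path of the value to test
  • from: the regex tested against the source value
  • to: the value written to the destination path when from matches
  • alt: the default value used if none of the rules match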

Configuration

{
    "operations": [
        {
            "stage": "transformation",
            "name": "map",
            "destination_path": "new_category",
            "parameters": {
                "source": "category",
                "alt": "other",
                "map": [
                    {
                        "from": "^arts",
                        "to": "entertainment"
                    },
                    {
                        "from": "^culture",
                        "to": "entertainment"
                    },
                    {
                        "from": "^entertainment",
                        "to": "entertainment"
                    }
                ]
            }
        }
    ]
}

Input

[
    {
        "title": "The Mona Lisa by Leonardo da Vinci, the world's most famous portrait, could get a room of its own in the Louvre, the museum's president said on Saturday.",
        "category": "art"
    },
    {
        "title": "Reviving Traditions: Global Cultures Embrace Ancient Art Forms in Modern Times",
        "category": "culture"
    },
    {
        "title": "Streaming Wars Intensify: New Platforms and Big Releases Reshape Entertainment Landscape",
        "category": "entertainment"
    },
    {
        "title": "Electric Revolution: Automakers Race to Redefine the Future of Cars",
        "category": "automobile"
    }
]

Output

[
    {
        "title": "The Mona Lisa by Leonardo da Vinci, the world's most famous portrait, could get a room of its own in the Louvre, the museum's president said on Saturday.",
        "category": "art",
        "new_category": "other"
    },
    {
        "title": "Reviving Traditions: Global Cultures Embrace Ancient Art Forms in Modern Times",
        "category": "culture",
        "new_category": "entertainment"
    },
    {
        "title": "Streaming Wars Intensify: New Platforms and Big Releases Reshape Entertainment Landscape",
        "category": "entertainment",
        "new_category": "entertainment"
    },
    {
        "title": "Electric Revolution: Automakers Race to Redefine the Future of Cars",
        "category": "automobile",
        "new_category": "other"
    }
]
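
The rule evaluation can be sketched in Python with the standard re module, using the category values shown in the output above. The function name is hypothetical, and the sketch assumes that rules are tested in order, that the first match wins, and that matching is case-sensitive.

```python
import re


def map_operation(value, rules, alt):
    """Return the 'to' value of the first rule whose 'from' regex matches."""
    for rule in rules:
        if re.search(rule["from"], value):
            return rule["to"]
    return alt  # no rule matched


rules = [
    {"from": "^arts", "to": "entertainment"},
    {"from": "^culture", "to": "entertainment"},
    {"from": "^entertainment", "to": "entertainment"},
]

for category in ["art", "culture", "entertainment", "automobile"]:
    print(category, "->", map_operation(category, rules, "other"))
# art -> other
# culture -> entertainment
# entertainment -> entertainment
# automobile -> other
```

Note that "art" falls through to the alt value because the pattern ^arts requires the trailing "s"; a pattern such as ^art would match both "art" and "arts".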

extract

Extract content that matches a regex. Use the format field to specify how the matched groups are written to the destination; in the example below, {1} refers to the first capture group.

Configuration

{
  "operations": [
    {
      "stage": "transformation",
      "name": "extract",
      "destination_path": "content.author_related_post",
      "parameters": {
        "source_path": "source.link",
        "regex": "@([^\/]+)",
        "format": "{1}"
      }
    }
  ]
}

Input

{
    "source": {
        "link": "https://www.example.com/@mycat/comments/12345"
    }
}

Output

{
    "content.author_related_post": "mycat",
    "source": {
        "link": "https://www.example.com/@mycat/comments/12345"
    }
}
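
The extraction can be sketched in Python with the standard re module, applied to the link from the input above. The function name is hypothetical, and the sketch assumes that {0} stands for the whole match and {1}, {2}, ... for the capture groups.

```python
import re


def extract(text, regex, fmt):
    """Return fmt filled with the groups of the first regex match, or None."""
    match = re.search(regex, text)
    if match is None:
        return None
    # Index 0 is the whole match; indexes 1..n are the capture groups.
    groups = [match.group(0), *match.groups()]
    return fmt.format(*groups)


link = "https://www.example.com/@mycat/comments/12345"
author = extract(link, r"@([^/]+)", "{1}")
print(author)  # mycat
```

The JSON string "@([^\/]+)" in the configuration decodes to the same pattern used here, because \/ is the JSON escape for a forward slash.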