GenAI Category Classifier

AI-powered text categorization into 12 media topics. Supports 38+ languages with confidence scores. Classify news, blogs & social media.

About

The GenAI Category Classifier is an AI-powered model designed to categorize text into predefined media topics. This classifier can process text from various sources such as news articles, blogs, and social media posts. It helps identify the dominant theme of the content, making it easier to organize and analyse large volumes of data. It supports multiple languages, making it adaptable for global markets, and can operate in real-time or batch-processing scenarios.

Available Categorizations

The GenAI Category Classifier categorizes content into 12 market interest categories:

  • Politics & World Events
  • Hobbies & Relaxation
  • Family & Relationships
  • Food & Beverages
  • Technology & Manufacturing
  • Sports & Entertainment
  • Travel & Adventure
  • Personal Finance & Careers
  • Health & Wellness
  • Weather & Environment
  • Education & Learning
  • Other
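
When consuming the classifier's output downstream, it can help to check labels against this fixed list. The following is a minimal Python sketch, assuming the labels are returned exactly as spelled above; the names CATEGORIES and is_known_category are illustrative and not part of the product:

```python
# The 12 category labels, copied from the list above.
CATEGORIES = (
    "Politics & World Events",
    "Hobbies & Relaxation",
    "Family & Relationships",
    "Food & Beverages",
    "Technology & Manufacturing",
    "Sports & Entertainment",
    "Travel & Adventure",
    "Personal Finance & Careers",
    "Health & Wellness",
    "Weather & Environment",
    "Education & Learning",
    "Other",
)


def is_known_category(label: str) -> bool:
    """Return True if the label exactly matches one of the 12 categories."""
    return label in CATEGORIES
```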

Adding to your Dynamic Pipeline

Add this classifier to your Dynamic Pipeline by selecting the "GenAI Category Classifier" component. It requires the following configuration fields:

  • Destination Path (Required): The field that holds the output of the GenAI Category Classifier. The output is written to "enrichment.category" unless you map it to another field or create a new one. It contains a category label and a confidence score. The label is always one of the 12 categories listed above.
  • Target Text (Required): The metadata field containing the input text to classify. By default, this is set to content.body, but any field containing short-form or long-form text can be used.

If the underlying Gemini model blocks certain content for safety reasons, no category is produced for that content and you will see a message that the Gemini API failed to generate output.

The following example shows the Dynamic Pipeline configuration for the GenAI Category Classifier component. If Unify is the previous step in your pipeline, you can use the configuration shown in the image.

In this example:

  • content.body from the input document is set as the “Target Text” for the GenAI Category Classifier.
  • enrichment.category is set as the “Destination Path” for the output of the GenAI Category Classifier.
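
The dotted field names in this configuration refer to nested fields in the document. The Python sketch below illustrates that mapping on a plain dictionary. It is not the pipeline's implementation; get_path, set_path, and the sample document are hypothetical names used only for illustration:

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Read a nested value using a dotted path such as 'content.body'."""
    current: Any = doc
    for key in path.split("."):
        current = current[key]  # raises KeyError if a segment is missing
    return current


def set_path(doc: dict, path: str, value: Any) -> None:
    """Write a nested value, creating intermediate dictionaries as needed."""
    *parents, leaf = path.split(".")
    current = doc
    for key in parents:
        current = current.setdefault(key, {})
    current[leaf] = value


# Hypothetical input document using the default Target Text field.
document = {"content": {"body": "Parliament passed the budget bill today."}}

# Target Text: the text the classifier would read.
text = get_path(document, "content.body")

# Destination Path: where the result would land. The value here is a
# placeholder; in the pipeline the classifier produces it.
set_path(
    document,
    "enrichment.category",
    {"label": "Politics & World Events", "confidence": 0.8},
)
```

Pointing the Destination Path at a different dotted name would, under the same assumption, simply change where the label and confidence appear in the document.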

Sample Output

"enrichment": {
    "category": {
        "label": "Politics & World Events", 
        "confidence": 0.8
    }
}
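
A downstream consumer might read this output as follows. This is a sketch in Python, assuming the document is available as a dictionary shaped like the sample above and that the category field may be absent when classification fails. The function name confident_label and the 0.5 threshold are illustrative choices, not documented defaults:

```python
from typing import Optional


def confident_label(doc: dict, min_confidence: float = 0.5) -> Optional[str]:
    """Return the category label if present and confident enough, else None."""
    category = doc.get("enrichment", {}).get("category")
    if not category:
        # Assumption: no output was written for this document.
        return None
    if category.get("confidence", 0.0) < min_confidence:
        return None
    return category.get("label")


sample = {
    "enrichment": {
        "category": {"label": "Politics & World Events", "confidence": 0.8}
    }
}

print(confident_label(sample))       # Politics & World Events
print(confident_label(sample, 0.9))  # None
print(confident_label({}))           # None
```

Treating low-confidence results as missing keeps uncertain labels out of later aggregation; a suitable threshold depends on your data.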

Compatible Languages

The GenAI Category Classifier supports content in multiple languages. When the input text is in a language other than English, the component automatically detects the language and classifies the text accordingly. The category label is always returned in English. Because this component uses Gemini 2.5 Flash Lite on the back end, language coverage continues to improve. The currently supported languages are:

Language      Language ID (ISO-639)
Arabic        ar
Bengali       bn
Bulgarian     bg
Chinese       zh
Croatian      hr
Czech         cs
Danish        da
Dutch         nl
English       en
Estonian      et
Finnish       fi
French        fr
German        de
Greek         el
Hebrew        iw
Hindi         hi
Hungarian     hu
Indonesian    id
Italian       it
Japanese      ja
Korean        ko
Latvian       lv
Lithuanian    lt
Norwegian     no
Polish        pl
Portuguese    pt
Romanian      ro
Russian       ru
Serbian       sr
Slovak        sk
Slovenian     sl
Spanish       es
Swahili       sw
Swedish       sv
Thai          th
Turkish       tr
Ukrainian     uk
Vietnamese    vi