GenAI Category Classifier
AI-powered text categorization into 12 media topics. Supports 38+ languages with confidence scores. Classify news, blogs & social media.
About
The GenAI Category Classifier is an AI-powered model designed to categorize text into predefined media topics. This classifier can process text from various sources such as news articles, blogs, and social media posts. It helps identify the dominant theme of the content, making it easier to organize and analyse large volumes of data. It supports multiple languages, making it adaptable for global markets, and can operate in real-time or batch-processing scenarios.
Available Categorizations
The GenAI Category Classifier categorizes content into 12 market interest categories:
- Politics & World Events
- Hobbies & Relaxation
- Family & Relationships
- Food & Beverages
- Technology & Manufacturing
- Sports & Entertainment
- Travel & Adventure
- Personal Finance & Careers
- Health & Wellness
- Weather & Environment
- Education & Learning
- Other
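When post-processing classifier results, it can help to keep the label set as a constant and check incoming labels against it. The following is a minimal Python sketch, not part of the product: the names `CATEGORIES` and `count_labels` are hypothetical, and the labels are copied from the list above.

```python
from collections import Counter

# The 12 labels listed above, copied verbatim from this page.
CATEGORIES = (
    "Politics & World Events",
    "Hobbies & Relaxation",
    "Family & Relationships",
    "Food & Beverages",
    "Technology & Manufacturing",
    "Sports & Entertainment",
    "Travel & Adventure",
    "Personal Finance & Careers",
    "Health & Wellness",
    "Weather & Environment",
    "Education & Learning",
    "Other",
)


def count_labels(labels):
    """Tally known labels and collect unexpected ones separately.

    Hypothetical helper: `labels` is any iterable of label strings
    gathered from classifier output.
    """
    known = Counter()
    unknown = []
    for label in labels:
        if label in CATEGORIES:
            known[label] += 1
        else:
            unknown.append(label)
    return known, unknown
```

Calling `count_labels` on a batch of labels gives a per-category tally plus a list of anything that does not match the documented set, which is a quick way to spot mapping mistakes downstream.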
Adding to your Dynamic Pipeline
Add the classifier to your dynamic pipeline by selecting the "GenAI Category Classifier" component. It requires the following fields for configuration:
- Destination Path (Required): The field that receives the output of the GenAI Category Classifier. By default this is "enrichment.category", but you can map the output to another field or create a new one. The output contains a category label and a confidence score. The label is one of the 12 categories listed above.
- Target Text (Required): The metadata field containing the input text to classify. By default, this is set to content.body, but any field containing short-form or long-form text can be used.
If the Gemini model encounters safety issues with certain content, you will see that the Gemini API failed to generate output for it.
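Downstream code usually has to read the result from the destination path and cope with documents that have no classification. The sketch below is one way to do that in Python, under stated assumptions: documents are nested dictionaries, a failed generation leaves the destination field missing or without a label (this page does not specify how the failure appears), and the `min_confidence` threshold and both function names are hypothetical.

```python
def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as 'enrichment.category' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def read_category(doc, destination="enrichment.category", min_confidence=0.5):
    """Return the category label, or None if it is missing or low confidence.

    Assumption: when no output was generated, the destination field is
    absent or has no 'label' key. Adjust this to the behavior you observe.
    """
    result = get_path(doc, destination)
    if not isinstance(result, dict) or "label" not in result:
        return None
    confidence = result.get("confidence")
    if not isinstance(confidence, (int, float)) or confidence < min_confidence:
        return None
    return result["label"]


# Usage with a document shaped like the default configuration.
doc = {
    "content": {"body": "Parliament votes on the new budget."},
    "enrichment": {
        "category": {"label": "Politics & World Events", "confidence": 0.8}
    },
}
label = read_category(doc)
```

Returning `None` for both missing and low-confidence results keeps the caller simple. If you need to tell those two cases apart, return the raw result instead.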
The following example shows the dynamic pipeline configuration for the GenAI Category Classifier component. If Unify is the previous step in your pipeline, you can use the configuration shown in the image.
In this example:
- content.body from the input document is set as the "Target Text" for the GenAI Category Classifier
- enrichment.category is set as the destination path for the output of the GenAI Category Classifier
Sample Output
"enrichment": {
  "category": {
    "label": "Politics & World Events",
    "confidence": 0.8
  }
}
Compatible Languages
The GenAI Category Classifier supports content in multiple languages. When the input text is in a language other than English, the component automatically detects the language and classifies the text accordingly. The category label is always returned in English. Language coverage continues to improve because this component uses Gemini 2.5 Flash Lite in the back end. The currently supported languages are:
| Language | Language ID (ISO-639) |
|---|---|
| Arabic | ar |
| Bengali | bn |
| Bulgarian | bg |
| Chinese | zh |
| Croatian | hr |
| Czech | cs |
| Danish | da |
| Dutch | nl |
| English | en |
| Estonian | et |
| Finnish | fi |
| French | fr |
| German | de |
| Greek | el |
| Hebrew | iw |
| Hindi | hi |
| Hungarian | hu |
| Indonesian | id |
| Italian | it |
| Japanese | ja |
| Korean | ko |
| Latvian | lv |
| Lithuanian | lt |
| Norwegian | no |
| Polish | pl |
| Portuguese | pt |
| Romanian | ro |
| Russian | ru |
| Serbian | sr |
| Slovak | sk |
| Slovenian | sl |
| Spanish | es |
| Swahili | sw |
| Swedish | sv |
| Thai | th |
| Turkish | tr |
| Ukrainian | uk |
| Vietnamese | vi |
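The component detects the input language on its own, so no language check is required. If your documents already carry a language code and you want to flag ones outside the table before sending them, a small lookup like the following Python sketch could work. The names `SUPPORTED_LANGUAGES`, `ALIASES`, and `is_supported` are hypothetical, the codes are copied from the table above, and the set may fall out of date as coverage changes.

```python
# Language IDs copied from the table above (38 entries).
SUPPORTED_LANGUAGES = frozenset({
    "ar", "bn", "bg", "zh", "hr", "cs", "da", "nl", "en", "et",
    "fi", "fr", "de", "el", "iw", "hi", "hu", "id", "it", "ja",
    "ko", "lv", "lt", "no", "pl", "pt", "ro", "ru", "sr", "sk",
    "sl", "es", "sw", "sv", "th", "tr", "uk", "vi",
})

# The table lists Hebrew under the older code "iw"; "he" is the
# current ISO 639-1 code, so map it to the table's spelling.
ALIASES = {"he": "iw"}


def is_supported(code):
    """Return True if a language tag maps to a code in the table.

    Region subtags are ignored, so "pt-BR" and "pt_BR" are treated as "pt".
    """
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    primary = ALIASES.get(primary, primary)
    return primary in SUPPORTED_LANGUAGES
```

For example, `is_supported("pt-BR")` and `is_supported("he")` both return `True`, while a code that is not in the table, such as `"xx"`, returns `False`.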
