AI Category Classifier

Use AI to classify news, blogs, and social media into predefined categories for better content organization

About

The AI Category Classifier is an AI-powered model that categorizes text into predefined media topics. It can process text from various sources, such as news articles, blogs, and social media posts, and identifies the dominant theme of the content, making it easier to organize and analyze large volumes of data. It supports multiple languages, which makes it adaptable for global markets, and can operate in both real-time and batch-processing scenarios.

Available Categorizations

The AI Category Classifier is available for several categorization taxonomies, including Market Interest Categorization, IPTC Categorization, and others.

Adding to your Dynamic Pipeline

You can add this classifier to your Dynamic Pipeline through the "Category AI Classifier" component. It requires the following configuration fields:

  • Destination Path (Required): The field that holds the output of the Category AI Classifier. By default this is "enrichment.category", but you can map the output to another field or create a new one. The output contains a category label and a confidence score; the label is one of the 13 categories of the selected categorization.
  • Target Text (Required): The metadata field containing the input text to classify. By default this is content.body, but any field containing short- or long-form text can be used.
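Both fields take a dotted path into the document's metadata. A minimal Python sketch of how such paths could be resolved, assuming each dot separates one level of nested keys; the helpers `get_path` and `set_path` and the result keys `label` and `confidence` are hypothetical, not part of the component:

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Return the value at a dotted path such as "content.body", or None if any level is missing."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(doc: dict, path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


# Default values from this page: Target Text = content.body, Destination Path = enrichment.category.
doc = {"content": {"body": "Central bank raises interest rates."}}
text = get_path(doc, "content.body")  # the text that would be sent for classification
# Hypothetical result shape: the classifier's label and confidence score would be written here.
set_path(doc, "enrichment.category", {"label": "example-label", "confidence": 0.9})
```

Mapping the output to a different field only changes the path string passed to `set_path`.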

If the Gemini model encounters safety issues with certain content, the Gemini API fails to generate output for that content.
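Since such documents come through without a category, you may want to route them separately for review. A minimal Python sketch, assuming (this page does not confirm it) that a failed classification leaves the destination field missing or empty; `split_by_classification` is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def split_by_classification(
    docs: Iterable[dict], destination: str = "enrichment.category"
) -> Tuple[List[dict], List[dict]]:
    """Separate documents that received a category from those that did not.

    Assumption: when the Gemini API fails to generate output, the
    destination field is absent or empty in the resulting document.
    """
    classified: List[dict] = []
    failed: List[dict] = []
    for doc in docs:
        node = doc
        # Walk the dotted path; any missing level yields None.
        for key in destination.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        (classified if node else failed).append(doc)
    return classified, failed
```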

The following example shows the Dynamic Pipeline configuration for the Category AI Classifier component. If Unify is the previous step in your pipeline, you can use this example as-is.

In this example:

  • content.body from the input document is set as the “Target Text” for the AI Category Classifier.

  • enrichment.category is set as the destination path for the output of the AI Category Classifier.


Sample Output
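Each result contains a category label and a confidence score. One hypothetical way to consume that result downstream is sketched in Python below, assuming the destination field holds a mapping with `label` and `confidence` keys and that the confidence is a number between 0 and 1; neither the key names nor the score range are confirmed by this page:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CategoryResult:
    label: str         # English category label from the selected taxonomy
    confidence: float  # confidence score reported alongside the label


def parse_result(raw: Optional[dict]) -> Optional[CategoryResult]:
    """Build a CategoryResult from the destination field, or None if it is absent or incomplete."""
    if not raw or "label" not in raw or "confidence" not in raw:
        return None
    return CategoryResult(label=str(raw["label"]), confidence=float(raw["confidence"]))


def accept(result: Optional[CategoryResult], threshold: float = 0.7) -> bool:
    """Keep only results at or above the threshold (0.7 is an arbitrary example value)."""
    return result is not None and result.confidence >= threshold
```

The threshold is a design choice rather than a product setting; tune it against a sample of your own labeled content.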

Compatible Languages

The AI Category Classifier supports content in multiple languages. When the input text is not in English, the component automatically detects the language and classifies the text accordingly. Category labels are always provided in English. Because this component uses Google Gemini 1.5 Flash as its back end, its language coverage follows that model's and improves as the model is updated. According to https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash, the supported languages are:

Language      Language ID (ISO 639)
Arabic        ar
Bengali       bn
Bulgarian     bg
Chinese       zh
Croatian      hr
Czech         cs
Danish        da
Dutch         nl
English       en
Estonian      et
Finnish       fi
French        fr
German        de
Greek         el
Hebrew        iw
Hindi         hi
Hungarian     hu
Indonesian    id
Italian       it
Japanese      ja
Korean        ko
Latvian       lv
Lithuanian    lt
Norwegian     no
Polish        pl
Portuguese    pt
Romanian      ro
Russian       ru
Serbian       sr
Slovak        sk
Slovenian     sl
Spanish       es
Swahili       sw
Swedish       sv
Thai          th
Turkish       tr
Ukrainian     uk
Vietnamese    vi
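The component detects the input language itself, but if your documents already carry a language tag from an upstream step, a small lookup against the table above can flag content outside the listed languages before it reaches the classifier. This is a hypothetical pre-check in Python, not part of the component; `is_listed_language` and the alias map are illustrative. The table lists Hebrew as "iw", the older code that ISO 639-1 replaced with "he", so the sketch accepts both:

```python
# Codes copied from the language table above.
SUPPORTED_LANGUAGE_CODES = frozenset({
    "ar", "bn", "bg", "zh", "hr", "cs", "da", "nl", "en", "et",
    "fi", "fr", "de", "el", "iw", "hi", "hu", "id", "it", "ja",
    "ko", "lv", "lt", "no", "pl", "pt", "ro", "ru", "sr", "sk",
    "sl", "es", "sw", "sv", "th", "tr", "uk", "vi",
})

# Map current codes onto the older codes used in the table.
ALIASES = {"he": "iw"}


def is_listed_language(code: str) -> bool:
    """Return True if a tag such as "fr" or "pt-BR" has a base language listed in the table."""
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return ALIASES.get(base, base) in SUPPORTED_LANGUAGE_CODES
```

Content in unlisted languages may still be processed by the model; the check only tells you which documents fall outside the documented coverage.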