About

The AI Category Classifier is an AI-powered model that categorizes text into predefined media topics. It can process text from a variety of sources, such as news articles, blogs, and social media posts, and identifies the dominant theme of the content, which makes large volumes of data easier to organize and analyze. It supports multiple languages, so it can be used in global markets, and it can operate in both real-time and batch-processing scenarios.

Available Categorizations

The AI Category Classifier supports several categorization taxonomies, including Market Interest Categorization, IPTC Categorization, and others.

Adding to your Dynamic Pipeline

To add the classifier to a Dynamic Pipeline, use the "Category AI Classifier" component. It requires the following configuration fields:

Destination Path (Required): The field that receives the classifier output. By default this is enrichment.category, but you can map the output to another existing field or create a new one. The output contains a category label and a confidence score; the label is one of the categories defined by the selected categorization taxonomy.
Target Text (Required): The metadata field containing the input text to classify. By default this is content.body, but any field containing short-form or long-form text can be used.
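To make the field mapping concrete, here is a minimal Python sketch of how a dotted path such as content.body could be read from a document and how a result could be written to enrichment.category. The helper names (get_field, set_field, apply_classifier), the nested-dictionary document shape, the result keys label and confidence, and the stub classifier with its made-up "Sport" label are all hypothetical illustrations, not the pipeline's actual API.

```python
from typing import Any, Callable


def get_field(doc: dict, path: str) -> Any:
    """Return the value at a dotted path such as 'content.body', or None if absent."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_field(doc: dict, path: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate dicts as needed."""
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def apply_classifier(
    doc: dict,
    classify: Callable[[str], dict],
    target_text: str = "content.body",
    destination_path: str = "enrichment.category",
) -> dict:
    """Hypothetical stand-in for the component: read the target text, classify it,
    and store the (assumed) {label, confidence} result at the destination path."""
    text = get_field(doc, target_text)
    if isinstance(text, str) and text.strip():
        set_field(doc, destination_path, classify(text))
    return doc


# Stub classifier for illustration only; the real component calls Gemini.
def fake_classify(text: str) -> dict:
    return {"label": "Sport", "confidence": 0.92}


doc = {"content": {"body": "The home team won the cup final 2-1."}}
apply_classifier(doc, fake_classify)
print(doc["enrichment"]["category"])  # {'label': 'Sport', 'confidence': 0.92}
```

Passing a different destination_path, for example "meta.topic", writes the result to a new field instead, mirroring the option to map the output elsewhere.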

If the Gemini model flags certain content for safety reasons, the Gemini API fails to generate output for that content.

The following example shows a Dynamic Pipeline configuration for the Category AI Classifier component. If Unify is the previous step in your pipeline, you can use the configuration shown in the image.

In this example:

content.body from the input document is set as the “Target Text” for the AI Category Classifier
enrichment.category is set as the “Destination Path” for the classifier output

Sample Output

Compatible Languages

The AI Category Classifier supports content in multiple languages. When the input text is not in English, the component automatically detects the language and classifies the text accordingly; the category label is always returned in English. Language coverage follows that of the underlying model, Google Gemini 2.0 Flash, and improves as that model is updated. According to the Gemini model documentation (https://ai.google.dev/gemini-api/docs/models#gemini-2.0-flash), the supported languages are:

Language	Language code (ISO 639-1)
Arabic	ar
Bengali	bn
Bulgarian	bg
Chinese	zh
Croatian	hr
Czech	cs
Danish	da
Dutch	nl
English	en
Estonian	et
Finnish	fi
French	fr
German	de
Greek	el
Hebrew	iw (legacy code; the current ISO 639-1 code is he)
Hindi	hi
Hungarian	hu
Indonesian	id
Italian	it
Japanese	ja
Korean	ko
Latvian	lv
Lithuanian	lt
Norwegian	no
Polish	pl
Portuguese	pt
Romanian	ro
Russian	ru
Serbian	sr
Slovak	sk
Slovenian	sl
Spanish	es
Swahili	sw
Swedish	sv
Thai	th
Turkish	tr
Ukrainian	uk
Vietnamese	vi
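If you want to check whether a document's language is covered before sending it to the classifier, a minimal Python sketch could keep the codes from the table above in a set. It assumes your own language detector returns ISO 639-1 codes, possibly with a region suffix; because the table lists the legacy code iw for Hebrew, the sketch also maps the current code he to iw. The names SUPPORTED_LANGUAGES, ALIASES, and is_supported are hypothetical. The component detects the language on its own, so this check is optional.

```python
# Language codes copied from the table above (Gemini 2.0 Flash coverage).
SUPPORTED_LANGUAGES = {
    "ar", "bn", "bg", "zh", "hr", "cs", "da", "nl", "en", "et",
    "fi", "fr", "de", "el", "iw", "hi", "hu", "id", "it", "ja",
    "ko", "lv", "lt", "no", "pl", "pt", "ro", "ru", "sr", "sk",
    "sl", "es", "sw", "sv", "th", "tr", "uk", "vi",
}

# The table uses the legacy code "iw" for Hebrew; current ISO 639-1 uses "he".
ALIASES = {"he": "iw"}


def is_supported(lang_code: str) -> bool:
    """Return True if a detected language code appears in the documented list.

    Region subtags such as 'pt-BR' or 'zh_TW' are reduced to the base language.
    """
    base = lang_code.strip().lower().replace("_", "-").split("-")[0]
    return ALIASES.get(base, base) in SUPPORTED_LANGUAGES


print(is_supported("pt-BR"))  # True
print(is_supported("he"))     # True
print(is_supported("cy"))     # False (Welsh is not in the table)
```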