AI Category Classifier
Use AI to classify news, blogs, and social media into predefined categories for better content organization
About
The AI Category Classifier is an AI-powered model designed to categorise text into predefined media topics. This classifier can process text from various sources such as news articles, blogs, and social media posts. It helps identify the dominant theme of the content, making it easier to organise and analyse large volumes of data. It supports multiple languages, making it adaptable for global markets, and can operate in real-time or batch-processing scenarios.
Available Categorizations
The AI Category Classifier supports several categorization taxonomies, including Market Interest Categorization, IPTC Categorization, and others.
Adding to your Dynamic Pipeline
This component can be added to your Dynamic pipelines through the "Category AI Classifier" component. It requires the following fields for configuration:
- Destination Path (Required): The field that holds the output from the Category AI Classifier. By default, this is "enrichment.category", but you can map it to another field or create a new one. The output contains a category label and a confidence score. The label will be one of the 13 categories mentioned above.
- Target Text (Required): The metadata field containing the input text whose category should be identified. By default, this is set to content.body, but any field containing short-form or long-form text can be used.
If the Gemini model encounters safety issues with certain content, the Gemini API fails to generate output for that content.
The following example shows the dynamic pipeline configuration for the Category AI Classifier component. If Unify is the previous step in your pipeline, you can use the example in the image.
In this example:
- content.body from the input document is set as the "Target Text" for the AI Category Classifier
- enrichment.category is set as the destination path for the output of the AI Category Classifier
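To make the relationship between the two fields concrete, the following is a minimal Python sketch of how a dotted Target Text path could be read from a document and how a result could be written to a dotted Destination Path. It is an illustration only, not the component's implementation: the config keys (target_text, destination_path), the helper functions, and the result keys (label, confidence) and their values are all hypothetical.

```python
from typing import Any


def get_path(doc: dict, path: str, default: Any = None) -> Any:
    """Read a dotted path such as 'content.body' from a nested dict."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_path(doc: dict, path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


# Hypothetical configuration mirroring the two required fields.
config = {
    "target_text": "content.body",
    "destination_path": "enrichment.category",
}

document = {"content": {"body": "Central bank raises interest rates again."}}

# The text the classifier would receive.
text = get_path(document, config["target_text"])

# Placeholder result: key names and values are assumptions,
# not the component's documented output format.
result = {"label": "example-category", "confidence": 0.9}

# The classifier output lands at the destination path.
set_path(document, config["destination_path"], result)
```

Changing destination_path to another dotted path (for example a new field of your own) would place the same result there, leaving content.body untouched.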
Sample Output
Compatible Languages
The AI Category Classifier supports content in multiple languages. When the input text is in a language other than English, the component automatically detects the language and classifies the content accordingly. The category label is always provided in English. Language coverage continues to improve because this component uses Google Gemini 1.5 Flash in the back end. According to https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash, the supported languages are:
Language | Language ID (ISO-639) |
---|---|
Arabic | ar |
Bengali | bn |
Bulgarian | bg |
Chinese | zh |
Croatian | hr |
Czech | cs |
Danish | da |
Dutch | nl |
English | en |
Estonian | et |
Finnish | fi |
French | fr |
German | de |
Greek | el |
Hebrew | iw |
Hindi | hi |
Hungarian | hu |
Indonesian | id |
Italian | it |
Japanese | ja |
Korean | ko |
Latvian | lv |
Lithuanian | lt |
Norwegian | no |
Polish | pl |
Portuguese | pt |
Romanian | ro |
Russian | ru |
Serbian | sr |
Slovak | sk |
Slovenian | sl |
Spanish | es |
Swahili | sw |
Swedish | sv |
Thai | th |
Turkish | tr |
Ukrainian | uk |
Vietnamese | vi |
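
Because the component detects the language itself, no language parameter is needed. If you already have a language ID on your documents and want to check it against the table before routing content to the classifier, a small lookup is enough. The sketch below is an assumption-laden example: the function name is hypothetical, and the handling of region suffixes and of "he" (the current ISO 639-1 code for Hebrew, which the table lists under the older code "iw") is a convenience added here, not documented behavior.

```python
# Language IDs copied from the table above.
SUPPORTED_LANGUAGE_IDS = frozenset({
    "ar", "bn", "bg", "zh", "hr", "cs", "da", "nl", "en", "et",
    "fi", "fr", "de", "el", "iw", "hi", "hu", "id", "it", "ja",
    "ko", "lv", "lt", "no", "pl", "pt", "ro", "ru", "sr", "sk",
    "sl", "es", "sw", "sv", "th", "tr", "uk", "vi",
})

# Assumption: treat the current Hebrew code "he" as the legacy "iw" used in the table.
ALIASES = {"he": "iw"}


def is_listed_language(language_id: str) -> bool:
    """Return True if the language ID appears in the table above.

    Case is ignored and a region suffix such as '-BR' or '_BR' is dropped.
    """
    base = language_id.strip().lower().replace("_", "-").split("-")[0]
    base = ALIASES.get(base, base)
    return base in SUPPORTED_LANGUAGE_IDS
```

For example, is_listed_language("pt-BR") and is_listed_language("he") both return True, while an ID that is not in the table returns False.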