Language Detection (Datastreamer)
Detecting language from any field of a given input
About
The language detection component detects the language used in any field of the given input. The detected language is returned as a short language code, in most cases a two-letter ISO 639-1 code; the full list of codes appears under Compatible Languages. The component can be used for both real-time and batch processing.
Adding to your Dynamic Pipeline
This component can be added to your Dynamic Pipelines through the "Language Detect" component. It requires the following fields for configuration:
- Destination Path (Required): the metadata field to which the detected language code (ISO 639-1) is written. This can be an existing field, or the component can create a new field for it.
- Main (Required): the input field that Language Detection uses as its source. By default, the content.body field is used; however, any field can be chosen as the input.
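To illustrate how the two settings relate, here is a minimal Python sketch. It is not the component's implementation. It assumes that dotted paths such as `content.body` address nested fields of a JSON-like document, and the helpers `get_path`, `set_path`, and `apply_language_detect`, as well as the stand-in `detect` callable, are hypothetical names introduced only for this example.

```python
from typing import Any, Callable, Optional


def get_path(doc: dict, path: str) -> Optional[Any]:
    """Return the value at a dotted path, or None if any segment is missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(doc: dict, path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = path.split(".")
    current = doc
    for key in parents:
        child = current.get(key)
        if not isinstance(child, dict):
            child = {}
            current[key] = child
        current = child
    current[leaf] = value


def apply_language_detect(
    doc: dict,
    detect: Callable[[str], str],
    destination_path: str,
    main: str = "content.body",  # default source field named in the text above
) -> dict:
    """Read text from `main`, detect its language, write the code to `destination_path`."""
    text = get_path(doc, main)
    # Assumption: documents without usable text in the source field are left unchanged.
    if isinstance(text, str) and text.strip():
        set_path(doc, destination_path, detect(text))
    return doc


if __name__ == "__main__":
    document = {"content": {"body": "Bonjour tout le monde"}}
    # The lambda is a stand-in detector; the real detection happens inside the component.
    apply_language_detect(document, lambda text: "FR", "operations.language")
    print(document)
```

In this sketch the destination field does not exist beforehand and is created on write, which mirrors the "existing or new field" behaviour described for Destination Path.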
Compatible Languages
The component covers the following 144 languages.
Language | Language ID (ISO 639-1 code) |
---|---|
Afar | AA |
Afrikaans | AF |
Akan | AK |
Albanian | SQ |
Amharic | AM |
Arabic | AR |
Armenian | HY |
Assamese | AS |
Aymara | AY |
Azerbaijani | AZ |
Bashkir | BA |
Basque | EU |
Belarusian | BE |
Bengali | BN |
Bihari | BH |
Norwegian Bokmål | NB |
Bosnian | BS |
Bulgarian | BG |
Burmese | MY |
Catalan | CA |
Chinese | ZH |
Chinese (Traditional) | ZH-HANT |
Corsican | CO |
Croatian | HR |
Czech | CS |
Danish | DA |
Dhivehi | DV |
Dutch | NL |
English | EN |
Esperanto | EO |
Estonian | ET |
Ewe | EE |
Fijian | FJ |
Finnish | FI |
French | FR |
Frisian | FY |
Galician | GL |
Ganda | LG |
Georgian | KA |
German | DE |
Greek | EL |
Guarani | GN |
Gujarati | GU |
Hausa | HA |
Hebrew | HE |
Hindi | HI |
Hungarian | HU |
Icelandic | IS |
Igbo | IG |
Indonesian | ID |
Interlingua | IA |
Interlingue | IE |
Inupiak | IK |
Irish | GA |
Italian | IT |
Japanese | JA |
Javanese | JV |
Kannada | KN |
Kashmiri | KS |
Kazakh | KK |
Khmer | KM |
Kinyarwanda | RW |
Korean | KO |
Kurdish | KU |
Kyrgyz | KY |
Laothian | LO |
Latin | LA |
Latvian | LV |
Lingala | LN |
Lithuanian | LT |
Luxembourgish | LB |
Macedonian | MK |
Malagasy | MG |
Malay | MS |
Malayalam | ML |
Maltese | MT |
Maori | MI |
Marathi | MR |
Mongolian | MN |
Montenegrin | CNR |
Ndebele | NR |
Nepali | NE |
Norwegian | NO |
Norwegian Nynorsk | NN |
Nyanja | NY |
Occitan | OC |
Oriya | OR |
Oromo | OM |
Ossetian | OS |
Pashto | PS |
Persian | FA |
Polish | PL |
Portuguese | PT |
Punjabi | PA |
Quechua | QU |
Rhaeto-Romance | RM |
Romanian | RO |
Rundi | RN |
Russian | RU |
Samoan | SM |
Sango | SG |
Sanskrit | SA |
Scots Gaelic | GD |
Serbian | SR |
Sesotho | ST |
Shona | SN |
Sindhi | SD |
Sinhalese | SI |
Siswati | SS |
Slovak | SK |
Slovenian | SL |
Somali | SO |
Spanish | ES |
Sundanese | SU |
Swahili | SW |
Swedish | SV |
Tagalog | TL |
Tajik | TG |
Tamil | TA |
Tatar | TT |
Telugu | TE |
Thai | TH |
Tibetan | BO |
Tigrinya | TI |
Tonga | TO |
Tsonga | TS |
Tswana | TN |
Turkish | TR |
Turkmen | TK |
Twi | TW |
Uighur | UG |
Ukrainian | UK |
Urdu | UR |
Uzbek | UZ |
Venda | VE |
Vietnamese | VI |
Volapuk | VO |
Welsh | CY |
Wolof | WO |
Xhosa | XH |
Yiddish | YI |
Yoruba | YO |
Zhuang | ZA |
Zulu | ZU |
Usage in Search API
This operation lets a user specify the destination field, the source fields, and a separator. The example request sets only the destination path and the main source field.
{
  "query": {
    ...
  },
  "operations": [
    {
      "name": "langdetect",
      "destination_path": "operations.language",
      "parameters": {
        "main": "content.body"
      }
    }
  ]
}
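As a rough sketch of how such a request body could be assembled programmatically, the Python below builds the same structure and prints it as JSON. The helper names `langdetect_operation` and `build_search_body` are hypothetical, the query is left as an empty placeholder because the original elides it, and no endpoint or authentication is shown since neither is specified here.

```python
import json


def langdetect_operation(destination_path: str, main: str = "content.body") -> dict:
    """Build one 'langdetect' operation entry, mirroring the JSON example above."""
    return {
        "name": "langdetect",
        "destination_path": destination_path,
        "parameters": {"main": main},
    }


def build_search_body(query: dict, operations: list) -> dict:
    """Combine a search query and a list of operations into one request body."""
    return {"query": query, "operations": operations}


body = build_search_body(
    query={},  # placeholder only: supply your actual search query here
    operations=[langdetect_operation("operations.language")],
)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

Keeping the operation in its own helper makes it easy to point detection at a different source field, for example `langdetect_operation("operations.title_language", main="content.title")`, where both field names are illustrative.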