Language Detection (Google Service)

Detect the language used in any field of an input document

About

The Language Detection (Google Service) component uses the Google Translate service to detect the language of any field in the input. The detected language is returned as a Google Translate language code. This is an ISO 639-1 two-letter code such as en where one exists; otherwise it is a longer ISO 639 or BCP-47 code such as ceb or zh-CN (see the table below). This component supports both real-time and batch-processing workflows.

Adding to your Dynamic Pipeline

Add the "Language Detection (Google Service)" component to a Dynamic Pipeline and configure the following fields:

  • Destination Path (Required): The JSON path where the detected language code is written. Defaults to enrichment.language. The path can point to an existing field or to a new field, which the component creates.
  • Source Path (Required): The JSON path of the input field whose text is used for language detection. Defaults to content.body, but any field can be chosen.
  • Filter Conditions (Optional): Conditions that are applied before a document's language is detected. See the JSON Conditions page for more information.
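The fields above describe a read, detect, write step. The following Python sketch illustrates that flow under stated assumptions. Dotted JSON paths are resolved by splitting on ".". Missing destination objects are created. Documents whose source field is missing or empty are skipped. The actual detection call is passed in as a `detect` function. The helper names (`get_path`, `set_path`, `detect_language_step`) are hypothetical and not part of the component, and Filter Conditions are not modeled.

```python
from typing import Any, Callable, Dict


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Return the value at a dotted path such as "content.body", or None if any segment is missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Write value at a dotted path such as "enrichment.language"."""
    *parents, leaf = path.split(".")
    current = doc
    for key in parents:
        # Create an empty object for each missing parent (assumes existing parents are objects).
        current = current.setdefault(key, {})
    current[leaf] = value


def detect_language_step(
    doc: Dict[str, Any],
    detect: Callable[[str], str],
    source_path: str = "content.body",
    destination_path: str = "enrichment.language",
) -> Dict[str, Any]:
    """Hypothetical sketch: read the source field, detect its language, write the code."""
    text = get_path(doc, source_path)
    # Assumption: documents without usable source text are passed through unchanged.
    if isinstance(text, str) and text.strip():
        set_path(doc, destination_path, detect(text))
    return doc
```

With the google-cloud-translate package installed and credentials configured, one possible `detect` argument is `lambda text: translate_v2.Client().detect_language(text)["language"]`, after `from google.cloud import translate_v2`. The component's own back-end call may differ.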

Dynamic Pipeline Example Configuration

The following example shows a Dynamic Pipeline configuration for the Language Detection component:

  • enrichment.language is set as the Destination Path for the detected language code
  • content.body from the input document is set as the Source Path for language detection
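For the example configuration above, a document passes through the component roughly as follows. This Python sketch uses a hypothetical `detect_stub` in place of the Google call (it always returns "de") and illustrative document content. It only shows where the source text is read from and where the language code is written.

```python
import json


def detect_stub(text: str) -> str:
    """Hypothetical stand-in for the Google detection call; always answers "de" here."""
    return "de"


# Input document as it might arrive in the pipeline (illustrative content).
document = {"content": {"body": "Guten Morgen, wie geht es dir?"}}

# Example configuration: Source Path = content.body, Destination Path = enrichment.language.
source_text = document["content"]["body"]
document.setdefault("enrichment", {})["language"] = detect_stub(source_text)

# The printed document keeps content.body and gains enrichment.language.
print(json.dumps(document, indent=2))
```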

Because this component uses the Google Translate API as its back end, its language coverage grows as Google adds languages. According to https://cloud.google.com/translate/docs/languages, the supported languages include:

Language                Language code
Afrikaans               af
Albanian                sq
Amharic                 am
Arabic                  ar
Armenian                hy
Assamese                as
Aymara                  ay
Azerbaijani             az
Bambara                 bm
Basque                  eu
Belarusian              be
Bengali                 bn
Bhojpuri                bho
Bosnian                 bs
Bulgarian               bg
Catalan                 ca
Cebuano                 ceb
Chinese (Simplified)    zh-CN
Chinese (Traditional)   zh-TW
Corsican                co
Croatian                hr
Czech                   cs
Danish                  da
Dhivehi                 dv
Dogri                   doi
Dutch                   nl
English                 en
Esperanto               eo
Estonian                et
Ewe                     ee
Filipino (Tagalog)      fil
Finnish                 fi
French                  fr
Frisian                 fy
Galician                gl
Georgian                ka
German                  de
Greek                   el
Guarani                 gn
Gujarati                gu
Haitian Creole          ht
Hausa                   ha
Hawaiian                haw
Hebrew                  he
Hindi                   hi
Hmong                   hmn
Hungarian               hu
Icelandic               is
Igbo                    ig
Ilocano                 ilo
Indonesian              id
Irish                   ga
Italian                 it
Japanese                ja
Javanese                jw
Kannada                 kn
Kazakh                  kk
Khmer                   km
Kinyarwanda             rw
Konkani                 gom
Korean                  ko
Krio                    kri
Kurdish                 ku
Kurdish (Sorani)        ckb
Kyrgyz                  ky
Lao                     lo
Latin                   la
Latvian                 lv
Lingala                 ln
Lithuanian              lt
Luganda                 lg
Luxembourgish           lb
Macedonian              mk
Maithili                mai
Malagasy                mg
Malay                   ms
Malayalam               ml
Maltese                 mt
Maori                   mi
Marathi                 mr
Meiteilon (Manipuri)    mni-Mtei
Mizo                    lus
Mongolian               mn
Myanmar (Burmese)       my
Nepali                  ne
Norwegian               no
Nyanja (Chichewa)       ny
Odia (Oriya)            or
Oromo                   om
Pashto                  ps
Persian                 fa
Polish                  pl
Portuguese              pt
Punjabi                 pa
Quechua                 qu
Romanian                ro
Russian                 ru
Samoan                  sm
Sanskrit                sa
Scots Gaelic            gd
Sepedi                  nso
Serbian                 sr
Sesotho                 st
Shona                   sn
Sindhi                  sd
Sinhala (Sinhalese)     si
Slovak                  sk
Slovenian               sl
Somali                  so
Spanish                 es
Sundanese               su
Swahili                 sw
Swedish                 sv
Tagalog (Filipino)      tl
Tajik                   tg
Tamil                   ta
Tatar                   tt
Telugu                  te
Thai                    th
Tigrinya                ti
Tsonga                  ts
Turkish                 tr
Turkmen                 tk
Twi (Akan)              ak
Ukrainian               uk
Urdu                    ur
Uyghur                  ug
Uzbek                   uz
Vietnamese              vi
Welsh                   cy
Xhosa                   xh
Yiddish                 yi
Yoruba                  yo
Zulu                    zu
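Some codes in the table are longer than two letters or use mixed case (for example ceb, zh-CN, mni-Mtei). A consumer that maps detected codes back to display names should therefore not assume a fixed two-letter lowercase format. The Python sketch below copies a few rows from the table into a hypothetical `LANGUAGE_NAMES` dictionary and looks codes up case-insensitively. It is an illustration, not part of the component.

```python
from typing import Optional

# A few rows copied from the table above; extend with the remaining rows as needed.
LANGUAGE_NAMES = {
    "af": "Afrikaans",
    "ceb": "Cebuano",
    "en": "English",
    "fr": "French",
    "mni-Mtei": "Meiteilon (Manipuri)",
    "zh-CN": "Chinese (Simplified)",
    "zh-TW": "Chinese (Traditional)",
}

# Case-insensitive index, because subtags such as "CN" or "Mtei" are mixed case.
_BY_LOWER = {code.lower(): name for code, name in LANGUAGE_NAMES.items()}


def language_name(code: str) -> Optional[str]:
    """Return the display name for a detected code, or None if it is not in the table."""
    return _BY_LOWER.get(code.strip().lower())
```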

Usage in Search API

The detect_language operation lets you specify the destination field (destination_path) and the source field (parameters.source_path).

Example Request

{
    "query": {
        ...
    },
    "operations": [
        {
            "name": "detect_language",
            "destination_path": "enrichment.language",
            "parameters": {
                "source_path": "content.body"
            }
        }
    ]
}
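When requests are built in code, the operation entry can be generated rather than written by hand. This Python sketch (the helper `detect_language_operation` is hypothetical) builds only the "operations" array shown above. The "query" object is left out because its contents are elided in the example. The field names content.title and enrichment.title_language are illustrative.

```python
import json
from typing import Any, Dict


def detect_language_operation(
    source_path: str = "content.body",
    destination_path: str = "enrichment.language",
) -> Dict[str, Any]:
    """Build one detect_language entry for the "operations" array of a Search API request."""
    return {
        "name": "detect_language",
        "destination_path": destination_path,
        "parameters": {"source_path": source_path},
    }


# Only the "operations" part is built here; the caller supplies the "query" object.
operations = [detect_language_operation("content.title", "enrichment.title_language")]
print(json.dumps({"operations": operations}, indent=4))
```

Calling `detect_language_operation()` with no arguments reproduces the defaults from the example above (content.body to enrichment.language).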