Location Inference Models

Location Inference Models work to infer the location of origin of a media post.

Location Inference Models infer the location of the author of a piece of text content by assessing a number of parameters in the data and predicting from them.

Example Use Cases

Combined with aggregations and sentiment, Location Inference can deliver high-level assessments of sentiment towards a brand in a specific city to a product's dashboard.
Spanish-language Location Inference can give a country-level view of Spanish content rather than relying on keywords or language.
Japanese-language Location Inference can give a country-level view of Japanese content rather than relying on keywords or language.
Location Inference can give a city-level view of content, which is more granular than relying on keywords or language.
Location Inference can also be used inversely, to exclude certain cities or countries from the results for content in a specific area.
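
As a minimal sketch of the inverse use case, the snippet below drops posts whose inferred country is in an exclusion set. It assumes each post is a dictionary carrying the `location_inference_country` object shown under Metadata Output; the `exclude_countries` helper and the sample posts are hypothetical and only illustrate the idea.

```python
from typing import Iterable, List, Set


def exclude_countries(posts: Iterable[dict], excluded: Set[str]) -> List[dict]:
    """Keep only posts whose inferred country label is not in `excluded`."""
    kept = []
    for post in posts:
        # Posts without the field are kept: there is no inferred country to exclude.
        label = post.get("location_inference_country", {}).get("label")
        if label not in excluded:
            kept.append(post)
    return kept


# Hypothetical sample posts, shaped like the metadata output in this document.
posts = [
    {"text": "a", "location_inference_country": {"label": "US", "confidence": 0.87}},
    {"text": "b", "location_inference_country": {"label": "MX", "confidence": 0.91}},
    {"text": "c", "location_inference_country": {"label": "Other", "confidence": 0.31}},
]

remaining = exclude_countries(posts, {"US"})
print([p["text"] for p in remaining])  # ['b', 'c']
```

The same pattern could be applied to the city-level `location_inference` field to exclude individual cities.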

Model Categories

We have two categories of Location Inference models, which are based on the conversational patterns present on the source:

Conversational-Style Models
Broadcast-Style Models

Each category has a separate model per language.

Conversational-Style Models are trained on social content that has more text features. They need more text features in the data to predict well, and can predict at a more granular level.

Broadcast-Style Models are trained on social content that lacks large amounts of surrounding context, so they can predict from less text. However, their coverage can be more limited.

📘
If you are using a Broadcast-style source and require labels that are only present in Conversational-Style Models, you can still use the Conversational-Style Models for the increased coverage. However, accuracy may be lower as a result.

Suggested Model Type for Example Sources:

This is only a suggestion and guide, not a comprehensive list of sources.

Conversational-Style Models	Broadcast-Style Models
Twitter	Instagram
Reddit	Snapchat
WeChat	Quora
Threads	TikTok
WhatsApp	Pinterest
Forums	Blogs
Slack	LinkedIn
Discord	Facebook
Telegram	YouTube
Support chatbot conversations	Stack Exchange
	P2P Marketplace listings (Craigslist, Kijiji, etc.)
	Tumblr
	User Reviews
	Email content

🚧
Sources written in a formal style, such as news, reports, press releases, e-commerce listings, and other formally written data, are better served by other location detection models. This is due to the length, language, and depersonalized writing patterns of such content.
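
The suggestions above can be sketched as a simple lookup. This is only an illustration: the `suggest_model_category` function and its return strings are hypothetical, the source names are copied from the table and lower-cased, and, like the table itself, the sets are not a comprehensive list.

```python
# Suggested categories copied from the table above, lower-cased for lookup.
CONVERSATIONAL_SOURCES = {
    "twitter", "reddit", "wechat", "threads", "whatsapp", "forums",
    "slack", "discord", "telegram", "support chatbot conversations",
}
BROADCAST_SOURCES = {
    "instagram", "snapchat", "quora", "tiktok", "pinterest", "blogs",
    "linkedin", "facebook", "youtube", "stack exchange", "craigslist",
    "kijiji", "tumblr", "user reviews", "email content",
}
# Formal-style sources are better served by other location detection models.
FORMAL_SOURCES = {"news", "reports", "press releases", "e-commerce listings"}


def suggest_model_category(source: str) -> str:
    """Return a suggested category name for a source (hypothetical helper)."""
    key = source.strip().lower()
    if key in CONVERSATIONAL_SOURCES:
        return "conversational"
    if key in BROADCAST_SOURCES:
        return "broadcast"
    if key in FORMAL_SOURCES:
        return "other-location-model"
    return "unknown"  # not covered by the suggestion table


print(suggest_model_category("Reddit"))  # conversational
print(suggest_model_category("TikTok"))  # broadcast
print(suggest_model_category("News"))    # other-location-model
```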

Available Location Inference Models

The following models are available:

Conversational-Style Models	Broadcast-Style Models
English (33 labels)	English (13 labels)
Japanese (2 labels)	Spanish (6 labels)
Spanish (6 labels)	Japanese (2 labels)
Arabic (3 labels)
French (3 labels)

Data labels for each model are available below.

Labels

Conversational-Style Models

English	Japanese	Spanish	Arabic	French
(Additional City Level Available - See below)	Japan	Mexico	Egypt	France
Puerto Rico	Other	Peru	Saudi Arabia	Canada
Thailand		Colombia	Other	Other
Turkey		Argentina
Colombia		Chile
United States		Spain
United Kingdom		Other
Canada
Australia
France
Germany
Mexico
Colombia
Saudi Arabia
India
United Arab Emirates
Belgium
Brazil
Switzerland
Czechia
Denmark
Egypt
Spain
Hungary
Italy
Ireland
Japan
Netherlands
Peru
Philippines
Qatar
Singapore
South Africa
Other

Broadcast-Style Models

English	Japanese	Spanish
United States	Japan	Mexico
United Kingdom	Other	Colombia
Canada		Argentina
Australia		Chile
Brazil		Spain
Colombia		Peru
Turkey		Other
Thailand
France
Germany
Mexico
India
New Zealand
Other
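
To illustrate the fallback described under Model Categories, the sketch below checks which English model covers a set of required country labels for a Broadcast-style source. The label sets are copied from the two English columns above (without "Other" and without city-level labels); the `choose_english_model` function and its return shape are hypothetical.

```python
from typing import Optional, Set, Tuple

BROADCAST_EN: Set[str] = {
    "United States", "United Kingdom", "Canada", "Australia", "Brazil",
    "Colombia", "Turkey", "Thailand", "France", "Germany", "Mexico",
    "India", "New Zealand",
}
CONVERSATIONAL_EN: Set[str] = {
    "Puerto Rico", "Thailand", "Turkey", "Colombia", "United States",
    "United Kingdom", "Canada", "Australia", "France", "Germany", "Mexico",
    "Saudi Arabia", "India", "United Arab Emirates", "Belgium", "Brazil",
    "Switzerland", "Czechia", "Denmark", "Egypt", "Spain", "Hungary",
    "Italy", "Ireland", "Japan", "Netherlands", "Peru", "Philippines",
    "Qatar", "Singapore", "South Africa",
}


def choose_english_model(required: Set[str]) -> Tuple[Optional[str], bool]:
    """For a Broadcast-style source, return (category, accuracy_may_be_lower)."""
    if required <= BROADCAST_EN:
        return "broadcast", False
    if required <= CONVERSATIONAL_EN:
        # Fallback for increased coverage; accuracy may be lower on this source.
        return "conversational", True
    return None, False  # neither English model covers every required label


print(choose_english_model({"Canada", "India"}))       # ('broadcast', False)
print(choose_english_model({"Canada", "Italy"}))       # ('conversational', True)
print(choose_english_model({"New Zealand", "Italy"}))  # (None, False)
```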

Metadata Output

This location inference classifier outputs three labels for a given text: the city, region, and country of origin, each with an associated confidence score. If the confidence is under 0.5, the prediction is not one of the trained labels, or the output is unknown, the "Other" tag is applied.
The city label is one of the city names. The country label is an ISO 3166-1 alpha-2 code (for example, US), and the region label is a subdivision code such as those defined in ISO 3166-2 (for example, MI).

"location_inference": {
    "label": "Detroit",
    "confidence": 0.5681
},
"location_inference_region": {
    "label": "MI",
    "confidence": 0.8361
},
"location_inference_country": {
    "label": "US",
    "confidence": 0.8681
},
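
As a minimal sketch of consuming this output, the snippet below picks the most granular label that meets a caller-chosen confidence threshold, falling back from city to region to country. The three field names come from the example above; the `most_granular` helper, its fallback order, and the idea of applying a stricter client-side threshold are assumptions for illustration, not documented behavior.

```python
from typing import Optional, Tuple

# (level name, metadata key), ordered from most to least granular.
FIELDS = [
    ("city", "location_inference"),
    ("region", "location_inference_region"),
    ("country", "location_inference_country"),
]


def most_granular(metadata: dict, min_confidence: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return (level, label) for the finest level that passes the threshold."""
    for level, key in FIELDS:
        entry = metadata.get(key)
        if not entry:
            continue  # field missing from this record
        if entry["label"] != "Other" and entry["confidence"] >= min_confidence:
            return level, entry["label"]
    return None  # nothing usable at any level


sample = {
    "location_inference": {"label": "Detroit", "confidence": 0.5681},
    "location_inference_region": {"label": "MI", "confidence": 0.8361},
    "location_inference_country": {"label": "US", "confidence": 0.8681},
}

print(most_granular(sample))       # ('city', 'Detroit')
print(most_granular(sample, 0.6))  # ('region', 'MI')
print(most_granular(sample, 0.9))  # None
```

Raising the threshold trades granularity for confidence: at 0.6 the city prediction in the sample is skipped and the region is returned instead.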

Source Specific Information

The English Conversational-Style model provides inferred locations for 61 cities for integrated data sources and for 74 cities for post-processing operations. The following cities, regions, and countries are available in English:

Amsterdam, NL
Anchorage, AK, US
Atlanta, GA, US
Austin, TX, US
Baltimore, MD, US
Barcelona, ES
Berlin, DE
Boston, MA, US
Brussels, BE
Budapest, HU
Cairo, EG
Cape Town, ZA
Charleston, SC, US
Charlotte, NC, US
Cheyenne, WY, US
Chicago, IL, US
Columbus, OH, US
Copenhagen, DK
Dallas, TX, US
Delhi, IN
Denver, CO, US
Des Moines, IA, US
Detroit, MI, US
Doha, QA
Dubai, AE
Dublin, IE
El Paso, TX, US
Fargo, ND, US
Fort Worth, TX, US
Houston, TX, US
Huntsville, AL, US
Indianapolis, IN, US
Jacksonville, FL, US
Johannesburg, ZA
Kansas City, MO, US
Las Vegas, NV, US
Lima, PE
London, UK
Los Angeles, CA, US
Louisville, KY, US
Madrid, ES
Manila, PH
Melbourne, AU
Memphis, TN, US
Mexico City, MX
Milwaukee, WI, US
Minneapolis, MN, US
Montreal, QC, CA
Mumbai, IN
Naples, IT
New Orleans, LA, US
New York, NY, US
Newark, NJ, US
Oklahoma City, OK, US
Paris, FR
Philadelphia, PA, US
Phoenix, AZ, US
Portland, OR, US
Prague, CZ
Riyadh, SA
Sacramento, CA, US
Salt Lake City, UT, US
San Francisco, CA, US
San Diego, CA, US
Santa Fe, NM, US
Seattle, WA, US
Singapore, SG
Sydney, AU
Tokyo, JP
Toronto, CA
Virginia Beach, VA, US
Washington DC, DC, US
Wichita, KS, US
Zurich, CH
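
Entries in this list have either two parts (city, country) or three parts (city, region, country). As a small sketch, assuming the list is kept as plain strings in this format, the hypothetical `parse_place` helper below splits an entry into its components so it can be matched against the three output fields.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    city: str
    region: Optional[str]  # None when the entry has no region part
    country: str


def parse_place(entry: str) -> Place:
    """Split 'City, REGION, COUNTRY' or 'City, COUNTRY' into a Place."""
    parts = [part.strip() for part in entry.split(",")]
    if len(parts) == 3:
        return Place(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Place(parts[0], None, parts[1])
    raise ValueError(f"Unexpected entry format: {entry!r}")


print(parse_place("Amsterdam, NL"))
print(parse_place("Anchorage, AK, US"))
```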
