Location Inference Models

Location Inference Models infer the location of origin of a media post, that is, the likely location of the author of a piece of text content, by assessing and predicting on a number of parameters in the data.

Example Use Cases

  • Combined with aggregations and sentiment, Location Inference can deliver high-level assessments of sentiment towards a brand in a specific city to a product's dashboard.
  • Spanish-language Location Inference can give a country-level view of Spanish content, rather than relying on keywords or language alone.
  • Japanese-language Location Inference can give a country-level view of Japanese content, rather than relying on keywords or language alone.
  • Location Inference can give a city-level view of content, which is more granular than relying on keywords or language.
  • Location Inference can be used inversely, to exclude certain cities or countries from the results for content in a specific area.
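
The first and last use cases can be sketched in a few lines of Python. This is a minimal illustration rather than product code: it assumes each post is a dict carrying the `location_inference` and `location_inference_country` objects described under Metadata Output, plus a hypothetical numeric `sentiment` field that this page does not define. The sample posts and both function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical sample posts. The "sentiment" field and its scale are assumptions
# made for illustration; the location fields follow the Metadata Output format.
posts = [
    {"sentiment": 0.8,
     "location_inference": {"label": "Detroit", "confidence": 0.57},
     "location_inference_country": {"label": "US", "confidence": 0.87}},
    {"sentiment": -0.2,
     "location_inference": {"label": "Detroit", "confidence": 0.91},
     "location_inference_country": {"label": "US", "confidence": 0.95}},
    {"sentiment": 0.4,
     "location_inference": {"label": "Toronto", "confidence": 0.66},
     "location_inference_country": {"label": "CA", "confidence": 0.72}},
]


def mean_sentiment_by_city(posts):
    """Average the (assumed) sentiment score per inferred city label."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sum, count]
    for post in posts:
        city = post["location_inference"]["label"]
        totals[city][0] += post["sentiment"]
        totals[city][1] += 1
    return {city: total / count for city, (total, count) in totals.items()}


def exclude_countries(posts, excluded):
    """Inverse use: drop posts whose inferred country is in `excluded`."""
    return [p for p in posts
            if p["location_inference_country"]["label"] not in excluded]


city_means = mean_sentiment_by_city(posts)
non_us_posts = exclude_countries(posts, {"US"})
```

Here `city_means` holds one averaged score per inferred city, and `non_us_posts` keeps only the posts whose inferred country is not US.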

Model Categories

We have two categories of Location Inference models, which are based on the conversational patterns present on the source:

  • Conversational-Style Models
  • Broadcast-Style Models

Within each category, there is a separate model per language.

Conversational-Style Models are trained on social content that has more text features. They need more text features in the data to predict well, and in return they can predict at a more granular level.

Broadcast-Style Models are trained on social content that lacks large amounts of surrounding context, so they are suited to posts with less text. However, their label coverage can be more limited.

📘

If you are using a Broadcast-style source and require labels that are only present in Conversational-Style Models, you can still use the Conversational-Style Models for the increased coverage. However, accuracy may be lower as a result.

Suggested Model Type for Example Sources:

This is only a suggestion and guide, not a comprehensive list of sources.

| Conversational-Style Models | Broadcast-Style Models |
|---|---|
| Twitter | Instagram |
| Reddit | Snapchat |
| WeChat | Quora |
| Threads | TikTok |
| WhatsApp | Pinterest |
| Forums | Blogs |
| Slack | LinkedIn |
| Discord | Facebook |
| Telegram | YouTube |
| Support chatbot conversations | Stack Exchange |
| P2P Marketplace listings (Craigslist, Kijiji, etc.) | |
| Tumblr | |
| User Reviews | |
| Email content | |

🚧

Sources written in a formal style, such as news, reports, press releases, e-commerce listings, and other formally written data, are better served by other location detection models. This is due to the length, language, and de-personalized writing patterns of such content.
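
The guidance above amounts to a lookup from source to suggested category. The sketch below encodes only a handful of the example sources, and the category strings, the `FORMAL_SOURCES` set, and the `suggest_category` helper are all hypothetical names chosen for this illustration, not part of any product API.

```python
from typing import Optional

# A few of the example sources from the guidance above (not a complete list).
SUGGESTED_CATEGORY = {
    "twitter": "conversational",
    "reddit": "conversational",
    "discord": "conversational",
    "instagram": "broadcast",
    "tiktok": "broadcast",
    "youtube": "broadcast",
}

# Formal-style sources, for which other location detection models are advised.
FORMAL_SOURCES = {"news", "reports", "press releases", "e-commerce listings"}


def suggest_category(source: str) -> Optional[str]:
    """Return a suggested model category for a source name.

    Returns "conversational" or "broadcast" for known example sources,
    "not-recommended" for formal-style sources, and None when the source
    is not covered by this small table.
    """
    key = source.strip().lower()
    if key in FORMAL_SOURCES:
        return "not-recommended"
    return SUGGESTED_CATEGORY.get(key)
```

For instance, `suggest_category("Reddit")` yields the conversational category, while `suggest_category("News")` signals that a different kind of location detection model is the better fit.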

Available Location Inference Models

The following models are available:

| Conversational-Style Models | Broadcast-Style Models |
|---|---|
| English (33 labels) | English (13 labels) |
| Japanese (1 label) | Spanish (6 labels) |
| Spanish (6 labels) | |

Data labels for each model are available below.
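
As a rough sketch of how the list of available models might be used when choosing a model, the snippet below stores the label counts stated above and falls back from a Broadcast-Style model to the Conversational-Style model of the same language when no Broadcast-Style model is listed, as the earlier note allows (with possibly lower accuracy). The `AVAILABLE_MODELS` mapping and `pick_model` function are hypothetical names for this illustration.

```python
from typing import Optional, Tuple

# (category, language) -> number of labels, as listed under
# "Available Location Inference Models".
AVAILABLE_MODELS = {
    ("conversational", "english"): 33,
    ("conversational", "japanese"): 1,
    ("conversational", "spanish"): 6,
    ("broadcast", "english"): 13,
    ("broadcast", "spanish"): 6,
}


def pick_model(category: str, language: str) -> Optional[Tuple[str, str]]:
    """Return the (category, language) model to use, or None if none exists.

    If a broadcast model is requested but not listed for the language,
    fall back to the conversational model for that language. Accuracy
    may be lower on broadcast-style data in that case.
    """
    key = (category.lower(), language.lower())
    if key in AVAILABLE_MODELS:
        return key
    fallback = ("conversational", language.lower())
    if key[0] == "broadcast" and fallback in AVAILABLE_MODELS:
        return fallback
    return None
```

With this mapping, a request for a Broadcast-Style Japanese model resolves to the Conversational-Style Japanese model, and a request for a language with no listed model returns `None`.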

Labels

Conversational-Style Models

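| English | Japanese | Spanish |
|---|---|---|
| (Additional City Level Available - See below) | Japan | Mexico |
| Puerto Rico | Other | Peru |
| Thailand | | Colombia |
| Turkey | | Argentina |
| Colombia | | Chile |
| United States | | Spain |
| United Kingdom | | Peru |
| Canada | | Other |
| Australia | | |
| France | | |
| Germany | | |
| Mexico | | |
| Colombia | | |
| Saudi Arabia | | |
| India | | |
| United Arab Emirates | | |
| Belgium | | |
| Brazil | | |
| Switzerland | | |
| Czechia | | |
| Denmark | | |
| Egypt | | |
| Spain | | |
| Hungary | | |
| Italy | | |
| Ireland | | |
| Japan | | |
| Netherlands | | |
| Peru | | |
| Philippines | | |
| Qatar | | |
| Singapore | | |
| South Africa | | |
| Other | | |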

Broadcast-Style Models

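| English | Japanese | Spanish |
|---|---|---|
| United States | Japan | Mexico |
| United Kingdom | Other | Colombia |
| Canada | | Argentina |
| Australia | | Chile |
| Brazil | | Spain |
| Colombia | | Peru |
| Turkey | | Other |
| Thailand | | |
| France | | |
| Germany | | |
| Mexico | | |
| India | | |
| New Zealand | | |
| Other | | |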

Metadata Output

The location inference classifier outputs three labels for a given text: the city, region, and country of origin, each with an associated confidence score. The "Other" tag is applied if the confidence is under 0.5, if the prediction is not one of the trained labels, or if the output is unknown.
The city label is one of the available city names. The region and country labels are short codes: the country label is an ISO 3166-1 code (for example, US), and the region label is the region's abbreviation (for example, MI).

{
    "location_inference": {
        "label": "Detroit",
        "confidence": 0.5681
    },
    "location_inference_region": {
        "label": "MI",
        "confidence": 0.8361
    },
    "location_inference_country": {
        "label": "US",
        "confidence": 0.8681
    }
}
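
A consumer of this output might read the three fields as follows. This is a minimal sketch under stated assumptions: the three keys are taken from the example above, while the `read_location` helper and its `min_confidence` parameter are hypothetical. The classifier already applies the "Other" rule at 0.5 itself, so the check here only matters when a caller wants a stricter cut-off or needs to guard against a missing field.

```python
import json

# The three fields from the example output, wrapped in braces so that the
# fragment parses as a standalone JSON object.
raw = """
{
  "location_inference": {"label": "Detroit", "confidence": 0.5681},
  "location_inference_region": {"label": "MI", "confidence": 0.8361},
  "location_inference_country": {"label": "US", "confidence": 0.8681}
}
"""


def read_location(payload, min_confidence=0.5):
    """Return city, region, and country labels from a parsed payload.

    A label is replaced by "Other" when its field is missing or its
    confidence is below `min_confidence` (hypothetical parameter).
    """
    def label_of(key):
        field = payload.get(key) or {}
        label = field.get("label")
        confidence = field.get("confidence", 0.0)
        if label is None or confidence < min_confidence:
            return "Other"
        return label

    return {
        "city": label_of("location_inference"),
        "region": label_of("location_inference_region"),
        "country": label_of("location_inference_country"),
    }


payload = json.loads(raw)
default_view = read_location(payload)                     # threshold 0.5
strict_view = read_location(payload, min_confidence=0.6)  # stricter cut-off
```

With the default threshold all three labels pass through unchanged. With the stricter 0.6 cut-off, the city label (confidence 0.5681) becomes "Other" while the region and country labels are kept.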

Source Specific Information

The English Conversational-Style model provides inferred locations for 61 cities for integrated data sources, and for 74 cities for post-processing operations. The following cities, with their regions and countries, are available in English:

  • Amsterdam, NL
  • Anchorage, AK, US
  • Atlanta, GA, US
  • Austin, TX, US
  • Baltimore, MD, US
  • Barcelona, ES
  • Berlin, DE
  • Boston, MA, US
  • Brussels, BE
  • Budapest, HU
  • Cairo, EG
  • Cape Town, ZA
  • Charleston, SC, US
  • Charlotte, NC, US
  • Cheyenne, WY, US
  • Chicago, IL, US
  • Columbus, OH, US
  • Copenhagen, DK
  • Dallas, TX, US
  • Delhi, IN
  • Denver, CO, US
  • Des Moines, IA, US
  • Detroit, MI, US
  • Doha, QA
  • Dubai, AE
  • Dublin, IE
  • El Paso, TX, US
  • Fargo, ND, US
  • Fort Worth, TX, US
  • Houston, TX, US
  • Huntsville, AL, US
  • Indianapolis, IN, US
  • Jacksonville, FL, US
  • Johannesburg, ZA
  • Kansas City, MO, US
  • Las Vegas, NV, US
  • Lima, PE
  • London, UK
  • Los Angeles, CA, US
  • Louisville, KY, US
  • Madrid, ES
  • Manila, PH
  • Melbourne, AU
  • Memphis, TN, US
  • Mexico City, MX
  • Milwaukee, WI, US
  • Minneapolis, MN, US
  • Montreal, QC, CA
  • Mumbai, IN
  • Naples, IT
  • New Orleans, LA, US
  • New York, NY, US
  • Newark, NJ, US
  • Oklahoma City, OK, US
  • Paris, FR
  • Philadelphia, PA, US
  • Phoenix, AZ, US
  • Portland, OR, US
  • Prague, CZ
  • Riyadh, SA
  • Sacramento, CA, US
  • Salt Lake City, UT, US
  • San Francisco, CA, US
  • San Diego, CA, US
  • Santa Fe, NM, US
  • Seattle, WA, US
  • Singapore, SG
  • Sydney, AU
  • Tokyo, JP
  • Toronto, ON, CA
  • Virginia Beach, VA, US
  • Washington DC, DC, US
  • Wichita, KS, US
  • Zurich, CH
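
Entries in the list above take one of two shapes: "City, Region, Country" or "City, Country". The sketch below shows one way to turn such entries into structured records, for example to group the available cities by country. The `Place` type and both function names are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Place(NamedTuple):
    city: str
    region: Optional[str]  # None when the entry has no region code
    country: str


def parse_place(entry: str) -> Place:
    """Parse 'City, Region, Country' or 'City, Country' into a Place."""
    parts = [part.strip() for part in entry.split(",")]
    if len(parts) == 3:
        return Place(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Place(parts[0], None, parts[1])
    raise ValueError(f"Unexpected entry format: {entry!r}")


def cities_by_country(entries: List[str]) -> Dict[str, List[str]]:
    """Group city names under their country code."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        place = parse_place(entry)
        grouped[place.country].append(place.city)
    return dict(grouped)


# A few entries copied from the list above.
sample = ["Amsterdam, NL", "Chicago, IL, US", "Montreal, QC, CA", "Seattle, WA, US"]
grouped = cities_by_country(sample)
```

After this runs, `grouped` maps each country code in the sample to the city names listed for it.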