Location Inference Models
Location Inference Models infer where a piece of text content, such as a social media post, originated. They estimate the location of the author by assessing and predicting on a number of parameters in the data.
Example Use Cases
- In conjunction with aggregations and sentiment, high-level assessments of sentiment towards a brand in a specific city could be delivered to a product's dashboard.
- Spanish-language Location Inference can give a country-level view of Spanish content rather than relying on keywords or language.
- Japanese-language Location Inference can give a country-level view of Japanese content rather than relying on keywords or language.
- Location Inference can give a more granular, city-level view of content than keywords or language alone.
- Location Inference can also be used inversely, to exclude content from certain cities or countries from the results for a specific area.
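As a rough illustration of the first and last use cases, the sketch below filters out posts by inferred country and then averages sentiment per inferred city. It assumes each post is a dictionary carrying the `location_inference` and `location_inference_country` fields described under Metadata Output. The `sentiment` field and both function names are hypothetical and only serve the example.

```python
from collections import defaultdict


def exclude_countries(posts, excluded_countries):
    """Inverse use: drop posts whose inferred country label is excluded."""
    excluded = set(excluded_countries)
    return [
        post for post in posts
        if post.get("location_inference_country", {}).get("label") not in excluded
    ]


def mean_sentiment_by_city(posts):
    """Average the (hypothetical) `sentiment` field per inferred city label."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sentiment sum, post count]
    for post in posts:
        city = post.get("location_inference", {}).get("label", "Other")
        totals[city][0] += post["sentiment"]
        totals[city][1] += 1
    return {city: total / count for city, (total, count) in totals.items()}


# Made-up sample posts for illustration only.
posts = [
    {"sentiment": 0.5,
     "location_inference": {"label": "Detroit"},
     "location_inference_country": {"label": "US"}},
    {"sentiment": 1.0,
     "location_inference": {"label": "Detroit"},
     "location_inference_country": {"label": "US"}},
    {"sentiment": -0.5,
     "location_inference": {"label": "Toronto"},
     "location_inference_country": {"label": "CA"}},
]

without_canada = exclude_countries(posts, {"CA"})
print(mean_sentiment_by_city(without_canada))  # {'Detroit': 0.75}
```

Posts without an inferred country are kept by the filter, which may or may not be what a given dashboard wants.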
Model Categories
We have two categories of Location Inference models, which are based on the conversational patterns present on the source:
- Conversational-Style Models
- Broadcast-Style Models
Each category has a separate model per language.
Conversational-Style Models are trained on social content that has more text features. They need more text features in the data to predict well, and in return can predict at a more granular level.
Broadcast-Style Models are trained on social content that lacks large amounts of surrounding context. Their coverage can be more limited.
If you are using a Broadcast-style source and require labels that are only present in the Conversational-Style Models, you can still use the Conversational-Style Models for the increased coverage. Accuracy may be lower as a result.
Suggested Model Type for Example Sources:
This is only a suggestion and guide, not a comprehensive list of sources.
Conversational-Style Models | Broadcast-Style Models |
---|---|
Snapchat | |
Quora | |
Threads | TikTok |
Forums | Blogs |
Slack | |
Discord | |
Telegram | YouTube |
Support chatbot conversations | Stack Exchange |
P2P Marketplace listings (Craigslist, Kijiji, etc.) | |
Tumblr | |
User Reviews | |
Email content | |
Sources written in a formal style, such as news, reports, press releases, and e-commerce listings, are better served by other location detection models. This is due to the length, language, and de-personalized writing patterns of such content.
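The suggestions in the table above can be captured as a simple lookup. This is only a sketch: the dictionary keys, the category strings, and the `suggest_category` helper are illustrative names, not part of any documented API. Sources not in the table, including formal sources such as news, return `None`.

```python
# Suggested model category per example source, copied from the table above.
SUGGESTED_CATEGORY = {
    "snapchat": "conversational",
    "quora": "conversational",
    "threads": "conversational",
    "forums": "conversational",
    "slack": "conversational",
    "discord": "conversational",
    "telegram": "conversational",
    "support chatbot conversations": "conversational",
    "p2p marketplace listings": "conversational",
    "tumblr": "conversational",
    "user reviews": "conversational",
    "email content": "conversational",
    "tiktok": "broadcast",
    "blogs": "broadcast",
    "youtube": "broadcast",
    "stack exchange": "broadcast",
}


def suggest_category(source):
    """Return the suggested category for a source, or None if it is not listed."""
    return SUGGESTED_CATEGORY.get(source.strip().lower())


print(suggest_category("Discord"))  # conversational
print(suggest_category("YouTube"))  # broadcast
print(suggest_category("news"))     # None
```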
Available Location Inference Models
The following models are available:
Conversational-Style Models | Broadcast-Style Models |
---|---|
English (33 labels) | English (13 labels) |
Japanese (1 label) | Spanish (6 labels) |
Spanish (6 labels) | |
Arabic (3 labels) | |
French (3 labels) | |
Data labels for each model are available below.
Labels
Conversational-Style Models
English | Japanese | Spanish | Arabic | French |
---|---|---|---|---|
(Additional City Level Available - See below) | Japan | Mexico | Egypt | France |
Puerto Rico | Other | Peru | Saudi Arabia | Canada |
Thailand | | Colombia | Other | Other |
Turkey | | Argentina | | |
Colombia | | Chile | | |
United States | | Spain | | |
United Kingdom | | Peru | | |
Canada | | Other | | |
Australia | | | | |
France | | | | |
Germany | | | | |
Mexico | | | | |
Colombia | | | | |
Saudi Arabia | | | | |
India | | | | |
United Arab Emirates | | | | |
Belgium | | | | |
Brazil | | | | |
Switzerland | | | | |
Czechia | | | | |
Denmark | | | | |
Egypt | | | | |
Spain | | | | |
Hungary | | | | |
Italy | | | | |
Ireland | | | | |
Japan | | | | |
Netherlands | | | | |
Peru | | | | |
Philippines | | | | |
Qatar | | | | |
Singapore | | | | |
South Africa | | | | |
Other | | | | |
Broadcast-Style Models
English | Japanese | Spanish |
---|---|---|
United States | Japan | Mexico |
United Kingdom | Other | Colombia |
Canada | | Argentina |
Australia | | Chile |
Brazil | | Spain |
Colombia | | Peru |
Turkey | | Other |
Thailand | | |
France | | |
Germany | | |
Mexico | | |
India | | |
New Zealand | | |
Other | | |
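The fallback described under Model Categories (using a Conversational-Style Model for a Broadcast-style source when a required label is missing) could be sketched as follows for English. The label sets are copied from the English columns of the two label tables, leaving out "Other". The function name and the `(category, accuracy_may_be_lower)` return shape are assumptions made for this example.

```python
BROADCAST_EN = {
    "United States", "United Kingdom", "Canada", "Australia", "Brazil",
    "Colombia", "Turkey", "Thailand", "France", "Germany", "Mexico",
    "India", "New Zealand",
}

CONVERSATIONAL_EN = {
    "Puerto Rico", "Thailand", "Turkey", "Colombia", "United States",
    "United Kingdom", "Canada", "Australia", "France", "Germany", "Mexico",
    "Saudi Arabia", "India", "United Arab Emirates", "Belgium", "Brazil",
    "Switzerland", "Czechia", "Denmark", "Egypt", "Spain", "Hungary",
    "Italy", "Ireland", "Japan", "Netherlands", "Peru", "Philippines",
    "Qatar", "Singapore", "South Africa",
}


def choose_english_model(source_category, required_labels):
    """Pick a model category; the flag marks the lower-accuracy fallback."""
    required = set(required_labels)
    if source_category == "broadcast":
        if required <= BROADCAST_EN:
            return "broadcast", False
        if required <= CONVERSATIONAL_EN:
            # Broadcast-style source scored by a Conversational-Style Model:
            # wider label coverage, but accuracy may be lower.
            return "conversational", True
    elif source_category == "conversational":
        if required <= CONVERSATIONAL_EN:
            return "conversational", False
    else:
        raise ValueError(f"unknown source category: {source_category!r}")
    raise ValueError(f"no English model covers all of: {sorted(required)}")


print(choose_english_model("broadcast", ["Canada", "India"]))  # ('broadcast', False)
print(choose_english_model("broadcast", ["Italy"]))            # ('conversational', True)
```

Note that "New Zealand" appears only in the Broadcast-Style table, so a Conversational-style source that needs it raises an error in this sketch.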
Metadata Output
The location inference classifier outputs three labels for a given text: the city, region, and country of origin, each with an associated confidence score. The "Other" tag is applied if the confidence is under 0.5, the prediction is not one of the trained labels, or the output is unknown.
The city label is one of the city names listed below. The country label is a two-letter ISO 3166-1 code (for example, US), and the region label is a state or province abbreviation (for example, MI).

    "location_inference": {
        "label": "Detroit",
        "confidence": 0.5681
    },
    "location_inference_region": {
        "label": "MI",
        "confidence": 0.8361
    },
    "location_inference_country": {
        "label": "US",
        "confidence": 0.8681
    },
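A consumer of this metadata might read the three fields and apply its own confidence cut-off, for example a stricter one than the 0.5 the classifier uses. The sketch below assumes the fragment above arrives as part of a JSON object. The `resolve_location` helper and its `min_confidence` parameter are hypothetical names, not part of the product.

```python
import json

OTHER = "Other"

# Output level -> metadata key, as shown in the example above.
LEVEL_KEYS = {
    "city": "location_inference",
    "region": "location_inference_region",
    "country": "location_inference_country",
}


def resolve_location(metadata, min_confidence=0.5):
    """Return one label per level, falling back to "Other" below the cut-off."""
    resolved = {}
    for level, key in LEVEL_KEYS.items():
        entry = metadata.get(key) or {}
        label = entry.get("label")
        confidence = entry.get("confidence", 0.0)
        if label is None or confidence < min_confidence:
            resolved[level] = OTHER
        else:
            resolved[level] = label
    return resolved


raw = """
{
  "location_inference": {"label": "Detroit", "confidence": 0.5681},
  "location_inference_region": {"label": "MI", "confidence": 0.8361},
  "location_inference_country": {"label": "US", "confidence": 0.8681}
}
"""
metadata = json.loads(raw)

print(resolve_location(metadata))
# {'city': 'Detroit', 'region': 'MI', 'country': 'US'}
print(resolve_location(metadata, min_confidence=0.6))
# {'city': 'Other', 'region': 'MI', 'country': 'US'}
```

With the stricter cut-off, the city falls back to "Other" while the region and country are kept, since their confidence scores are higher.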
Source Specific Information
The English Conversational-Style model provides inferred locations for 61 cities for integrated data sources and for 74 cities for post-processing operations. The following cities, with their regions and countries, are available in English:
- Amsterdam, NL
- Anchorage, AK, US
- Atlanta, GA, US
- Austin, TX, US
- Baltimore, MD, US
- Barcelona, ES
- Berlin, DE
- Boston, MA, US
- Brussels, BE
- Budapest, HU
- Cairo, EG
- Cape Town, ZA
- Charleston, SC, US
- Charlotte, NC, US
- Cheyenne, WY, US
- Chicago, IL, US
- Columbus, OH, US
- Copenhagen, DK
- Dallas, TX, US
- Delhi, IN
- Denver, CO, US
- Des Moines, IA, US
- Detroit, MI, US
- Doha, QA
- Dubai, AE
- Dublin, IE
- El Paso, TX, US
- Fargo, ND, US
- Fort Worth, TX, US
- Houston, TX, US
- Huntsville, AL, US
- Indianapolis, IN, US
- Jacksonville, FL, US
- Johannesburg, ZA
- Kansas City, MO, US
- Las Vegas, NV, US
- Lima, PE
- London, UK
- Los Angeles, CA, US
- Louisville, KY, US
- Madrid, ES
- Manila, PH
- Melbourne, AU
- Memphis, TN, US
- Mexico City, MX
- Milwaukee, WI, US
- Minneapolis, MN, US
- Montreal, QC, CA
- Mumbai, IN
- Naples, IT
- New Orleans, LA, US
- New York, NY, US
- Newark, NJ, US
- Oklahoma City, OK, US
- Paris, FR
- Philadelphia, PA, US
- Phoenix, AZ, US
- Portland, OR, US
- Prague, CZ
- Riyadh, SA
- Sacramento, CA, US
- Salt Lake City, UT, US
- San Francisco, CA, US
- San Diego, CA, US
- Santa Fe, NM, US
- Seattle, WA, US
- Singapore, SG
- Sydney, AU
- Tokyo, JP
- Toronto, ON, CA
- Virginia Beach, VA, US
- Washington DC, DC, US
- Wichita, KS, US
- Zurich, CH