Location Inference Models
Location Inference Models infer the location of origin of a media post, that is, the location of the author of a piece of text content, by assessing a number of parameters in the data and predicting from them.
Example Use Cases
- Combined with aggregations and sentiment, Location Inference can deliver high-level assessments of sentiment towards a brand in a specific city to a product's dashboard.
- Spanish-language Location Inference can give a country-level view of Spanish content rather than relying on keywords or language.
- Japanese-language Location Inference can give a country-level view of Japanese content rather than relying on keywords or language.
- Location Inference can give a city-level view of content, which is more granular than relying on keywords or language.
- Location Inference can be used in reverse, to exclude certain cities or countries from the results for content in a specific area.
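The first and last use cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post layout (a dict carrying a numeric `sentiment` field next to the location fields) and both function names are hypothetical, and only the `location_inference` and `location_inference_country` field names come from the Metadata Output section of this page.

```python
from collections import defaultdict

def mean_sentiment_by_city(posts, min_confidence=0.5):
    """Average a (hypothetical) per-post sentiment score for each inferred city."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sum of scores, post count]
    for post in posts:
        city = post.get("location_inference") or {}
        label = city.get("label")
        # Skip posts without a usable city prediction.
        if not label or label == "Other" or city.get("confidence", 0.0) < min_confidence:
            continue
        totals[label][0] += post["sentiment"]
        totals[label][1] += 1
    return {label: total / count for label, (total, count) in totals.items()}

def exclude_countries(posts, excluded):
    """Inverse use: drop posts whose inferred country label is in `excluded`."""
    return [
        post for post in posts
        if (post.get("location_inference_country") or {}).get("label") not in excluded
    ]

# Made-up sample data, for illustration only.
POSTS = [
    {"sentiment": 0.5,
     "location_inference": {"label": "Detroit", "confidence": 0.57},
     "location_inference_country": {"label": "US", "confidence": 0.87}},
    {"sentiment": 1.0,
     "location_inference": {"label": "Detroit", "confidence": 0.91},
     "location_inference_country": {"label": "US", "confidence": 0.95}},
    {"sentiment": -1.0,
     "location_inference": {"label": "Other", "confidence": 0.31},
     "location_inference_country": {"label": "MX", "confidence": 0.72}},
]
```

With this sample, `mean_sentiment_by_city(POSTS)` averages the two Detroit posts and ignores the post tagged "Other", while `exclude_countries(POSTS, {"US"})` keeps only the third post.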
Model Categories
We have two categories of Location Inference models, which are based on the conversational patterns present in the source:
- Conversational-Style Models
- Broadcast-Style Models
Each of the two categories has a separate model per language.
Conversational-Style Models are trained on social content that has more text features. They need more text features to be present in the data to predict well, and they can predict at a more granular level.
Broadcast-Style Models are trained on social content that lacks large amounts of surrounding context. However, their coverage can be more limited.
If you are using a Broadcast-style source but need labels that are only present in the Conversational-Style Models, you can still use the Conversational-Style Models for the increased coverage. However, accuracy may be lower as a result.
Suggested Model Type for Example Sources:
This is only a suggestion and guide, not a comprehensive list of sources.
Conversational-Style Models | Broadcast-Style Models |
---|---|
Snapchat | |
Quora | |
Threads | TikTok |
Forums | Blogs |
Slack | |
Discord | |
Telegram | YouTube |
Support chatbot conversations | Stack Exchange |
P2P Marketplace listings (Craigslist, Kijiji, etc.) | |
Tumblr | |
User Reviews | |
Email content | |
Sources written in a formal style, such as news, reports, press releases, e-commerce listings, and other formally written data, are better served by other location detection models. This is due to the length, language, and depersonalized writing patterns of such content.
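As a rough illustration, the suggestions above can be encoded as a lookup table. The function name, the lowercase keys, and the returned category strings are assumptions made for this sketch and are not part of any product API. Like the table, the mapping is a guide, not a comprehensive list.

```python
# Suggested category per example source, transcribed from the table above.
SUGGESTED_CATEGORY = {
    "snapchat": "conversational",
    "quora": "conversational",
    "threads": "conversational",
    "forums": "conversational",
    "slack": "conversational",
    "discord": "conversational",
    "telegram": "conversational",
    "support chatbot conversations": "conversational",
    "p2p marketplace listings": "conversational",
    "tumblr": "conversational",
    "user reviews": "conversational",
    "email content": "conversational",
    "tiktok": "broadcast",
    "blogs": "broadcast",
    "youtube": "broadcast",
    "stack exchange": "broadcast",
}

# Formally written sources, for which other location detection models fit better.
FORMAL_SOURCES = {"news", "reports", "press releases", "e-commerce listings"}

def suggest_category(source):
    """Return a suggested model category, or None if the source is not listed."""
    key = source.strip().lower()
    if key in FORMAL_SOURCES:
        return "other location detection model"
    return SUGGESTED_CATEGORY.get(key)
```

For example, `suggest_category("Discord")` returns `"conversational"` and `suggest_category("YouTube")` returns `"broadcast"`. An unlisted source returns `None`, which leaves the choice to the reader.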
Available Location Inference Models
The following models are available:
Conversational-Style Models | Broadcast-Style Models |
---|---|
English (33 labels) | English (13 labels) |
Japanese (1 label) | Spanish (6 labels) |
Spanish (6 labels) | |
Arabic (3 labels) | |
French (3 labels) | |
Data labels for each model are available below.
Labels
Conversational-Style Models
English | Japanese | Spanish | Arabic | French |
---|---|---|---|---|
(Additional City Level Available - See below) | Japan | Mexico | Egypt | France |
Puerto Rico | Other | Peru | Saudi Arabia | Canada |
Thailand | | Colombia | Other | Other |
Turkey | | Argentina | | |
Colombia | | Chile | | |
United States | | Spain | | |
United Kingdom | | Peru | | |
Canada | | Other | | |
Australia | | | | |
France | | | | |
Germany | | | | |
Mexico | | | | |
Colombia | | | | |
Saudi Arabia | | | | |
India | | | | |
United Arab Emirates | | | | |
Belgium | | | | |
Brazil | | | | |
Switzerland | | | | |
Czechia | | | | |
Denmark | | | | |
Egypt | | | | |
Spain | | | | |
Hungary | | | | |
Italy | | | | |
Ireland | | | | |
Japan | | | | |
Netherlands | | | | |
Peru | | | | |
Philippines | | | | |
Qatar | | | | |
Singapore | | | | |
South Africa | | | | |
Other | | | | |
Broadcast-Style Models
English | Japanese | Spanish |
---|---|---|
United States | Japan | Mexico |
United Kingdom | Other | Colombia |
Canada | | Argentina |
Australia | | Chile |
Brazil | | Spain |
Colombia | | Peru |
Turkey | | Other |
Thailand | | |
France | | |
Germany | | |
Mexico | | |
India | | |
New Zealand | | |
Other | | |
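The fallback guidance under Model Categories (use a Conversational-Style Model when a Broadcast-style source needs labels that only the Conversational-Style Models provide) can be checked against the English label lists. The sketch below is a hypothetical helper, not a product feature: the function name and return values are assumptions, and the label sets are transcribed from the English columns of the tables above, with "Other" left out because it is the fallback tag and not a location.

```python
ENGLISH_LABELS = {
    "conversational": {
        "Puerto Rico", "Thailand", "Turkey", "Colombia", "United States",
        "United Kingdom", "Canada", "Australia", "France", "Germany", "Mexico",
        "Saudi Arabia", "India", "United Arab Emirates", "Belgium", "Brazil",
        "Switzerland", "Czechia", "Denmark", "Egypt", "Spain", "Hungary",
        "Italy", "Ireland", "Japan", "Netherlands", "Peru", "Philippines",
        "Qatar", "Singapore", "South Africa",
    },
    "broadcast": {
        "United States", "United Kingdom", "Canada", "Australia", "Brazil",
        "Colombia", "Turkey", "Thailand", "France", "Germany", "Mexico",
        "India", "New Zealand",
    },
}

def pick_english_category(source_style, required_labels):
    """Pick a model category that covers every required country label.

    Returns the source's own category when it covers the labels, falls back
    to "conversational" for a broadcast source (accuracy may be lower), and
    returns None when no single category covers all of the labels.
    """
    required = set(required_labels)
    if required <= ENGLISH_LABELS[source_style]:
        return source_style
    if source_style == "broadcast" and required <= ENGLISH_LABELS["conversational"]:
        return "conversational"
    return None
```

For a Broadcast-style source that needs Italy, `pick_english_category("broadcast", ["Italy"])` returns `"conversational"`. Note that New Zealand appears only in the Broadcast-Style list, so a request for both New Zealand and Italy returns `None`.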
Metadata Output
The location inference classifier outputs three labels for a given text: the city, region, and country of origin, each with an associated confidence score. If the confidence is under 0.5, the prediction is not one of the trained labels, or the output is unknown, the "Other" tag is applied.
The city label is one of the supported city names. The country label is a two-letter ISO 3166-1 country code, and the region label is a short region code (for US states, the two-letter state abbreviation used in ISO 3166-2), as in the example below.
"location_inference": {
"label": "Detroit",
"confidence": 0.5681
},
"location_inference_region": {
"label": "MI",
"confidence": 0.8361
},
"location_inference_country": {
"label": "US",
"confidence": 0.8681
},
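A consumer of this output might want a stricter cut-off than the built-in 0.5 rule. The following is a minimal sketch of that idea, assuming the three fields arrive as keys of a single JSON object. The function name, the `threshold` parameter, and the flattened `city`/`region`/`country` result are illustrative assumptions, not part of the product.

```python
import json

OTHER = "Other"

# Output field names as shown in the example above, keyed by location level.
FIELDS = {
    "city": "location_inference",
    "region": "location_inference_region",
    "country": "location_inference_country",
}

def resolve_location(metadata, threshold=0.5):
    """Map each level to its label, or to "Other" when it is missing or below threshold."""
    resolved = {}
    for level, key in FIELDS.items():
        entry = metadata.get(key) or {}
        label = entry.get("label")
        confidence = entry.get("confidence", 0.0)
        if not label or confidence < threshold:
            label = OTHER
        resolved[level] = label
    return resolved

# The fragment shown above, wrapped in braces so that it parses as a JSON object.
SAMPLE = json.loads("""
{
  "location_inference": {"label": "Detroit", "confidence": 0.5681},
  "location_inference_region": {"label": "MI", "confidence": 0.8361},
  "location_inference_country": {"label": "US", "confidence": 0.8681}
}
""")

print(resolve_location(SAMPLE))                 # city stays "Detroit"
print(resolve_location(SAMPLE, threshold=0.6))  # city becomes "Other"
```

Raising the threshold to 0.6 discards the city prediction (confidence 0.5681) while keeping the region and country. This shows that the three levels can be trusted independently.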
Source Specific Information
The English Conversational-Style model provides inferred locations for 61 cities for integrated data sources and for 74 cities for post-processing operations. The following cities, with their regions and countries, are available in English:
- Amsterdam, NL
- Anchorage, AK, US
- Atlanta, GA, US
- Austin, TX, US
- Baltimore, MD, US
- Barcelona, ES
- Berlin, DE
- Boston, MA, US
- Brussels, BE
- Budapest, HU
- Cairo, EG
- Cape Town, ZA
- Charleston, SC, US
- Charlotte, NC, US
- Cheyenne, WY, US
- Chicago, IL, US
- Columbus, OH, US
- Copenhagen, DK
- Dallas, TX, US
- Delhi, IN
- Denver, CO, US
- Des Moines, IA, US
- Detroit, MI, US
- Doha, QA
- Dubai, AE
- Dublin, IE
- El Paso, TX, US
- Fargo, ND, US
- Fort Worth, TX, US
- Houston, TX, US
- Huntsville, AL, US
- Indianapolis, IN, US
- Jacksonville, FL, US
- Johannesburg, ZA
- Kansas City, MO, US
- Las Vegas, NV, US
- Lima, PE
- London, UK
- Los Angeles, CA, US
- Louisville, KY, US
- Madrid, ES
- Manila, PH
- Melbourne, AU
- Memphis, TN, US
- Mexico City, MX
- Milwaukee, WI, US
- Minneapolis, MN, US
- Montreal, QC, CA
- Mumbai, IN
- Naples, IT
- New Orleans, LA, US
- New York, NY, US
- Newark, NJ, US
- Oklahoma City, OK, US
- Paris, FR
- Philadelphia, PA, US
- Phoenix, AZ, US
- Portland, OR, US
- Prague, CZ
- Riyadh, SA
- Sacramento, CA, US
- Salt Lake City, UT, US
- San Francisco, CA, US
- San Diego, CA, US
- Santa Fe, NM, US
- Seattle, WA, US
- Singapore, SG
- Sydney, AU
- Tokyo, JP
- Toronto, CA
- Virginia Beach, VA, US
- Washington DC, DC, US
- Wichita, KS, US
- Zurich, CH
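The entries in this list follow one of two shapes: "City, Region, Country" or "City, Country". A small parser such as the hypothetical one below (the `Place` type and `parse_place` function are assumptions for illustration) turns an entry into a structured record that can be compared with the city, region, and country labels in the metadata output.

```python
from typing import NamedTuple, Optional

class Place(NamedTuple):
    city: str
    region: Optional[str]  # None when the entry lists no region
    country: str

def parse_place(entry):
    """Parse a list entry such as "- Austin, TX, US" or "- Tokyo, JP"."""
    parts = [part.strip() for part in entry.lstrip("- ").split(",")]
    if len(parts) == 3:
        return Place(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Place(parts[0], None, parts[1])
    raise ValueError(f"unexpected entry format: {entry!r}")

print(parse_place("- Montreal, QC, CA"))
print(parse_place("- Tokyo, JP"))
```

Entries without a region parse with `region=None`, so a caller can decide whether to compare the region label at all.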