WebSightLine Threads
WebSightLine (WSL) Threads provides a high-volume sample of near-time public Threads content.
Documentation Accuracy: We try to ensure that every component's documentation is as accurate as possible. However, because we do not manage third-party product features or documentation, some third-party documentation may be out of date as of the last update. Thank you for your understanding.
The WebSightLine Threads component provides a live feed with millions of Threads posts and comments each day.
New to Datastreamer? Start here.
Unify Schema: This data source already uses Unify Schema.
How to use?
WebSightLine Threads is powered by the Jobs System. When interacting with the component, you can define your own job queries.
Search Queries
Filters
Available filters for WebSightLine Threads can be found in the table below:
| Filter Name | Description |
|---|---|
| query | List of keywords or a phrase to search |
| max_documents | Set a limit for the number of posts that will be fetched for the search. |
Lucene query syntax is supported in the query field for this component. Here are some basic queries that you can try:
Keywords:
cats
Fields:
title:lucene
Phrases:
"apache lucene"
Wildcards:
tes*
Boolean operators:
cats OR dogs
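As an illustration of how the keyword, phrase, and Boolean forms above combine, here is a minimal Python sketch that assembles a query string from a list of terms. The helper name `lucene_or` is hypothetical and not part of any Datastreamer SDK. It only quotes multi-word terms and joins everything with OR, and it does not escape other Lucene special characters.

```python
def lucene_or(terms):
    """Join terms with OR, quoting any term that contains a space.

    Hypothetical helper for illustration only; it does not escape
    Lucene special characters other than embedded double quotes.
    """
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        if " " in term:
            # Multi-word terms are sent as phrases.
            term = '"' + term.replace('"', '\\"') + '"'
        parts.append(term)
    return " OR ".join(parts)


query_string = lucene_or(["cats", "dogs", "apache lucene"])
print(query_string)  # cats OR dogs OR "apache lucene"
```

The resulting string can be used wherever the query field accepts Lucene syntax.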
Examples
Search for cats or dogs
Query cats or dogs every 12 hours:
You also have the option to use the API. You can use the Code button to extract this example:
curl --location 'https://dev.api.platform.datastreamer.io/api/pipelines/{PIPELINE_ID}/components/{COMPONENT_ID}/jobs?ready=true' \
--header 'apikey: <your-api-key>' \
--header 'Content-Type: application/json' \
--data \
'{
  "job_name": "8740eace-160a-468e-a2f5-8d2db803f9f6",
  "data_source": "wsl_threads",
  "query": {
    "query_string": "cats OR dogs"
  },
  "job_type": "periodic",
  "schedule": "0 0 0/12 1/1 * ? *",
  "max_documents": 50
}'
For more details on creating data collection jobs, see Job Management.
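For readers who prefer a script over curl, the same request can be sketched in Python with only the standard library. The function name `build_job_request` and the placeholder IDs are assumptions made for this example. The URL, headers, and payload fields are copied from the curl command above. The request is built but not sent, and the shape of the API response is not covered here.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://dev.api.platform.datastreamer.io/api"


def build_job_request(pipeline_id, component_id, api_key, query_string,
                      schedule, max_documents, job_name):
    """Build (but do not send) a job-creation request.

    Hypothetical helper; the payload mirrors the curl example.
    """
    url = (f"{API_BASE}/pipelines/{pipeline_id}"
           f"/components/{component_id}/jobs?ready=true")
    payload = {
        "job_name": job_name,
        "data_source": "wsl_threads",
        "query": {"query_string": query_string},
        "job_type": "periodic",
        "schedule": schedule,
        "max_documents": max_documents,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",  # curl --data sends a POST by default
    )


request = build_job_request(
    pipeline_id="PIPELINE_ID",      # placeholder
    component_id="COMPONENT_ID",    # placeholder
    api_key="<your-api-key>",       # placeholder
    query_string="cats OR dogs",
    schedule="0 0 0/12 1/1 * ? *",  # schedule string copied from the curl example
    max_documents=50,
    job_name="example-job",         # hypothetical job name
)
print(json.dumps(json.loads(request.data), indent=2))
# To submit the job, pass the request to urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes it easy to inspect the payload before any job is created.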
Additional Details
Stats
| Searchable Records | Update Frequency | Partner Type |
|---|---|---|
| 45 million (3 months) | Near-time (Max 10-minute latency) | Stream Integrated |
Compatible Metadata Fields
| Applicable Metadata Categories | Compatible |
|---|---|
| Source | Yes |
| Content | Yes |
| Author | Yes |
| Person | No |
| Enrichment | Yes |
| Organization | No |
| Data source-specific fields? | Yes, please see the Metadata page. |
Compatible Classifiers & Models
| Classifier & Model | Compatible |
|---|---|
| Named Entity Recognition | No |
| Location_Inference | Yes |
| Language | Yes |
| Reported_Violence | No |
| Sentiment | No |
| Hard_News | No |
Compatible Features
Because WebSightLine Threads is a Stream Integrated partner, all streaming features are available.
| Features | Compatible |
|---|---|
| Search API | Yes |
| Date Histograms | Yes |
| Term Aggregations | Yes |
| Highlighting | Yes |
| Fuzzy and Proximate Search | Yes |
Available Fields
The available fields can be changed by the data provider (WebSightLine), but these are the field names currently available.
Common fields (present in most entries):
- id
- doc_date
- data_source (e.g., "wsl_threads")
- source.link (URL to the post)
- author.name
- author.bio
- author.profile_image_source
- author.url
- author.handle
- author.verified (boolean as string, e.g., "False")
- content.body (post text)
- content.published
- content.found (timestamp when post was scraped)
- content.found_by (e.g., "wsl_threads_profile_robot")
- content.last_updated
- content.likes_count
- content.followers (author's follower count)
- internal.provider_document_id
- internal.last_updated
- internal.annotations (array of objects with name/value, e.g., "found_with: profile: calfirelnu")
- internal.destinations (array, e.g., ["public"])
Content Metadata:
- content.hashtags (array of hashtags, e.g., ["dogs", "fyp"])
- content.mentions (array of mentioned handles, e.g., ["lostdogrescue"])
- content.images (array of objects with url and optional alternative_text)
- content.video_urls (array of URLs, if post contains video)
Threads-Specific fields:
- threads.content_type (e.g., "TEXT", "IMAGE", "VIDEO", "CAROUSEL")
- threads.post_type (e.g., "POST", "REPLY")
- threads.post_identifier (unique ID for the post)
- threads.user_verified (boolean as string)
- threads.user_id
- threads.source_link (for replies, links to parent post)
Enrichment Fields:
- enrichment.language (e.g., "en", "es", "nl")
- enrichment.location_inference_country.label (e.g., "US", "CA")
- enrichment.location_inference_country.confidence (float, e.g., 0.8323)
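The dotted field names and the "boolean as string" values listed above can be handled with a small amount of Python. The sketch below is illustrative only. It assumes documents arrive as JSON objects that are either nested (for example, an author object with a verified key) or flat with dotted keys, which this page does not specify. The helpers `get_field` and `as_bool` and the sample document are hypothetical, with sample values taken from the examples in the field lists.

```python
def get_field(doc, path, default=None):
    """Read a dotted field such as "author.verified" from a document.

    Works for flat dotted keys and for nested dictionaries.
    """
    if path in doc:  # flat layout: {"author.verified": "False"}
        return doc[path]
    current = doc
    for key in path.split("."):  # nested layout: {"author": {"verified": ...}}
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def as_bool(value):
    """Convert "boolean as string" values such as "False" to a real bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


# Hypothetical sample document built from the example values above.
sample = {
    "data_source": "wsl_threads",
    "author": {"verified": "False"},
    "content": {"hashtags": ["dogs", "fyp"]},
    "threads": {"post_type": "REPLY"},
    "enrichment": {"language": "en"},
}

print(as_bool(get_field(sample, "author.verified")))   # False
print(get_field(sample, "content.hashtags"))           # ['dogs', 'fyp']
print(get_field(sample, "content.video_urls", []))     # []
```

Supplying a default for optional fields such as content.video_urls avoids key errors on posts that do not include them.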
Built-In Language Detection
WSL_Threads includes built-in language enrichment provided by WebSightLine by default. The currently supported languages are:
| ISO 639-1 | Language |
|---|---|
| AR | Arabic |
| BG | Bulgarian |
| CS | Czech |
| DA | Danish |
| DE | German |
| EL | Greek |
| EN | English |
| ES | Spanish |
| ET | Estonian |
| FA | Persian |
| FI | Finnish |
| FR | French |
| HE | Hebrew |
| HI | Hindi |
| HR | Croatian |
| HU | Hungarian |
| ID | Indonesian |
| IT | Italian |
| JA | Japanese |
| KO | Korean |
| MS | Malay |
| NL | Dutch |
| NO | Norwegian |
| PL | Polish |
| PT | Portuguese |
| RO | Romanian |
| RU | Russian |
| SL | Slovenian |
| SV | Swedish |
| TH | Thai |
| TR | Turkish |
| UK | Ukrainian |
| VI | Vietnamese |
| ZH | Chinese |
| U | Undefined |
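The table lists codes in upper case, while the enrichment.language examples elsewhere on this page are lower case ("en", "es", "nl"). A minimal sketch for normalizing a code before comparing it against the table might look like the following. The helper name `normalize_language` is hypothetical, and mapping unknown or missing values to "U" is an assumption made for this example rather than documented behavior.

```python
# Codes copied from the table above (excluding "U" for Undefined).
SUPPORTED_LANGUAGES = {
    "AR", "BG", "CS", "DA", "DE", "EL", "EN", "ES", "ET", "FA", "FI", "FR",
    "HE", "HI", "HR", "HU", "ID", "IT", "JA", "KO", "MS", "NL", "NO", "PL",
    "PT", "RO", "RU", "SL", "SV", "TH", "TR", "UK", "VI", "ZH",
}


def normalize_language(code):
    """Return the upper-case code if it is in the table, otherwise "U".

    Assumption: unknown or missing values are treated as Undefined.
    """
    code = (code or "").strip().upper()
    return code if code in SUPPORTED_LANGUAGES else "U"


print(normalize_language("en"))   # EN
print(normalize_language("xx"))   # U
print(normalize_language(None))   # U
```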
