WebSightLine Threads

WebSightLine (WSL) Threads provides a high-volume sample of near-time public Threads content.

The WebSightLine Threads component provides a live feed with millions of Threads posts and comments each day.

New to Datastreamer? Start here.


Unify Schema

This data source already uses Unify Schema.

How to use?

WebSightLine Threads is powered by the Jobs System. When interacting with the component, you can define your own job queries.

Search Queries

Filters

Available filters for WebSightLine Threads can be found in the table below:

| Filter Name | Description |
| --- | --- |
| query | List of keywords or a phrase to search |
| max_documents | Set a limit for the number of posts that will be fetched for the search. |

Lucene query syntax is supported in the query field of this component. Here are some basic queries that you can try:

Keywords:

cats

Fields:

title:lucene

Phrases:

"apache lucene"

Wildcards:

tes*

Boolean operators:

cats OR dogs
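
The query forms above can also be assembled programmatically. The following is a minimal Python sketch, not part of the Datastreamer API: the helper names `lucene_term`, `lucene_any`, and `lucene_field` are hypothetical, and the helpers only quote multi-word phrases. They do not escape other Lucene special characters.

```python
def lucene_term(text: str) -> str:
    """Return a single keyword as-is, or wrap a multi-word phrase in double quotes."""
    text = text.strip()
    if " " in text:
        # Escape embedded double quotes so the phrase stays one token.
        return '"' + text.replace('"', '\\"') + '"'
    return text


def lucene_any(terms) -> str:
    """Join terms with OR so a post matching any one of them is returned."""
    return " OR ".join(lucene_term(term) for term in terms)


def lucene_field(field: str, value: str) -> str:
    """Restrict a term or phrase to one field, e.g. title:lucene."""
    return f"{field}:{lucene_term(value)}"


print(lucene_any(["cats", "dogs"]))           # cats OR dogs
print(lucene_any(["apache lucene", "tes*"]))  # "apache lucene" OR tes*
print(lucene_field("title", "lucene"))        # title:lucene
```

The resulting string is what would go into the query field of a job.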

Examples

Search for cats or dogs

Query for cats or dogs every 12 hours:

You also have the option to use the API. You can use the Code button to extract this example:

curl --location 'https://dev.api.platform.datastreamer.io/api/pipelines/{PIPELINE_ID}/components/{COMPONENT_ID}/jobs?ready=true' \
      --header 'apikey: <your-api-key>' \
      --header 'Content-Type: application/json' \
      --data \
        '{
          "job_name": "8740eace-160a-468e-a2f5-8d2db803f9f6",
          "data_source": "wsl_threads",
          "query": {
            "query_string": "cats OR dogs"
          },
          "job_type": "periodic",
          "schedule": "0 0 0/12 1/1 * ? *",
          "max_documents": 50
        }'

For more details on creating data collection jobs, see Job Management.
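
For readers who prefer a script to curl, here is a rough Python equivalent of the request above, using only the standard library. It is a sketch under assumptions: the function names `build_job` and `build_request` are hypothetical, the endpoint, headers, and payload keys are copied from the curl example, and the POST method is inferred from curl's default behavior when `--data` is supplied. The network call is left commented out because it needs a real pipeline ID, component ID, and API key.

```python
import json
import urllib.request
import uuid

# Host and path layout copied from the curl example above.
API_BASE = "https://dev.api.platform.datastreamer.io/api"


def build_job(query_string, max_documents=50,
              schedule="0 0 0/12 1/1 * ? *", job_name=None):
    """Build a periodic job payload with the same keys as the curl example."""
    return {
        "job_name": job_name or str(uuid.uuid4()),  # random name if none given
        "data_source": "wsl_threads",
        "query": {"query_string": query_string},
        "job_type": "periodic",
        "schedule": schedule,  # schedule string taken verbatim from the example
        "max_documents": max_documents,
    }


def build_request(pipeline_id, component_id, api_key, job):
    """Prepare (but do not send) the HTTP request that creates the job."""
    url = (f"{API_BASE}/pipelines/{pipeline_id}"
           f"/components/{component_id}/jobs?ready=true")
    return urllib.request.Request(
        url,
        data=json.dumps(job).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


job = build_job("cats OR dogs")
request = build_request("PIPELINE_ID", "COMPONENT_ID", "<your-api-key>", job)
print(request.full_url)
print(json.dumps(job, indent=2))

# To submit the job, replace the placeholders above and uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Keeping payload construction separate from sending makes it easy to inspect or log the job definition before it is submitted.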

Additional Details

Stats

| Searchable Records | Update Frequency | Partner Type |
| --- | --- | --- |
| 45 million (3 months) | Near-time (Max 10-minute latency) | Stream Integrated |

Compatible Metadata Fields

| Applicable Metadata Categories | Compatible |
| --- | --- |
| Source | Yes |
| Content | Yes |
| Author | Yes |
| Person | No |
| Enrichment | Yes |
| Organization | No |
| Data source-specific fields? | Yes, please see the Metadata page. |

Compatible Classifiers & Models

| Classifier & Model | Compatible |
| --- | --- |
| Named Entity Recognition | No |
| Location_Inference | Yes |
| Language | Yes |
| Reported_Violence | No |
| Sentiment | No |
| Hard_News | No |

Compatible Features

Because WebSightLine is a Stream-Integrated partner, all streaming features are available.

| Features | Compatible |
| --- | --- |
| Search API | Yes |
| Date Histograms | Yes |
| Term Aggregations | Yes |
| Highlighting | Yes |
| Fuzzy and Proximate Search | Yes |

Available Fields

The data provider (WebSightLine) may change the available fields, but these are the field names currently available.

Common fields (present in most entries):

  • id
  • doc_date
  • data_source (e.g., "wsl_threads")
  • source.link (URL to the post)
  • author.name
  • author.bio
  • author.profile_image_source
  • author.url
  • author.handle
  • author.verified (boolean as string, e.g., "False")
  • content.body (post text)
  • content.published
  • content.found (timestamp when post was scraped)
  • content.found_by (e.g., "wsl_threads_profile_robot")
  • content.last_updated
  • content.likes_count
  • content.followers (author's follower count)
  • internal.provider_document_id
  • internal.last_updated
  • internal.annotations (array of objects with name/value, e.g., "found_with: profile: calfirelnu")
  • internal.destinations (array, e.g., ["public"])

Content Metadata:

  • content.hashtags (array of hashtags, e.g., ["dogs", "fyp"])
  • content.mentions (array of mentioned handles, e.g., ["lostdogrescue"])
  • content.images (array of objects with url and optional alternative_text)
  • content.video_urls (array of URLs, if post contains video)

Threads-Specific fields:

  • threads.content_type (e.g., "TEXT", "IMAGE", "VIDEO", "CAROUSEL")
  • threads.post_type (e.g., "POST", "REPLY")
  • threads.post_identifier (unique ID for the post)
  • threads.user_verified (boolean as string)
  • threads.user_id
  • threads.source_link (for replies, links to parent post)

Enrichment Fields:

  • enrichment.language (e.g., "en", "es", "nl")
  • enrichment.location_inference_country.label (e.g., "US", "CA")
  • enrichment.location_inference_country.confidence (float, e.g., 0.8323)
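
To illustrate how the dotted field names above might be read from a document, here is a small Python sketch. It assumes that each document is a nested JSON object in which a name such as `author.handle` means the key `handle` inside the object `author`. That layout is an assumption, not something confirmed here. The helpers `get_field` and `as_bool` are hypothetical, and the sample document is made up from the example values listed above.

```python
from typing import Any


def get_field(doc: dict, path: str, default: Any = None) -> Any:
    """Walk a dotted path such as 'author.handle' through nested dicts."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # field absent in this document
        current = current[key]
    return current


def as_bool(value: Any) -> bool:
    """Convert 'boolean as string' fields (e.g. "False") to a real bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


# Hypothetical document built from the example values in the field list.
sample = {
    "data_source": "wsl_threads",
    "author": {"handle": "lostdogrescue", "verified": "False"},
    "content": {"body": "Found a dog today", "hashtags": ["dogs", "fyp"]},
    "threads": {"content_type": "TEXT", "post_type": "POST"},
    "enrichment": {
        "language": "en",
        "location_inference_country": {"label": "US", "confidence": 0.8323},
    },
}

print(get_field(sample, "threads.post_type"))                           # POST
print(as_bool(get_field(sample, "author.verified")))                    # False
print(get_field(sample, "enrichment.location_inference_country.label"))  # US
print(get_field(sample, "content.video_urls", []))                      # []
```

Returning a default for missing paths matters here because optional fields such as content.video_urls are only present on some posts.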

Built-In Language Detection

By default, WSL_Threads includes built-in language enrichment provided by WebSightLine. The languages currently supported are:

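| ISO 639-1 | Language |
| --- | --- |
| AR | Arabic |
| BG | Bulgarian |
| CS | Czech |
| DA | Danish |
| DE | German |
| EL | Greek |
| EN | English |
| ES | Spanish |
| ET | Estonian |
| FA | Persian |
| FI | Finnish |
| FR | French |
| HE | Hebrew |
| HI | Hindi |
| HR | Croatian |
| HU | Hungarian |
| ID | Indonesian |
| IT | Italian |
| JA | Japanese |
| KO | Korean |
| MS | Malay |
| NL | Dutch |
| NO | Norwegian |
| PL | Polish |
| PT | Portuguese |
| RO | Romanian |
| RU | Russian |
| SL | Slovenian |
| SV | Swedish |
| TH | Thai |
| TR | Turkish |
| UK | Ukrainian |
| VI | Vietnamese |
| ZH | Chinese |
| U | Undefined |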
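
The language table can be turned into a lookup for labelling or filtering documents on the client side. The Python sketch below rests on a few assumptions: documents are nested objects with the code stored under enrichment.language, codes may arrive in either case (the field list shows lowercase values such as "en", while the table uses uppercase), and a missing or unknown code is treated as Undefined. The names `language_name` and `filter_by_language` are hypothetical.

```python
# Codes and names copied from the supported-languages table above.
LANGUAGES = {
    "AR": "Arabic", "BG": "Bulgarian", "CS": "Czech", "DA": "Danish",
    "DE": "German", "EL": "Greek", "EN": "English", "ES": "Spanish",
    "ET": "Estonian", "FA": "Persian", "FI": "Finnish", "FR": "French",
    "HE": "Hebrew", "HI": "Hindi", "HR": "Croatian", "HU": "Hungarian",
    "ID": "Indonesian", "IT": "Italian", "JA": "Japanese", "KO": "Korean",
    "MS": "Malay", "NL": "Dutch", "NO": "Norwegian", "PL": "Polish",
    "PT": "Portuguese", "RO": "Romanian", "RU": "Russian", "SL": "Slovenian",
    "SV": "Swedish", "TH": "Thai", "TR": "Turkish", "UK": "Ukrainian",
    "VI": "Vietnamese", "ZH": "Chinese", "U": "Undefined",
}


def language_name(code) -> str:
    """Map a language code to its name, ignoring case; unknown codes map to Undefined."""
    return LANGUAGES.get(str(code).strip().upper(), "Undefined")


def filter_by_language(docs, codes):
    """Keep documents whose enrichment.language is one of the given codes."""
    wanted = {code.upper() for code in codes}
    return [
        doc for doc in docs
        # A missing language is treated as "U" (assumption).
        if str(doc.get("enrichment", {}).get("language", "U")).upper() in wanted
    ]


# Hypothetical documents for illustration only.
docs = [
    {"id": "1", "enrichment": {"language": "en"}},
    {"id": "2", "enrichment": {"language": "nl"}},
    {"id": "3", "enrichment": {}},
]

print(language_name("es"))                                        # Spanish
print([d["id"] for d in filter_by_language(docs, ["EN", "NL"])])  # ['1', '2']
```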