WebSightLine Instagram

The WebSightLine Instagram component lets you search for Instagram posts and comments.

📘

Documentation Accuracy

We try to keep every component's documentation as accurate as possible. However, because we do not manage third-party product features and documentation, some third-party documentation may be out of date as of the last update. Thank you for your understanding.

The WebSightLine Instagram component offers search capabilities within a repository containing millions of Instagram posts and comments from the past two years.

New to Datastreamer? Start here.

👍

Unify Schema

This data source already uses Unify Schema.

How to use?

The WebSightLine Instagram component is powered by the Jobs System. When interacting with the component, you can define your own job queries.

Search Queries

Filters

Available filters for WebSightLine Instagram can be found in the table below:

| Filter Name | Description |
| --- | --- |
| query | List of keywords or a phrase to search |
| max_documents | Set a limit for the number of posts that will be fetched for the search. |

Lucene query syntax is supported in the query field for this component. Here are some basic queries that you can try:

Keywords:

cats

Fields:

title:lucene

Phrases:

"apache lucene"

Wildcards:

tes*

Boolean operators:

cats OR dogs
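
The query forms above can be combined into a single query string. The following is a minimal Python sketch, not part of the Datastreamer API: the helper names `phrase`, `field`, and `any_of` are hypothetical, and the code only assembles strings using the Lucene syntax shown above.

```python
def phrase(text):
    """Wrap text in double quotes so Lucene treats it as an exact phrase."""
    # Escape backslashes first, then embedded double quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def field(name, term):
    """Restrict a term (or quoted phrase) to one field, e.g. title:lucene."""
    return f"{name}:{term}"


def any_of(*terms):
    """Join terms with OR; parentheses keep the group intact when nested."""
    return "(" + " OR ".join(terms) + ")"


if __name__ == "__main__":
    print(any_of("cats", "dogs"))
    print(field("title", "lucene"))
    print(phrase("apache lucene"))
    print(any_of("tes*", phrase("apache lucene")))
```

The resulting string is what you would place in the query field of a job.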

Examples

Search for cats or dogs

Query cats or dogs every 6 hours:

You also have the option to use the API. You can use the Code button to extract this example:

curl --location 'https://dev.api.platform.datastreamer.io/api/pipelines/{PIPELINE_ID}/components/{COMPONENT_ID}/jobs?ready=true' \
      --header 'apikey: <your-api-key>' \
      --header 'Content-Type: application/json' \
      --data \
        '{
          "job_name": "57e02e42-9dd9-4390-97ed-40002ae672af",
          "component_name": "wsl-instagram-ingress",
          "data_source": "wsl_instagram",
          "query": {
            "query_string": "cats OR dogs"
          },
          "job_type": "periodic",
          "schedule": "0 0 0/6 1/1 * ? *"
        }'
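
The same request can also be assembled in Python. This is a minimal sketch using only the standard library: the function names `build_job_payload` and `build_job_request` are hypothetical, while the URL, header names, and JSON fields are copied from the curl example above. The schedule string is passed through verbatim; it appears to be a Quartz-style cron expression (seconds field first), but that reading is an assumption. The request is built but not sent.

```python
import json
import urllib.request
import uuid

# Base URL copied from the curl example above.
API_BASE = "https://dev.api.platform.datastreamer.io/api"


def build_job_payload(query_string, schedule, job_name=None):
    """Build the JSON body for a periodic WebSightLine Instagram job."""
    return {
        # The curl example uses a UUID as the job name; generating one here
        # when no name is given is an assumption about what is acceptable.
        "job_name": job_name or str(uuid.uuid4()),
        "component_name": "wsl-instagram-ingress",
        "data_source": "wsl_instagram",
        "query": {"query_string": query_string},
        "job_type": "periodic",
        "schedule": schedule,
    }


def build_job_request(pipeline_id, component_id, api_key, payload):
    """Return an unsent POST request equivalent to the curl command."""
    url = (
        f"{API_BASE}/pipelines/{pipeline_id}"
        f"/components/{component_id}/jobs?ready=true"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Same query and schedule as the curl example.
    body = build_job_payload("cats OR dogs", "0 0 0/6 1/1 * ? *")
    print(json.dumps(body, indent=2))
```

To actually submit the job, pass the returned request to `urllib.request.urlopen`. The response format is not described on this page, so handle it defensively.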

For more details on creating data collection jobs, see Job Management.

Additional Details

The available fields can be changed by the data provider (WebSightLine), but these are the field names currently available.

Basic document information:

  • id
  • doc_date
  • data_source
  • source.link

Internal Fields:

  • internal.provider_document_id
  • internal.last_updated
  • internal.annotations[].name
  • internal.annotations[].value
  • internal.destinations[]

Content Fields:

  • content.body
  • content.location
  • content.image_urls[]
  • content.images[].url
  • content.images[].alternative_text
  • content.found
  • content.found_by
  • content.published
  • content.hashtags[]
  • content.favorites
  • content.followers
  • content.following
  • content.mentions[]
  • content.last_updated
  • content.likes_count
  • content.comments_count

Author Information:

  • author.name
  • author.bio
  • author.profile_image_source
  • author.gender
  • author.url
  • author.handle
  • author.verified
  • author.is_business_account

Enrichment Data:

  • enrichment.language
  • enrichment.location_inference_country.label
  • enrichment.location_inference_country.confidence
  • enrichment.sentiment
  • enrichment.emoji_sentiment

Instagram-specific Fields:

  • instagram.content_type
  • instagram.user_id

These field names represent the complete structure of the Instagram post data in the documents. When searching with Lucene, reference these fields by their full paths (e.g., content.body, author.name, etc.).
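
The same full paths are handy when reading values out of a returned document. Below is a minimal Python sketch, assuming a document arrives as nested dictionaries and that a `[]` suffix in a path marks a list, as in the field list above. The helper `get_path` is hypothetical and the sample document is invented for illustration; it is not real data.

```python
def get_path(doc, path):
    """Return every value found at a dotted path such as 'content.images[].url'."""
    current = [doc]
    for part in path.split("."):
        is_list = part.endswith("[]")
        key = part[:-2] if is_list else part
        next_level = []
        for node in current:
            if not isinstance(node, dict) or key not in node:
                continue  # missing fields are skipped rather than raising
            value = node[key]
            if is_list and isinstance(value, list):
                next_level.extend(value)  # fan out over list elements
            else:
                next_level.append(value)
        current = next_level
    return current


# Hypothetical document shaped after the field list above; values are made up.
sample = {
    "content": {
        "body": "my cat",
        "hashtags": ["cats", "pets"],
        "images": [
            {"url": "https://example.com/a.jpg"},
            {"url": "https://example.com/b.jpg"},
        ],
    },
    "author": {"handle": "example_user", "verified": True},
}

if __name__ == "__main__":
    for p in ("content.body", "content.hashtags[]", "content.images[].url"):
        print(p, get_path(sample, p))
```

Returning a list in every case keeps the caller's code uniform: a scalar field yields one element, a list field yields many, and an absent field yields an empty list.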