Documentation Accuracy
We try to keep every component's documentation as accurate as possible. However, because we do not manage third-party product features or documentation, some third-party information may be out of date as of the last update. Thank you for your understanding.

The WebSightLine Instagram component lets you search a repository of millions of Instagram posts and comments from the past two years.

New to Datastreamer? Start here.

Unify Schema
This data source already uses the Unify Schema.

How to Use

The WebSightLine Instagram component is powered by the Jobs System. When you interact with the component, you define the queries your jobs run.

Search Queries

Filters

Available filters for WebSightLine Instagram can be found in the table below:

Filter Name	Description
query	Keywords or a phrase to search for (Lucene syntax is supported)
max_documents	Maximum number of posts to fetch for the search

Lucene query syntax is supported in the query field. Here are some basic queries you can try:

Keywords:

cats

Fields (use the full field path):

content.body:lucene

Phrases:

"apache lucene"

Wildcards:

tes*

Boolean operators:

cats OR dogs
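The basics above can be combined in a single query string. As an illustration, here is a minimal bash sketch that joins several terms with OR and wraps multi-word terms in double quotes so Lucene treats them as phrases. The lucene_any function is a hypothetical helper for building the query text, not part of Datastreamer, and it does not escape quotes already present in a term.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join search terms into one Lucene OR query.
# Terms containing a space are wrapped in double quotes so they match as phrases.
# Quotes already inside a term are NOT escaped by this sketch.
lucene_any() {
  local out="" term
  for term in "$@"; do
    if [[ "$term" == *" "* ]]; then
      term="\"$term\""        # multi-word term becomes a phrase query
    fi
    if [[ -n "$out" ]]; then
      out+=" OR "             # boolean operator between terms
    fi
    out+="$term"
  done
  printf '%s\n' "$out"
}

# Example:
#   lucene_any cats dogs               prints: cats OR dogs
#   lucene_any cats "apache lucene"    prints: cats OR "apache lucene"
```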

Examples

Search for cats or dogs

This example queries cats OR dogs every 6 hours.

You can also create this job through the API. Use the Code button to extract the equivalent request:

curl --location 'https://dev.api.platform.datastreamer.io/api/pipelines/{PIPELINE_ID}/components/{COMPONENT_ID}/jobs?ready=true' \
      --header 'apikey: <your-api-key>' \
      --header 'Content-Type: application/json' \
      --data \
        '{
          "job_name": "57e02e42-9dd9-4390-97ed-40002ae672af",
          "component_name": "wsl-instagram-ingress",
          "data_source": "wsl_instagram",
          "query": {
            "query_string": "cats OR dogs"
          },
          "job_type": "periodic",
          "schedule": "0 0 0/6 1/1 * ? *"
        }'
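Because the query string travels inside the JSON body, Lucene syntax that uses double quotes (phrases) or backslashes has to be JSON-escaped, or the request body becomes invalid. Below is a minimal bash sketch of one way to handle this. json_escape and build_job_body are hypothetical helpers, and the body they print simply reuses the field names and values from the request above. The example request uses a UUID as job_name; whether other names are accepted is not stated on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Only backslashes and double quotes are handled; control characters such as
# newlines are not, so keep queries on a single line.
json_escape() {
  local in="$1" out="" ch i
  for (( i = 0; i < ${#in}; i++ )); do
    ch="${in:i:1}"
    case "$ch" in
      '\') out+='\\' ;;   # backslash -> \\
      '"') out+='\"' ;;   # double quote -> \"
      *)   out+="$ch" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: print a periodic job body using the same fields as the
# curl request above. Arguments: job name, Lucene query, schedule.
build_job_body() {
  printf '{"job_name": "%s", "component_name": "wsl-instagram-ingress", "data_source": "wsl_instagram", "query": {"query_string": "%s"}, "job_type": "periodic", "schedule": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Usage with the request above (IDs and API key are placeholders):
#   curl ... --data "$(build_job_body my-job '"apache lucene"' '0 0 0/6 1/1 * ? *')"
```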

For more details on creating data collection jobs, see Job Management.
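The schedule value 0 0 0/6 1/1 * ? * in the example appears to follow the 7-field Quartz cron format: seconds, minutes, hours, day of month, month, day of week, and year. Read that way, it fires at minute 0 of every sixth hour starting at midnight (00:00, 06:00, 12:00, 18:00), every day. This field order is our reading of the expression rather than something stated on this page, so treat Job Management as authoritative. The hypothetical describe_schedule helper below labels each field so you can check a schedule before submitting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each field of a 7-field Quartz-style cron
# expression next to its name. Field order assumed: seconds, minutes, hours,
# day-of-month, month, day-of-week, year.
describe_schedule() {
  local -a names=(seconds minutes hours day-of-month month day-of-week year)
  local -a fields
  local i
  # read splits on whitespace and does not glob, so * and ? stay literal.
  read -r -a fields <<< "$1"
  if (( ${#fields[@]} != 7 )); then
    echo "expected 7 fields, got ${#fields[@]}" >&2
    return 1
  fi
  for (( i = 0; i < 7; i++ )); do
    printf '%-13s %s\n' "${names[i]}" "${fields[i]}"
  done
}

# Example: describe_schedule '0 0 0/6 1/1 * ? *'
# lists "hours" next to 0/6, "day-of-week" next to ?, and so on.
```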

Additional Details

The data provider (WebSightLine) can change the available fields. The field names currently available are:

Basic document information:

id
doc_date
data_source
source.link

Internal Fields:

internal.provider_document_id
internal.last_updated
internal.annotations[].name
internal.annotations[].value
internal.destinations[]

Content Fields:

content.body
content.location
content.image_urls[]
content.images[].url
content.images[].alternative_text
content.found
content.found_by
content.published
content.hashtags[]
content.favorites
content.followers
content.following
content.mentions[]
content.last_updated
content.likes_count
content.comments_count

Author Information:

author.name
author.bio
author.profile_image_source
author.gender
author.url
author.handle
author.verified
author.is_business_account

Enrichment Data:

enrichment.language
enrichment.location_inference_country.label
enrichment.location_inference_country.confidence
enrichment.sentiment
enrichment.emoji_sentiment

Instagram-specific Fields:

instagram.content_type
instagram.user_id

Together, these field names make up the complete structure of the Instagram post data in each document. When searching with Lucene, reference fields by their full paths (for example, content.body or author.name).
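Field paths combine with the Lucene basics from Search Queries. Below is a minimal bash sketch of building field-qualified clauses and joining them with AND. field_clause and all_of are hypothetical helpers. Referencing array fields such as content.hashtags[] without the brackets is an assumption this page does not confirm, and the sketch assumes every field you use is searchable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a field-qualified Lucene clause (field:value).
# Values containing a space are wrapped in double quotes so they match as a phrase.
# Assumption: array fields (listed with []) are referenced without the brackets.
field_clause() {
  local field="$1" value="$2"
  if [[ "$value" == *" "* ]]; then
    value="\"$value\""
  fi
  printf '%s:%s' "$field" "$value"
}

# Hypothetical helper: join any number of clauses with the AND operator.
all_of() {
  local out="" clause
  for clause in "$@"; do
    if [[ -n "$out" ]]; then
      out+=" AND "
    fi
    out+="$clause"
  done
  printf '%s\n' "$out"
}

# Example:
#   all_of "$(field_clause content.body 'apache lucene')" "$(field_clause content.hashtags cats)"
#   prints: content.body:"apache lucene" AND content.hashtags:cats
```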