Sources Overview

Sources are the data inputs for a Data Stream. When you configure a source in a Data Stream, Datastreamer handles provider selection and data retrieval automatically. You define what you want; the platform determines how to get it.


How Sources Work

Datastreamer maintains connections to a curated set of market-leading data providers for each supported platform. When a Job runs, the platform selects the provider best suited to your query based on availability, coverage, and reliability. If a provider is unavailable, the Job is automatically routed to an alternative.

This means:

  • You do not need to manage provider accounts or API credentials for standard sources
  • Data collection continues if a provider experiences an outage
  • Provider selection is updated automatically as the provider landscape changes

The underlying providers are not exposed to you by default. What you interact with is the source configuration: the platform, query, filters, and date range.
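The selection and failover behavior described above can be pictured with a small sketch. This is not Datastreamer's implementation: the `Provider` record, its `coverage` and `reliability` scores, and the `rank_providers` and `run_job` helpers are all hypothetical names chosen for illustration, and the ranking rule (coverage first, then reliability) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    # Hypothetical record; the real platform does not expose providers.
    name: str
    available: bool
    coverage: float      # assumed 0.0-1.0 score for how well the query is covered
    reliability: float   # assumed 0.0-1.0 historical success rate


def rank_providers(providers):
    """Return the available providers, best candidate first."""
    candidates = [p for p in providers if p.available]
    return sorted(candidates, key=lambda p: (p.coverage, p.reliability), reverse=True)


def run_job(providers, fetch):
    """Try providers in ranked order, falling back when one fails."""
    for provider in rank_providers(providers):
        try:
            return provider.name, fetch(provider)
        except ConnectionError:
            continue  # this provider is down; route to the next alternative
    raise RuntimeError("no provider could serve this Job")
```

In this sketch the caller supplies only `fetch`, a function that retrieves data from one provider. Which provider ends up serving the request is decided inside `run_job`, which mirrors the idea that you define what you want and the platform determines how to get it.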


Supported Sources

Data Streams support the following platforms:

Platform      Documentation
Facebook      Facebook Source
Instagram     Instagram Source
Twitter/X     Twitter/X Source
TikTok        TikTok Source
Reddit        Reddit Source
YouTube       YouTube Source
Threads       Threads Source
Bluesky       Bluesky Source

Additional platforms are added continuously.


Configuring a Source

Each source is configured through a Job. The Job defines:

  • The platform to query
  • The search query or account to monitor (keyword, hashtag, URL, account handle, etc.)
  • Date range and recency filters
  • Volume limits (maximum documents per run)
  • Schedule (how often the Job runs)

Each platform has its own set of available filters. See the individual source pages above for details.
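As a rough illustration of the settings a Job carries, the sketch below models them as a Python dataclass with basic validation. The class name `JobConfig`, its field names, the lowercase platform identifiers, and the default values are all assumptions made for this example. They are not Datastreamer's actual schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed identifiers for the platforms listed under Supported Sources.
SUPPORTED_PLATFORMS = {
    "facebook", "instagram", "twitter", "tiktok",
    "reddit", "youtube", "threads", "bluesky",
}


@dataclass
class JobConfig:
    platform: str                      # the platform to query
    query: str                         # keyword, hashtag, URL, account handle, etc.
    start_date: Optional[date] = None  # date range filter (optional)
    end_date: Optional[date] = None
    max_documents: int = 1000          # volume limit per run (placeholder default)
    schedule: str = "daily"            # how often the Job runs (placeholder format)

    def validate(self):
        """Return a list of problems; an empty list means the config looks sane."""
        errors = []
        if self.platform not in SUPPORTED_PLATFORMS:
            errors.append(f"unsupported platform: {self.platform}")
        if not self.query.strip():
            errors.append("query must not be empty")
        if self.start_date and self.end_date and self.start_date > self.end_date:
            errors.append("start_date must not be after end_date")
        if self.max_documents <= 0:
            errors.append("max_documents must be positive")
        return errors
```

Platform-specific filters are left out on purpose, since they differ per source and are documented on the individual source pages.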


Multiple Sources in One Data Stream

A Data Stream can include multiple sources. For example, you can collect from Twitter/X and Reddit within the same stream, with the data flowing through shared transformation and enrichment stages before reaching the destination.

Each source runs as its own Job. DVU usage accumulates across all Jobs in the stream.
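The accumulation across Jobs can be sketched as a simple tally. How DVUs are actually metered is not described on this page, so the `DataStream` class, its methods, and any numbers passed to it are hypothetical placeholders for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataStream:
    name: str
    # Each entry is (platform, dvu_used) for one completed Job run.
    runs: list = field(default_factory=list)

    def record_run(self, platform, dvu_used):
        """Store the usage reported for one Job run."""
        self.runs.append((platform, dvu_used))

    def dvu_by_platform(self):
        """Sum usage per source, since each source runs as its own Job."""
        totals = defaultdict(int)
        for platform, dvu in self.runs:
            totals[platform] += dvu
        return dict(totals)

    def total_dvu(self):
        """Usage for the whole stream is the sum over all of its Jobs."""
        return sum(dvu for _, dvu in self.runs)
```

Keeping a per-platform breakdown alongside the stream total makes it easier to see which source contributes most to overall usage.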


Provider Selection

Datastreamer selects providers based on your query requirements. The more specific your query (narrow date ranges, platform-specific filters, etc.), the fewer providers may be eligible to serve it. In most cases, multiple providers remain eligible, and selection happens behind the scenes without any action on your part.

Datastreamer does not expose which provider is serving a given Job by default.
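To see why a more specific query can narrow the pool, consider the following sketch of eligibility filtering. The capability fields (`max_lookback_days`, `supported_filters`), the provider names, and the matching rule are assumptions made for this example. The real eligibility criteria are internal to the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderCapabilities:
    # Hypothetical capability record for one provider.
    name: str
    max_lookback_days: int         # how far back the provider can retrieve data
    supported_filters: frozenset   # filter names the provider understands


def eligible_providers(providers, lookback_days, required_filters):
    """Keep providers that cover the date range and every requested filter."""
    required = frozenset(required_filters)
    return [
        p.name
        for p in providers
        if p.max_lookback_days >= lookback_days and required <= p.supported_filters
    ]


POOL = [
    ProviderCapabilities("provider_a", 30, frozenset({"language"})),
    ProviderCapabilities("provider_b", 365, frozenset({"language", "verified"})),
    ProviderCapabilities("provider_c", 90, frozenset({"language", "hashtag"})),
]

# A broad query keeps every provider in play.
broad = eligible_providers(POOL, lookback_days=7, required_filters=["language"])

# A long date range plus an extra filter leaves a single candidate.
narrow = eligible_providers(POOL, lookback_days=180, required_filters=["language", "verified"])
```

Here `broad` contains all three hypothetical providers, while `narrow` contains only `provider_b`.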


Need a Specific Provider?

If you need to use a specific data provider directly, whether to use your own API credentials, access provider-specific parameters, or meet a procurement requirement, that is available through Direct Integrations.

Direct Integrations use a different pricing model. See Direct Integrations Pricing for details.

