About Auto Sources

Auto Sources Overview

What are "Auto Sources"?

"Auto" source selection is available for many major data categories and handles the selection and ingestion of web and social data. Auto mode is Datastreamer's intelligent provider selection system.

Similar to OpenRouter's approach for LLM model routing, Auto mode automatically selects the best data provider for your requirements using Datastreamer's Job system. Instead of manually choosing a specific ingress connector, you define what data you need, and Auto determines which provider (or combination of providers) best fulfills that request.

When you select an "Auto" component, the platform automatically selects the premium 3rd party data source provider with the best fit for your query and highest reliability based on current demand.


How Auto Sources Work

Auto Sources operate through the following workflow:

  1. Create a Job – Each data source (e.g., each social media platform) has its own pipeline component that you interact with via Jobs. These components contain the filters, date ranges, query parameters, and any other constraints needed for your data collection job, including any filters unique to that data source.

  2. Automatic provider selection – The system evaluates the requirements within your Job against available providers for that source and selects the provider(s) that best match your needs. Different providers may have different coverage, speed, reliability, or feature sets.

  3. Job execution – Your Job runs using the selected provider(s), transparently to you.

  4. Built-in resilience – If a selected provider or its data source is unavailable for the Job, the system automatically routes the request to an alternative online provider.


Provider Selection Logic

With Auto Sources, the provider powering your Job is selected based on your specific requirements. The more detailed your requirements, the more providers may be filtered out.

Can I Customize Provider Selection?

Within "Auto" sources, you cannot manually select which provider to use. That's the point of Auto!

However, requirements-based selection means you have indirect control. By specifying filters, date ranges, data types, and other constraints, your requirements naturally narrow which providers are available.

It's possible to specify requirements so specific that:

  • Only one provider matches your needs
  • No provider can fulfill your exact requirements (in which case Auto mode will indicate this)

We are working to release customization capabilities that allow you to indicate preferences on latency and cost, which will help ensure the best providers are selected for your needs.

When multiple providers match, the system selects based on factors like availability, coverage, functionality, cost, reliability, and integrated billing capabilities.
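The requirements-based narrowing described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Datastreamer's actual selection logic: the `Provider` class, the capability names, the provider names, and the single blended `score` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Provider:
    name: str                # hypothetical provider label
    capabilities: frozenset  # what this provider can deliver (assumed names)
    online: bool             # current availability
    score: float             # assumed blended ranking (cost, reliability, ...)


def select_provider(providers: Iterable[Provider],
                    required: Iterable[str]) -> Optional[Provider]:
    """Return the best online provider that covers every requirement, or None."""
    needed = frozenset(required)
    # Requirements act as a filter: a provider stays only if it covers them all.
    candidates = [p for p in providers if p.online and needed <= p.capabilities]
    if not candidates:
        return None  # no provider can fulfill these requirements
    # When several providers match, pick the highest-ranked one.
    return max(candidates, key=lambda p: p.score)


PROVIDERS = [
    Provider("provider_a", frozenset({"posts", "comments"}), True, 0.8),
    Provider("provider_b", frozenset({"posts", "comments", "historical"}), True, 0.6),
    Provider("provider_c", frozenset({"posts"}), True, 0.9),
]

broad = select_provider(PROVIDERS, {"posts"})
narrow = select_provider(PROVIDERS, {"posts", "comments", "historical"})
unmatched = select_provider(PROVIDERS, {"video"})
```

In this sketch, the broad request leaves three candidates and the ranking decides, the narrow request leaves exactly one provider, and the unmatched request returns `None`, mirroring the two edge cases listed above.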


Multi-Provider Jobs

Auto mode can combine multiple providers within a single Job. For example:

  • Use one provider to collect posts from a social media platform
  • Use a different provider to collect comments on those same posts
  • Route both to the same destination

This flexibility allows you to optimize data collection by using the best provider for each data type or component of your query.
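As a rough illustration of the posts-and-comments example, the sketch below assigns each data type in a Job to whichever provider fits it best and routes everything to one destination. The `plan_job` function, the provider names, the `fit` scores, and the destination name are hypothetical; Auto mode's real planning is internal to the platform.

```python
def plan_job(data_types, providers, destination):
    """Assign each data type to its best-fitting provider (illustrative only)."""
    plan = []
    for data_type in data_types:
        # Keep only providers that can collect this data type at all.
        capable = [p for p in providers if data_type in p["fit"]]
        if not capable:
            raise LookupError(f"no provider can collect {data_type!r}")
        best = max(capable, key=lambda p: p["fit"][data_type])
        plan.append({
            "data_type": data_type,
            "provider": best["name"],
            "destination": destination,  # every part lands in the same place
        })
    return plan


PROVIDERS = [
    {"name": "provider_a", "fit": {"posts": 0.9, "comments": 0.4}},
    {"name": "provider_b", "fit": {"comments": 0.8}},
]

plan = plan_job(["posts", "comments"], PROVIDERS, "my_destination")
```

Here posts go to `provider_a` and comments to `provider_b`, while both steps share the same destination.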


Pricing & Billing

Auto mode pricing is based on documents collected, measured in batches of 100 documents (1 DVU).

Each Job consumes at least one batch from your billing system:

  • Minimum billing: Each Job uses at least 1 DVU, regardless of documents collected.
  • Batch sizing: Pricing is calculated in batches of 100 documents.
  • Cumulative billing: Jobs that collect 101–200 documents consume 2 DVUs, Jobs that collect 201–300 documents consume 3 DVUs, and so on.

For detailed pricing information, see How Datastreamer is Priced.
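The batch rules above amount to rounding up to the next 100 documents, with a floor of 1 DVU per Job. A minimal Python sketch of that arithmetic follows; the function name `dvus_for_job` is an assumption for illustration, not part of any Datastreamer API.

```python
DOCS_PER_DVU = 100  # batch size stated in this section


def dvus_for_job(documents_collected: int) -> int:
    """DVUs consumed by one Job: round up to whole batches, minimum 1."""
    if documents_collected < 0:
        raise ValueError("documents_collected must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    batches = (documents_collected + DOCS_PER_DVU - 1) // DOCS_PER_DVU
    return max(1, batches)


usage = {n: dvus_for_job(n) for n in (0, 1, 100, 101, 200, 201, 300)}
```

A Job that collects nothing still consumes 1 DVU, a Job with exactly 100 documents consumes 1 DVU, and a Job with 101 documents consumes 2.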


Committed Usage Discounts

Committed Usage Discounts (commits) are also available for Auto mode, allowing you to pre-purchase usage at a discounted rate.

Importantly, within Auto, commits are per source, not per provider. When using Auto mode across multiple social media sources, you can commit usage separately for each source to optimize your costs.

For example:

  • Commit 10K DVUs (1,000,000 documents) for Facebook Auto Sources
  • Commit 50K DVUs (5,000,000 documents) for Instagram Auto Sources
  • Commit 75K DVUs (7,500,000 documents) for Twitter/X Auto Sources

Your commits apply regardless of which provider Auto Sources selects for that source.
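To make "per source, not per provider" concrete, the sketch below keeps one commit balance per source and ignores which provider served the Job. The `CommitLedger` class and provider names are hypothetical, and how usage beyond a commit is billed is not covered here; the sketch simply reports the uncovered DVUs.

```python
class CommitLedger:
    """Tracks remaining committed DVUs per source (illustrative only)."""

    def __init__(self, commits_by_source):
        self.remaining = dict(commits_by_source)

    def record_usage(self, source, provider, dvus):
        """Draw DVUs from the source's commit; return DVUs not covered by it."""
        # `provider` is accepted but deliberately unused: the commit belongs
        # to the source, whichever provider Auto happened to select.
        balance = self.remaining.get(source, 0)
        covered = min(dvus, balance)
        self.remaining[source] = balance - covered
        return dvus - covered


ledger = CommitLedger({"facebook": 10_000, "instagram": 50_000})
first_uncovered = ledger.record_usage("facebook", "provider_a", 4_000)
second_uncovered = ledger.record_usage("facebook", "provider_b", 7_000)
```

Both Jobs draw from the same Facebook commit even though different providers served them, and the Instagram commit is untouched.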

For more details, see Committed Usage Discounts.


Resilience & Failover

Auto mode provides built-in resilience for data collection:

  • Automatic failover: "Auto" can detect degraded performance or outages and automatically process your queries through another available provider.
  • Transparent recovery: If a selected provider or its underlying data source is detected as offline during job execution, the system automatically routes the remaining request to an alternative provider that is online.
  • No manual intervention: This occurs without user action or job failure, ensuring your data pipeline remains reliable and uptime is maximized.
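The failover behavior can be pictured as a loop that moves the remaining work to the next provider when the current one goes offline. The following is a simplified sketch under assumed names (`ProviderUnavailable`, `run_with_failover`, and the fake `fetch_page`); it is not the platform's implementation.

```python
class ProviderUnavailable(Exception):
    """Raised by a fetch when the provider or its data source is offline."""


def run_with_failover(providers, fetch_page, pages):
    """Collect every page, switching providers when one becomes unavailable."""
    available = list(providers)
    documents = []
    for page in pages:
        while True:
            if not available:
                raise RuntimeError("no online provider left for this Job")
            try:
                documents.extend(fetch_page(available[0], page))
                break  # page collected, move on to the next one
            except ProviderUnavailable:
                # Drop the failed provider; the rest of the Job continues
                # on the next provider without restarting.
                available.pop(0)
    return documents


def fetch_page(provider, page):
    # Fake fetch for the example: provider_a goes offline after page 1.
    if provider == "provider_a" and page >= 2:
        raise ProviderUnavailable(provider)
    return [f"{provider}:{page}"]


collected = run_with_failover(["provider_a", "provider_b"], fetch_page, [1, 2, 3])
```

Only the remaining pages are rerouted: page 1 stays with the first provider, and pages 2 and 3 are served by the second.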

When to use Auto Sources vs Specific Providers?

Why Use Auto Source Selection?

For Pipeline users focused on outcomes, "Auto" source selection simplifies the ingestion of web and social data. Using the Auto Source Selection component for any data source simplifies the process of selecting, testing, maintaining, and migrating between various providers.

Auto sources also ensure greater reliability and quality of data by selecting the right providers for your queries based on their own capabilities and availability, providing the highest uptime and coverage for the least effort.

What If I Need More Control?

Auto mode is designed for flexibility and ease of use. However, if you need granular control over a specific data source (beyond what Auto mode's automatic provider selection offers), you can use the dedicated connectors for specific sources, provided by vendors that offer those sources directly. This allows you to:

  • Specify which provider to use explicitly
  • Use your own provider API keys
  • Access provider-specific features or parameters
  • Optimize for specific use cases that require deep customization

Note: You cannot add your own API keys to Auto sources. If you need provider-specific credentials or customization, implement the provider directly. Over 100 1st and 3rd party data source providers are supported in Datastreamer.


Aspect             | Auto Mode                             | Manual Provider Selection
-------------------+---------------------------------------+----------------------------------------------------
Provider selection | Automatic based on requirements       | You choose the specific provider
Failover           | Built-in; routes to online providers  | Manual intervention required
Flexibility        | High: works across multiple providers | High: provider-specific features available
Setup time         | Fast: specify your data needs         | Requires knowledge of provider specifics
Pricing            | DVU-based (documents)                 | Subject to provider-specific conversion rate to DVUs

Supported Sources

Auto Sources are available for major social media and web platforms. Additional platforms are being added continuously.


FAQ: Auto Sources

How is usage measured?

Usage is measured in DVUs (Data Volume Units), the same as with any other integrated data source. Each Job consumes at least 1 DVU, with additional DVUs charged in batches of 100 documents. See Data Volume Units for details.

What if I want to use a specific provider?

If you need a more customized experience or have your own provider keys, you can implement any of the providers directly. Over 100 1st and 3rd party data source providers are supported by Datastreamer. However, you cannot add your own API keys to Auto sources. If you need provider-specific credentials, you must use the provider's dedicated connectors.

Can I customize which providers Auto selects?

Within "Auto" sources, the providers themselves cannot be manually customized. However, your requirements-based selection indirectly controls this. The more specific your filters, date ranges, and data criteria, the more providers may be filtered out. We are working to release customization capabilities that allow you to indicate preferences on latency and cost.

What if no providers are available for my requirements?

If your requirements are too specific and no provider can fulfill them, Auto mode will indicate this. You can either relax your requirements or use a specific provider's dedicated connector directly for more granular control.

How does Auto mode ensure reliability?

Auto mode provides automatic failover: if a selected provider goes offline or its performance degrades, queries are automatically rerouted to another available provider. This maximizes uptime and coverage with no manual intervention required.

Are Committed Usage Discounts available for Auto mode?

Yes, Committed Usage Discounts are available for Auto mode. However, commits are per source, not per provider. You purchase commits separately for each data source (e.g., Facebook Auto, Instagram Auto, etc.), and they apply regardless of which provider Auto selects.

Can I use multiple providers in a single Job?

Yes, Auto mode can combine multiple providers within a single Job. For example, you can use one provider to collect posts and another to collect comments on those posts, all within the same Job in the same pipeline.

What Data Source Providers Are Used?

Within each Auto Source selection offering, the set of providers may change over time as each provider's capabilities change. Providers are selected based on functionality, cost, reliability, and integrated billing capabilities. Many of the providers are chosen from the Datastreamer Registry; see View the Datastreamer Registry.


Related Documentation