Data Streams Overview

This section covers everything involved in building and running a Data Stream: sources, transformation, enrichment, and destinations.

If you are new to the platform, start with "What is a Data Stream?" for an introduction to the core concepts.

In This Section

Sources: Configure data inputs for your Data Stream
Transformation and Enrichment: Shape data and apply AI/NLP operations
Destinations: Deliver processed data to warehouses, storage, or endpoints

Components of a Data Stream

Sources

Sources define where data comes from. Datastreamer connects to major social media platforms, news sources, and web content automatically. You configure what data you want (keywords, accounts, date ranges, etc.) and the platform handles provider selection and retrieval.

Supported sources include Facebook, Instagram, Twitter/X, TikTok, Reddit, YouTube, Threads, and Bluesky, among others.
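
As a rough illustration of the kind of configuration described above (keywords, accounts, date ranges), the sketch below models a source query as a small Python data class. The class name `SourceQuery` and all of its fields are hypothetical, chosen only for this example; they are not the platform's actual configuration format or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SourceQuery:
    """Hypothetical description of what data to collect (not a real API)."""

    platforms: List[str]
    keywords: List[str] = field(default_factory=list)
    accounts: List[str] = field(default_factory=list)
    start: Optional[date] = None
    end: Optional[date] = None

    def __post_init__(self) -> None:
        # A query needs at least one thing to look for.
        if not (self.keywords or self.accounts):
            raise ValueError("provide at least one keyword or account")
        # Reject an inverted date range early.
        if self.start and self.end and self.start > self.end:
            raise ValueError("start date must not be after end date")


query = SourceQuery(
    platforms=["reddit", "bluesky"],
    keywords=["product launch"],
    start=date(2024, 1, 1),
    end=date(2024, 1, 31),
)
```

The point of the sketch is the division of labor: the query states what is wanted, and nothing in it names a provider, since provider selection is handled by the platform.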

See Sources Overview.

Transformation

Raw data from sources arrives in varied formats. Transformation converts it into a consistent structure using Datastreamer's unified schema, making it compatible with downstream enrichments and destinations.

The Unify Transformer handles this automatically for supported sources. Custom transformation is also available via the JSON Schema Transformer.
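
To make the idea of a consistent structure concrete, here is a minimal sketch that maps two differently shaped raw records onto one common layout. The raw record shapes and the target field names (`source`, `text`, `author`, `published_at`) are assumptions made for illustration; they are not Datastreamer's unified schema, and the functions are not how the Unify Transformer is implemented.

```python
from datetime import datetime, timezone

# Hypothetical raw records; real provider payloads will differ.
RAW_REDDIT = {"selftext": "Great launch!", "author": "alice", "created_utc": 1700000000}
RAW_NEWS = {"body": "Markets rallied.", "byline": "Bob", "date": "2023-11-14T22:13:20+00:00"}


def unify_reddit(raw: dict) -> dict:
    """Map a Reddit-style record onto the illustrative common structure."""
    published = datetime.fromtimestamp(raw["created_utc"], tz=timezone.utc)
    return {
        "source": "reddit",
        "text": raw["selftext"],
        "author": raw["author"],
        "published_at": published.isoformat(),
    }


def unify_news(raw: dict) -> dict:
    """Map a news-style record onto the same structure."""
    published = datetime.fromisoformat(raw["date"]).astimezone(timezone.utc)
    return {
        "source": "news",
        "text": raw["body"],
        "author": raw["byline"],
        "published_at": published.isoformat(),
    }


unified = [unify_reddit(RAW_REDDIT), unify_news(RAW_NEWS)]
```

After this step both records expose the same keys, so any later stage can read `text` or `published_at` without knowing which source a document came from.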

Enrichment

Enrichments are optional AI and NLP operations applied to data as it flows through the pipeline. They add structured metadata to each document without replacing the source content. Enrichments are billed at per-component DVU rates, in addition to your base Data Stream usage.

Available enrichments include sentiment analysis, entity recognition, categorization, language detection, location inference, and more.
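
The sketch below illustrates the additive behavior described above: an enrichment attaches structured metadata to a document and leaves the source content unchanged. The `enrichments` key, the `enrich` helper, and the keyword-based `toy_sentiment` function are hypothetical stand-ins, not the platform's output format or models.

```python
import copy


def enrich(document: dict, name: str, result: dict) -> dict:
    """Return a copy of the document with one enrichment result attached.

    The input document is not modified, and its original fields are
    carried over unchanged.
    """
    enriched = copy.deepcopy(document)
    enriched.setdefault("enrichments", {})[name] = result
    return enriched


def toy_sentiment(text: str) -> dict:
    """Keyword lookup standing in for a real sentiment model."""
    positive = {"great", "good", "love"}
    words = {word.strip(".,!?").lower() for word in text.split()}
    return {"label": "positive" if words & positive else "neutral"}


doc = {"text": "Great launch!", "source": "reddit"}
enriched_doc = enrich(doc, "sentiment", toy_sentiment(doc["text"]))
```

Keeping each result under its own name means several enrichments can be applied to the same document without overwriting one another.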

See Operations and Enrichments Overview.

Pipeline Logic

The pipeline defines the path data takes from sources to destinations. It supports routing, filtering, deduplication, batching, and branching. A complex pipeline is built from the same set of components as a simple linear one.
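
As a conceptual sketch of how filtering, deduplication, routing, and batching compose, the example below chains small Python generators. Every function name, the `id` and `lang` fields, and the branch names are assumptions for illustration; the platform's pipeline is configured from its own components rather than written like this.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def keep(docs: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filtering: pass through only documents that satisfy the predicate."""
    return (doc for doc in docs if predicate(doc))


def dedupe(docs: Iterable[dict], key: str = "id") -> Iterator[dict]:
    """Deduplication: drop documents whose key has already been seen."""
    seen = set()
    for doc in docs:
        if doc[key] not in seen:
            seen.add(doc[key])
            yield doc


def batch(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Batching: group documents into lists of at most `size` items."""
    iterator = iter(docs)
    while chunk := list(islice(iterator, size)):
        yield chunk


def route(doc: dict) -> str:
    """Routing: choose a branch name for a document."""
    return "english" if doc["lang"] == "en" else "other"


docs = [
    {"id": 1, "lang": "en", "text": "a"},
    {"id": 1, "lang": "en", "text": "a"},  # duplicate of the first document
    {"id": 2, "lang": "fr", "text": "b"},
    {"id": 3, "lang": "en", "text": ""},   # empty text, filtered out
    {"id": 4, "lang": "en", "text": "d"},
]

# Branching: each surviving document goes to exactly one branch.
branches: Dict[str, List[dict]] = {"english": [], "other": []}
for doc in dedupe(keep(docs, lambda d: bool(d["text"]))):
    branches[route(doc)].append(doc)

batches = {name: list(batch(items, 2)) for name, items in branches.items()}
```

A linear pipeline would use only `keep`, `dedupe`, and `batch` in sequence; adding `route` turns the same pieces into a branching one.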

Destinations

Destinations are where processed data is delivered. Supported destinations include cloud data warehouses (BigQuery, Snowflake, Databricks), cloud storage (S3, Azure Blob, GCS), streaming endpoints (Pub/Sub, Firehose, Webhook), and Datastreamer's own Searchable Storage.

See Destinations Overview.

Jobs

Jobs are how a Data Stream collects data from sources. Each Job runs a query, retrieves content, and passes it into the pipeline. Jobs handle scheduling, retries, volume limits, and source failover automatically.
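
The retry, volume-limit, and failover behavior can be pictured with the simplified sketch below; scheduling is left out. The `run_job` function, the `SourceError` exception, and the two sample sources are hypothetical and only model the idea. They do not reflect how the platform implements Jobs.

```python
from typing import Callable, List, Sequence


class SourceError(Exception):
    """Raised by a (hypothetical) source when a fetch fails."""


def run_job(
    query: str,
    sources: Sequence[Callable[[str], List[str]]],
    max_docs: int,
    max_retries: int = 2,
) -> List[str]:
    """Try each source in order, retrying on failure, and cap the result size."""
    for fetch in sources:
        # One initial attempt plus up to max_retries retries per source.
        for _attempt in range(max_retries + 1):
            try:
                docs = fetch(query)
            except SourceError:
                continue  # retry the same source
            return docs[:max_docs]  # volume limit
        # All attempts on this source failed: fail over to the next one.
    raise SourceError("all sources failed")


calls: List[str] = []


def primary(query: str) -> List[str]:
    calls.append("primary")
    raise SourceError("primary unavailable")


def backup(query: str) -> List[str]:
    calls.append("backup")
    return [f"{query} result {i}" for i in range(5)]


result = run_job("launch", [primary, backup], max_docs=3, max_retries=1)
```

Here the primary source is attempted twice, the Job fails over to the backup, and the five documents it returns are trimmed to the limit of three.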

See Creating and Managing Jobs.

Pricing

Data Streams are priced in DVUs (Data Volume Units). DVU usage accumulates across all Jobs and components in a stream.
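
To show how usage might add up across Jobs and components, here is a small arithmetic sketch. The rates below are invented for the example, and charging per document is itself an assumption; refer to the pricing documentation for actual DVU rates and units.

```python
from decimal import Decimal
from typing import Iterable, List, Tuple

# Hypothetical DVU rates per document; not real pricing.
RATES = {
    "base": Decimal("1.0"),
    "sentiment": Decimal("0.5"),
    "entities": Decimal("0.25"),
}


def job_dvus(doc_count: int, components: Iterable[str]) -> Decimal:
    """DVUs for one Job: base rate plus each component's rate, per document."""
    per_doc = RATES["base"] + sum(RATES[name] for name in components)
    return per_doc * doc_count


# Each entry: (documents processed, enrichment components applied).
jobs: List[Tuple[int, List[str]]] = [
    (1000, ["sentiment"]),
    (500, ["sentiment", "entities"]),
]

total = sum(job_dvus(count, components) for count, components in jobs)
```

Under these made-up rates, the first Job uses 1000 × 1.5 = 1500 DVUs and the second 500 × 1.75 = 875 DVUs, for a stream total of 2375.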

See How Data Streams are Priced.

Need More Control?

If you need to connect to a specific data provider directly, use your own API credentials, or access provider-specific features, see Direct Integrations.