Data Streams Overview (The Platform)
What is a Data Stream?
A Data Stream is the core unit of work in Datastreamer. It defines how data moves from sources through transformation and enrichment to a destination.
A Data Stream is not a single component. It is the combination of:
- Sources: where data comes from (social media platforms, news, web content, etc.)
- Transformation: how raw source data is shaped into a consistent, usable format
- Enrichment: optional AI and NLP operations applied to the data (sentiment, entity recognition, categorization, etc.). Enrichments are billed separately at per-component DVU rates.
- Pipeline: the logic connecting these stages, including routing, filtering, and branching
- Destination: where the processed data is delivered (data warehouse, cloud storage, streaming endpoint, etc.)
These are not separate products. They are the components that make up a Data Stream.
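To make the relationship between these components concrete, here is a minimal Python sketch that models a Data Stream as a single object holding all five parts. It is an illustration only: the `DataStream` class, its field names, and the `normalize` helper are hypothetical and do not reflect Datastreamer's actual configuration schema or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]  # one piece of content moving through the stream


def keep_all(record: Record) -> bool:
    """Default pipeline filter: let every record through."""
    return True


@dataclass
class DataStream:
    """Hypothetical model of a Data Stream (not Datastreamer's real schema)."""
    name: str
    sources: List[str]                          # where data comes from
    transformation: Callable[[Record], Record]  # shapes raw data into one format
    destinations: List[str]                     # where processed data is delivered
    enrichments: List[str] = field(default_factory=list)  # optional AI/NLP steps
    pipeline_filter: Callable[[Record], bool] = keep_all  # pipeline filtering logic


def normalize(record: Record) -> Record:
    """Example transformation: keep a fixed, consistent set of fields."""
    return {
        "text": str(record.get("body", "")).strip(),
        "source": record.get("origin"),
    }


# One Data Stream bundles every component; none of them is deployed on its own.
stream = DataStream(
    name="brand-monitoring",
    sources=["social", "news"],
    transformation=normalize,
    destinations=["data-warehouse"],
    enrichments=["sentiment"],
)
```

In this sketch, only the enrichments and the filter have defaults, mirroring the description above in which enrichment is optional while sources, transformation, and a destination are always present.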
What Datastreamer Does
Datastreamer provides the infrastructure to build, deploy, and run Data Streams. It handles the operational complexity of connecting to data sources, processing data at scale, and delivering it reliably to destinations.
Datastreamer is not a data provider. It is the end-to-end platform for ingesting, processing, and orchestrating data. Data sources are one input to a Data Stream, not the product itself.
How Data Streams Work
When you build a Data Stream, you configure the sources, transformation logic, any enrichments, and the destination. Once deployed, the Data Stream runs continuously or on a schedule via Jobs.
Jobs are the mechanism that executes data collection within a Data Stream. A Job runs a query against a source, retrieves content, and feeds it into the pipeline. Jobs handle scheduling, retries, and volume limits automatically.
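The behaviour described for a Job (query a source, retry on failure, respect a volume limit, feed the pipeline) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not how Datastreamer implements Jobs: `run_job`, `fetch`, `max_records`, and `max_retries` are hypothetical names, and scheduling is left out entirely.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def run_job(
    fetch: Callable[[str], List[Record]],
    query: str,
    pipeline: Callable[[Record], None],
    max_records: int = 100,
    max_retries: int = 3,
) -> int:
    """Run one hypothetical Job and return how many records reached the pipeline."""
    last_error = None
    for _ in range(max_retries):
        try:
            records = fetch(query)  # run the query against the source
            break
        except ConnectionError as error:
            last_error = error      # transient failure: try again
    else:
        # Every attempt failed, so give up and surface the last error.
        raise RuntimeError(f"job failed after {max_retries} attempts") from last_error

    limited = records[:max_records]  # enforce the volume limit
    for record in limited:
        pipeline(record)             # feed each record into the pipeline
    return len(limited)


# Usage: a stand-in source that fails once, then returns five records.
calls = {"count": 0}


def flaky_source(query: str) -> List[Record]:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("temporary outage")
    return [{"text": f"{query} post {i}"} for i in range(5)]


collected: List[Record] = []
delivered = run_job(flaky_source, "coffee", collected.append, max_records=3)
```

Here the first fetch fails, the second succeeds, and only three of the five returned records are passed on because of the volume limit. A production version would likely add a delay between retries; that is omitted to keep the sketch short.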
A simple example:
Twitter/X Source --> Transformation --> Sentiment Enrichment --> BigQuery
A more complex Data Stream might pull from multiple sources, apply several enrichments, route content based on language or category, and deliver to multiple destinations simultaneously.
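As a rough illustration of both cases, the sketch below chains a transformation and a sentiment enrichment, then routes each record to one or more destinations based on its language. All names (`transform`, `enrich_sentiment`, `run_stream`, the `routes` structure) are hypothetical, the destinations are plain in-memory lists, and the keyword-based "sentiment" is a toy stand-in for a real enrichment.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Route = Tuple[Callable[[Record], bool], List[Record]]  # (condition, destination)


def transform(raw: Record) -> Record:
    """Shape a raw source record into a consistent format."""
    return {"text": str(raw.get("body", "")).strip(), "lang": raw.get("lang", "und")}


def enrich_sentiment(record: Record) -> Record:
    """Toy sentiment enrichment: keyword matching, not a real model."""
    text = str(record["text"]).lower()
    if "love" in text:
        label = "positive"
    elif "hate" in text:
        label = "negative"
    else:
        label = "neutral"
    return {**record, "sentiment": label}


def run_stream(raw_records: List[Record], routes: List[Route]) -> None:
    """Source records -> transformation -> enrichment -> routed destinations."""
    for raw in raw_records:
        record = enrich_sentiment(transform(raw))
        for matches, destination in routes:
            if matches(record):
                destination.append(record)  # one record may reach several destinations


warehouse: List[Record] = []       # receives everything
english_alerts: List[Record] = []  # receives English content only

run_stream(
    [
        {"body": " I love this ", "lang": "en"},
        {"body": "Je déteste ça", "lang": "fr"},
    ],
    routes=[
        (lambda r: True, warehouse),
        (lambda r: r["lang"] == "en", english_alerts),
    ],
)
```

With a single route that always matches, this reduces to the simple linear example; adding routes gives the conditional, multi-destination behaviour of a more complex Data Stream.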
Flexibility
Data Streams support a wide range of configurations:
- Multiple sources in a single stream
- Multiple destinations
- Conditional routing based on content
- Any combination of enrichments
- Custom transformation logic
- Bring-your-own data via cloud storage or direct upload
There is no fixed template. The pipeline within a Data Stream can be assembled from any combination of available components.
What's Next
- Data Streams Overview - Build your first Data Stream
- Sources Overview - How data sources work in a Data Stream
- Jobs - How Jobs execute within a Data Stream
- How Datastreamer is Priced - DVU-based pricing for Data Streams
- Glossary - Platform terminology
