Platform Glossary

A reference for terms used in this documentation and across the Datastreamer platform.


| Term | Definition | Example |
| --- | --- | --- |
| Adapter Partner | A Datastreamer partner that has integrated their capabilities as a Component within the platform. | Webz Partnership |
| Boolean | A query syntax using "AND", "OR", "NOT", and similar operators. | cat OR dog |
| Collation | The process of gathering and organizing data from multiple sources into a single dataset. | Combining results from multiple sources |
| Commit | See Committed Usage Discount. | |
| Committed Usage Discount | A pre-purchased volume of DVUs for a billing cycle, offered at a discounted per-DVU rate. Commits are not caps on usage. | See Pricing and Billing section |
| Data Stream | The core unit of work in Datastreamer. A Data Stream combines sources, transformation, enrichment, pipeline logic, and a destination into a single configured workflow. It is not a single component but the combination of all components working together. | A stream collecting Twitter/X, applying sentiment, and delivering to BigQuery |
| Data Volume Unit (DVU) | The standard unit of measurement for all platform usage. DVUs provide a single metric across sources, enrichments, and pipeline operations. For Data Streams: 1 DVU per Job Run, 1 DVU per 100 content documents (first 100 per run included). Enrichments and Direct Integrations have per-component DVU rates. | See the DVU documentation |
| Destination | The output of a Data Stream or pipeline, where processed data is delivered. Destinations include data warehouses, cloud storage, streaming endpoints, and Datastreamer Searchable Storage. Previously referred to as Egress. | BigQuery, Amazon S3, Webhook |
| Direct Integration | A connector that gives direct access to a specific third-party provider or cloud storage system. Requires provider credentials and uses per-component DVU pricing. Used when specific provider access, custom credentials, or provider-specific features are required. | Socialgist, Brightdata, Amazon S3 |
| Egress | The output stage of a pipeline. See Destination. | |
| Enrichment | An operation applied to data in a pipeline that adds structured metadata. Enrichments typically use AI or NLP models. Examples include sentiment analysis, entity recognition, and categorization. | Sentiment Classifier, Named Entity Recognition |
| ETL | Extract, Transform, Load. A data processing pattern where data is extracted from a source, transformed into a target format, and loaded into a destination. | Fivetran Egress |
| Ingress | The input stage of a pipeline, where data enters from a source or storage system. In a Data Stream, ingress is handled automatically via sources and Jobs. | Twitter/X Source, Amazon S3 Ingress |
| Integrated Partner | A Datastreamer partner whose billing is connected to the platform, enabling unified billing and procurement for their components. | Socialgist Partnership |
| Job | The mechanism that executes data collection within a Data Stream. A Job runs a query against a source, retrieves content, and passes it into the pipeline. Jobs handle scheduling, retries, volume limits, and source failover automatically. | A Job querying Twitter/X for a keyword |
| Lucene | A query syntax that supports more complex expressions than Boolean, including proximity, fuzzy matching, and field-specific queries. | cat~1 AND NOT dog |
| Operation | A category of component that performs an action on data as it flows through a pipeline. Operations include enrichments, routing, transformation, deduplication, translation, and more. | JSON Router, Sentiment Classifier |
| Pipeline | The logic and routing that defines how data moves through a Data Stream, from source through transformation and enrichment to destination. The pipeline is one component of a Data Stream, not the product itself. | See Data Stream |
| Pipeline Orchestration Platform | A system for managing the end-to-end flow of data across sources, processing stages, and destinations. Datastreamer is a pipeline orchestration platform. | Datastreamer |
| Routing | A type of Operation that directs data to different paths based on content, metadata, or filter criteria. | JSON Document Router |
| Schema | The structure of a data document. Datastreamer provides a unified schema as a common output format across sources. | See Unify |
| Source | A data input for a Data Stream. Sources include social media platforms, news, and web content. In a Data Stream, sources are managed automatically; in Direct Integrations, sources are configured per provider. | Twitter/X, Reddit, Facebook |
| Transformer | A component that converts data from one format or structure to another. | JSON Schema Transformer |
| Unify | A platform component that converts raw source data into Datastreamer's unified schema. It normalizes varied source formats into a consistent structure for downstream use. | Unify Component |
| Unify Schema | Datastreamer's standardized output format. Provides a consistent field structure across all supported sources, making it possible to query and route data regardless of origin. | See Schema |
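
The Data Stream rate in the Data Volume Unit (DVU) entry (1 DVU per Job Run, 1 DVU per 100 content documents, first 100 per run included) can be turned into a rough estimate. The sketch below is illustrative only, not an official billing calculator. The function name `estimate_data_stream_dvus` is hypothetical, and the glossary does not say how a partial block of 100 documents is counted, so rounding up is an assumption here. Enrichment and Direct Integration rates are per component and are not modeled.

```python
import math

# Rates taken from the Data Volume Unit (DVU) glossary entry.
DVU_PER_JOB_RUN = 1
DOCS_PER_DVU = 100
INCLUDED_DOCS_PER_RUN = 100


def estimate_data_stream_dvus(docs_per_run):
    """Estimate DVUs for a series of Job Runs.

    docs_per_run: iterable with the number of content documents
    collected by each Job Run.
    """
    total = 0
    for docs in docs_per_run:
        if docs < 0:
            raise ValueError("document count cannot be negative")
        # The first 100 documents of each run are included in the run's DVU.
        billable_docs = max(0, docs - INCLUDED_DOCS_PER_RUN)
        # ASSUMPTION: a partial block of 100 documents is rounded up.
        total += DVU_PER_JOB_RUN + math.ceil(billable_docs / DOCS_PER_DVU)
    return total


# One run with 250 documents: 1 (run) + 2 (150 extra documents) = 3
print(estimate_data_stream_dvus([250]))
# Two runs, 1000 and 50 documents: (1 + 9) + (1 + 0) = 11
print(estimate_data_stream_dvus([1000, 50]))
```

Under these assumptions, a run that returns 100 documents or fewer costs 1 DVU. For actual billing figures, the DVU documentation remains the reference.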