Platform Glossary
A reference for terms used in this documentation and across the Datastreamer platform.

| Term | Definition | Example |
|---|---|---|
| Adapter Partner | A Datastreamer partner that has integrated its capabilities as a Component within the platform. | Webz Partnership |
| Boolean | A query syntax using "AND", "OR", "NOT", and similar operators. | cat OR dog |
| Collation | The process of gathering and organizing data from multiple sources into a single dataset. | Combining results from multiple sources |
| Commit | See Committed Usage Discount. | |
| Committed Usage Discount | A pre-purchased volume of DVUs for a billing cycle, offered at a discounted per-DVU rate. Commits are not caps on usage. | See Pricing and Billing section |
| Data Stream | The core unit of work in Datastreamer. A Data Stream combines sources, transformation, enrichment, pipeline logic, and a destination into a single configured workflow. It is not a single component but the combination of all components working together. | A stream collecting Twitter/X, applying sentiment, and delivering to BigQuery |
| Data Volume Unit (DVU) | The standard unit of measurement for all platform usage. DVUs provide a single metric across sources, enrichments, and pipeline operations. For Data Streams, usage is 1 DVU per Job Run plus 1 DVU per 100 content documents, with the first 100 documents of each run included. Enrichments and Direct Integrations have per-component DVU rates. | See the DVU documentation |
| Destination | The output of a Data Stream or pipeline, where processed data is delivered. Destinations include data warehouses, cloud storage, streaming endpoints, and Datastreamer Searchable Storage. Previously referred to as Egress. | BigQuery, Amazon S3, Webhook |
| Direct Integration | A connector that gives direct access to a specific third-party provider or cloud storage system. Requires provider credentials and uses per-component DVU pricing. Used when specific provider access, custom credentials, or provider-specific features are required. | Socialgist, Brightdata, Amazon S3 |
| Egress | The former term for the output stage of a pipeline, now called a Destination. See Destination. | |
| Enrichment | An operation applied to data in a pipeline that adds structured metadata. Enrichments typically use AI or NLP models. Examples include sentiment analysis, entity recognition, and categorization. | Sentiment Classifier, Named Entity Recognition |
| ETL | Extract, Transform, Load. A data processing pattern where data is extracted from a source, transformed into a target format, and loaded into a destination. | Fivetran Egress |
| Ingress | The input stage of a pipeline, where data enters from a source or storage system. In a Data Stream, ingress is handled automatically via sources and Jobs. | Twitter/X Source, Amazon S3 Ingress |
| Integrated Partner | A Datastreamer partner whose billing is connected to the platform, enabling unified billing and procurement for its components. | Socialgist Partnership |
| Job | The mechanism that executes data collection within a Data Stream. A Job runs a query against a source, retrieves content, and passes it into the pipeline. Jobs handle scheduling, retries, volume limits, and source failover automatically. | A Job querying Twitter/X for a keyword |
| Lucene | A query syntax that supports more complex expressions than Boolean, including proximity, fuzzy matching, and field-specific queries. | cat~1 AND NOT dog |
| Operation | A category of component that performs an action on data as it flows through a pipeline. Operations include enrichments, routing, transformation, deduplication, translation, and more. | JSON Router, Sentiment Classifier |
| Pipeline | The logic and routing that defines how data moves through a Data Stream, from source through transformation and enrichment to destination. The pipeline is one component of a Data Stream, not the product itself. | See Data Stream |
| Pipeline Orchestration Platform | A system for managing the end-to-end flow of data across sources, processing stages, and destinations. Datastreamer is a pipeline orchestration platform. | Datastreamer |
| Routing | A type of Operation that directs data to different paths based on content, metadata, or filter criteria. | JSON Document Router |
| Schema | The structure of a data document. Datastreamer provides a unified schema as a common output format across sources. | See Unify |
| Source | A data input for a Data Stream. Sources include social media platforms, news, and web content. In a Data Stream, sources are managed automatically; in Direct Integrations, sources are configured per provider. | Twitter/X, Reddit, Facebook |
| Transformer | A component that converts data from one format or structure to another. | JSON Schema Transformer |
| Unify | A platform component that converts raw source data into Datastreamer's unified schema. It normalizes varied source formats into a consistent structure for downstream use. | Unify Component |
| Unify Schema | Datastreamer's standardized output format. Provides a consistent field structure across all supported sources, making it possible to query and route data regardless of origin. | See Schema |

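The Data Stream rate in the Data Volume Unit (DVU) entry can be illustrated with a short worked calculation. This is a minimal sketch, not the platform's billing logic: the function name `data_stream_dvus` and the constants are hypothetical, and it assumes that a partial block of 100 documents beyond the included 100 is rounded up to a full DVU, which the glossary does not state. Per-component rates for Enrichments and Direct Integrations are not covered.

```python
import math

# Rates taken from the glossary entry for Data Streams.
DVU_PER_JOB_RUN = 1          # 1 DVU per Job Run
DOCS_PER_DVU = 100           # 1 DVU per 100 content documents
INCLUDED_DOCS_PER_RUN = 100  # first 100 documents per run are included


def data_stream_dvus(docs_per_run):
    """Estimate DVUs for a list of Job Runs, given documents returned per run.

    Assumption (not confirmed by the glossary): partial blocks of
    100 documents are rounded up to a whole DVU.
    """
    total = 0
    for docs in docs_per_run:
        # Only documents beyond the included allowance are counted.
        billable_docs = max(0, docs - INCLUDED_DOCS_PER_RUN)
        total += DVU_PER_JOB_RUN + math.ceil(billable_docs / DOCS_PER_DVU)
    return total


# Three runs returning 100, 250, and 1,000 documents:
#   100  -> 1 (run) + 0 = 1
#   250  -> 1 (run) + 2 = 3
#   1000 -> 1 (run) + 9 = 10
print(data_stream_dvus([100, 250, 1000]))  # 14
```

Under these assumptions, a run that returns no documents still counts as 1 DVU, because the Job Run itself is metered. For authoritative rates and rounding rules, see the DVU documentation.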