
Terminology

A glossary of terms you may encounter in this documentation and across the platform.

| Term | Definition | Example |
|---|---|---|
| Pipeline Orchestration Platform | A solution designed to manage, coordinate, and streamline the flow of data across multiple sources, processing stages, and destinations. In simpler terms, it is a system that helps companies control the entire journey of their data, from collection to final output, with flexible integration and transformation at each stage. | Datastreamer's platform |
| Ingress | The input to a pipeline. Ingress refers to data entering the pipeline, generally from a data source, data storage, or another pipeline. | Twitter (X) Ingress |
| Egress | The opposite end of the pipeline from Ingress. Egress is the output of the pipeline, generally into a database, data lake, searchable storage, or another pipeline. | Databricks Egress |
| Enrichments | A type of Operation that can be performed on data in a pipeline. An Enrichment is often an NLP- or LLM-based technology that analyzes the data and adds metadata. | Sentiment Model |
| Routing | A type of Operation that directs data based on different criteria. Multiple components in the platform can route data based on content, metadata, filters, and more. | JSON Metadata Routing Component |
| Lucene | Similar to Boolean, Lucene is a powerful query syntax used by many data sources. It supports more complex queries than Boolean. | `cat~1 AND NOT dog` |
| Operations | One of the main categories of Components, Operations perform actions on the data. These include enrichments, routing components, data augmentations, translations, deduplications, and more. Operations can turn a simple in-out pipeline into one that creates significant value from the source data. | ChatGPT Prompt Execution |
| Boolean | A simple query syntax using "AND", "OR", and other operators. | `cat OR dog` |
| Unify | A unique component within the Platform. Unify combines capabilities that understand the existing state of the data and convert it to a common schema. It is a one-click solution for structuring data from many sources. | Unify Component |
| Ingestion | Similar to Ingress, Ingestion refers to the act of bringing data into a pipeline through an Ingress. | See Ingress |
| Schema | A Schema is the structure of the content. Unstructured and semi-structured data often lack a schema, so components like Unify are used to structure the data into a common format; that format is called the Schema. Datastreamer offers a common schema as a starting point. | See Unify |
| Jobs | Jobs are small tasks that run within the Jobs portion of the platform. They pull data from various sources into the platform through Ingress components. Jobs can self-manage, run on a schedule, and adapt to changes. | See the heading "Jobs" |
| Data Volume Unit | There is very little standardization in the pricing units used for data. A "Data Volume Unit" (DVU) is a unifying model for multiple data consumption and enrichment metrics. It makes it easy to add more capabilities to the platform without complex conversions between pricing units. | See the section: Pipeline & Cost Management |
| Adapter Partner | A type of Datastreamer partner that has worked with the Platform to offer an official integration of its capabilities as a Component. | Webz Partnership |
| Integrated Partner | A type of Datastreamer partner that has worked with the Platform to offer an official integration of its capabilities as a Component. In addition, an Integrated Partner has connected its billing systems to Datastreamer's systems for seamless billing and procurement. | Socialgist Partnership |
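To make the relationship between Ingress, Operations (Enrichments and Routing), and Egress more concrete, here is a minimal Python sketch. It is not Datastreamer's API: documents are assumed to be plain dictionaries, and `sentiment_enrichment`, `route`, and `run_pipeline` are hypothetical names. The "enrichment" is a keyword check standing in for a real NLP or LLM model.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical document shape: a plain dict (not Datastreamer's actual format).
Document = Dict[str, object]
Operation = Callable[[Document], Document]


def sentiment_enrichment(doc: Document) -> Document:
    """Toy Enrichment: adds metadata derived from the content.

    A keyword check stands in for a real sentiment model.
    """
    text = str(doc.get("content", "")).lower()
    label = "positive" if "great" in text else "neutral"
    return {**doc, "enrichment": {"sentiment": label}}


def route(doc: Document) -> str:
    """Toy Routing: picks a destination name from metadata added upstream."""
    return "alerts" if doc["enrichment"]["sentiment"] == "positive" else "archive"


def run_pipeline(
    ingress: Iterable[Document],
    operations: List[Operation],
    router: Callable[[Document], str],
) -> Dict[str, List[Document]]:
    """Pulls documents from an Ingress, applies Operations in order,
    and collects each result under the Egress chosen by the router."""
    egress: Dict[str, List[Document]] = {}
    for doc in ingress:                # Ingress: data enters the pipeline
        for op in operations:          # Operations: applied one after another
            doc = op(doc)
        egress.setdefault(router(doc), []).append(doc)  # Egress: routed output
    return egress


if __name__ == "__main__":
    source = [
        {"id": 1, "content": "This product is great"},
        {"id": 2, "content": "Shipping update"},
    ]
    for destination, docs in run_pipeline(source, [sentiment_enrichment], route).items():
        print(destination, [d["id"] for d in docs])
```

Keeping Operations as a list of document-in, document-out functions is what lets a simple in-out pipeline grow: adding a translation or deduplication step means appending another function, without touching the Ingress or Egress ends.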
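The difference between the Boolean example `cat OR dog` and the Lucene example `cat~1 AND NOT dog` can be illustrated by evaluating both against a document's words. In Lucene syntax, `~1` asks for a fuzzy match within one edit of the term. This sketch assumes simple lowercase word tokenization and uses plain Levenshtein distance; real Lucene fuzzy matching runs against an analyzed index and may differ in details such as how transpositions are counted. All function names are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens (a simplification of a real search analyzer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = cur
    return prev[-1]


def has_term(doc: Set[str], term: str) -> bool:
    """Exact term match, as in a Boolean query."""
    return term in doc


def has_fuzzy(doc: Set[str], term: str, max_edits: int) -> bool:
    """Fuzzy term match, approximating Lucene's term~N."""
    return any(edit_distance(term, t) <= max_edits for t in doc)


def boolean_cat_or_dog(text: str) -> bool:
    """Evaluates the Boolean query: cat OR dog."""
    doc = tokens(text)
    return has_term(doc, "cat") or has_term(doc, "dog")


def lucene_cat_fuzzy_and_not_dog(text: str) -> bool:
    """Evaluates the Lucene-style query: cat~1 AND NOT dog."""
    doc = tokens(text)
    return has_fuzzy(doc, "cat", 1) and not has_term(doc, "dog")


if __name__ == "__main__":
    for sample in ["The cat chased the dog", "My cats are asleep", "A bat flew by"]:
        print(sample, boolean_cat_or_dog(sample), lucene_cat_fuzzy_and_not_dog(sample))
```

With these rules, "My cats are asleep" fails the exact Boolean query but matches the fuzzy one, while "A bat flew by" also matches `cat~1`: fuzzy terms widen recall at the cost of precision, which is why they are often paired with exclusions such as `AND NOT`.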
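The idea behind Unify and the Schema, converting differently shaped source records into one common format, can be sketched as a set of per-source mapping functions. This is a simplified, hand-written stand-in: the actual Unify component determines the existing structure of the data itself, and the source formats and field names below (`source`, `author`, `text`, `published_at`) are hypothetical, not Datastreamer's common schema.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]

# Hypothetical common schema used only for this illustration:
#   {"source": str, "author": str, "text": str, "published_at": str}


def from_forum_post(raw: Record) -> Record:
    """Maps a hypothetical forum-post shape onto the common schema."""
    return {
        "source": "forum",
        "author": raw["user"]["handle"],
        "text": raw["body"],
        "published_at": raw["created"],
    }


def from_news_article(raw: Record) -> Record:
    """Maps a hypothetical news-article shape onto the common schema."""
    return {
        "source": "news",
        "author": raw.get("byline") or "unknown",  # bylines may be missing
        "text": raw["headline"] + "\n" + raw["article_text"],
        "published_at": raw["date"],
    }


# One mapper per source type; supporting a new source means adding one entry.
MAPPERS: Dict[str, Callable[[Record], Record]] = {
    "forum": from_forum_post,
    "news": from_news_article,
}


def unify(source_type: str, raw: Record) -> Record:
    """Converts a raw record from a known source type into the common schema."""
    try:
        mapper = MAPPERS[source_type]
    except KeyError:
        raise ValueError(f"no mapping for source type {source_type!r}") from None
    return mapper(raw)
```

Because every record leaves `unify` with the same keys, downstream Operations and Egress destinations only need to understand one structure, regardless of how many sources feed the pipeline.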