📘 Platform Glossary

Term	Definition	Example
Adapter Partner	A type of Datastreamer partner that has worked with the Platform to offer an official integration of their capabilities as a Component.	Webz Partnership
Boolean	A simple query syntax that combines terms with operators such as AND, OR, and NOT.	cat OR dog
Collation	Data Collation is the process of gathering, accumulating and organizing data from different sources into a coherent dataset.	Google Cloud Storage Egress
Data Volume Unit	Pricing units for data are poorly standardized. A Data Volume Unit (DVU) is a unifying model that expresses many different data consumption and enrichment metrics in a single unit, so new capabilities can be added to the platform without complex conversions between pricing units.	See the section: Pipeline & Cost Management
Egress	The output of a pipeline and the opposite end from Ingress. Egress refers to data leaving the pipeline, generally into a database, data lake, searchable storage, or another pipeline.	Databricks Egress
Enrichments	A type of Operation performed on data in a pipeline. An Enrichment is often an NLP- or LLM-based technology that analyzes the data and adds metadata to it.	Sentiment Model
ETL	Extract, Transform, Load (ETL) is a three-phase computing process in which data is extracted from an input source, transformed, and loaded into an output destination.	Fivetran Egress
Ingestion	The act of bringing data into a pipeline. Ingestion is the action performed through Ingress.	See Ingress
Ingress	The input to a pipeline. Ingress refers to data entering the pipeline, generally from a data source, data storage, or another pipeline.	Twitter (X) Ingress
Integrated Partner	A type of Datastreamer partner that has worked with the Platform to offer an official integration of their capabilities as a Component. In addition, an Integrated Partner has connected their billing systems to Datastreamer's systems for seamless billing and procurement.	Socialgist Partnership
Jobs	Jobs are small tasks that run within the Jobs portion of the platform. They pull data from various sources into the platform through Ingress components. Jobs can manage themselves, run on schedules, and adapt to changes.	See the heading "Jobs"
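Lucene	A query syntax used by many data sources. Like Boolean, it supports operators such as AND, OR, and NOT, but it also allows more complex queries, such as fuzzy matching (cat~1 matches terms within one edit of "cat"), wildcards, and field-specific searches.	cat~1 AND NOT dog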
Operations	One of the main categories of Components, Operations perform actions on the data. These include enrichments, routing components, data augmentations, translations, deduplications, and more. Operations can turn a simple in-and-out pipeline into one that adds significant value to the source data.	ChatGPT Prompt Execution
Pipeline Orchestration Platform	A solution designed to manage, coordinate, and streamline the flow of data across multiple sources, processing stages, and destinations. In simpler terms, it’s a system that helps companies control the entire journey of their data, from collection to final output, allowing for flexible integration and transformation at each stage.	Datastreamer's platform
Routing	A type of Operation that directs data based on different criteria. Several components in the platform can route data based on content, metadata, filters, and more.	JSON Metadata Routing Component
Schema	A schema is the structure of the content. Unstructured and semi-structured data often lack a schema, so components such as Unify are used to structure the data into a common format; that format is called the schema. Datastreamer offers a common schema as a starting point.	See Unify
Transformer	A component or process that converts data from one format to another, such as transforming raw data into the Unify Schema format.	Unify Transformer
Unify	A unique component within the Platform. Unify combines capabilities that determine the existing state of the data and convert it to a common schema. It is a one-click solution for structuring many data sources.	Unify Component
Unify Schema	A standardized data format used across Datastreamer to ensure consistency and compatibility between different data sources.	See Schema
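To show how several of these terms relate, here is a minimal Python sketch of a toy pipeline: records arrive through an Ingress, a Transformer maps each source format into a common schema, an Enrichment adds metadata, a Routing Operation directs each document by its metadata, and each route ends at an Egress. Every field name, function, and source format below is a hypothetical illustration; this is not Datastreamer's API and not the actual Unify Schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical common schema standing in for a real one such as the Unify Schema.
# These field names are assumptions for illustration only.
@dataclass
class CommonDocument:
    source: str
    text: str
    language: str
    metadata: dict = field(default_factory=dict)


# Transformers: convert each hypothetical raw source format into the common schema.
def transform_forum_post(raw: dict) -> CommonDocument:
    return CommonDocument(source="forum", text=raw["body"], language=raw.get("lang", "unknown"))


def transform_news_article(raw: dict) -> CommonDocument:
    # Keep only the two-letter language code from a locale such as "fr-FR".
    return CommonDocument(
        source="news",
        text=raw["headline"] + " " + raw["content"],
        language=raw.get("locale", "unknown")[:2],
    )


TRANSFORMERS: dict[str, Callable[[dict], CommonDocument]] = {
    "forum": transform_forum_post,
    "news": transform_news_article,
}


# Enrichment: adds metadata. A word count stands in for an NLP or LLM model here.
def enrich(doc: CommonDocument) -> CommonDocument:
    doc.metadata["word_count"] = len(doc.text.split())
    return doc


# Routing Operation: choose a destination based on the document's metadata.
def route(doc: CommonDocument) -> str:
    return "english_egress" if doc.language == "en" else "other_egress"


def run_pipeline(ingress: list[tuple[str, dict]]) -> dict[str, list[CommonDocument]]:
    # Each Egress is an in-memory list standing in for a database or data lake.
    egress: dict[str, list[CommonDocument]] = {"english_egress": [], "other_egress": []}
    for source_type, raw in ingress:
        doc = enrich(TRANSFORMERS[source_type](raw))
        egress[route(doc)].append(doc)
    return egress


if __name__ == "__main__":
    sample_ingress = [
        ("forum", {"body": "cats are great", "lang": "en"}),
        ("news", {"headline": "Chats", "content": "Les chats sont partout", "locale": "fr-FR"}),
    ]
    for destination, docs in run_pipeline(sample_ingress).items():
        print(destination, [(d.text, d.metadata) for d in docs])
```

Because every Transformer emits the same `CommonDocument` shape, the Enrichment and Routing steps are written once and work for any number of sources, which is the motivation behind a common schema.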