Terminology
A glossary of terms you may encounter in this documentation and throughout the platform.
Term | Definition | Example |
---|---|---|
Pipeline Orchestration Platform | A solution designed to manage, coordinate, and streamline the flow of data across multiple sources, processing stages, and destinations. In simpler terms, it is a system that helps companies control the entire journey of their data, from collection to final output, allowing for flexible integration and transformation at each stage. | Datastreamer's platform |
Ingress | The input to a pipeline. Ingress refers to data entering the pipeline, generally from a data source, data storage, or another pipeline. | Twitter (X) Ingress |
Egress | The opposite end of the pipeline from Ingress. Egress is the output of the pipeline, generally flowing into a database, data lake, searchable storage, or another pipeline. | Databricks Egress |
Enrichments | A type of Operation that can be performed on data in a pipeline. An Enrichment is often an NLP or LLM-based technology that analyzes the data and adds additional metadata. | Sentiment Model |
Routing | A type of Operation that directs data based on different criteria. There are multiple components in the platform that can move data based on content, metadata, filters, and more. | JSON Metadata Routing Component |
Lucene | Similar to Boolean, Lucene is a powerful query syntax that is used by many data sources. It allows higher levels of complexity than Boolean. | cat~1 AND NOT dog |
Operations | One of the main categories of Components, Operations allow you to perform actions on the data. These could include enrichments, routing components, data augmentations, translations, deduplications, and more. Operations can take a simple in-out pipeline and turn it into a major creator of value from the source data. | ChatGPT Prompt Execution |
Boolean | A simple query syntax using "and", "or", and other operators. | cat OR dog |
Unify | A unique component within the Platform. "Unify" is a combination of capabilities that work to understand the existing state of the data and convert it to a common schema. It is a one-click solution for structuring many data sources. | Unify Component |
Ingestion | Similar to Ingress, Ingestion refers to the action of bringing data in through Ingress. | See Ingress |
Schema | A Schema is the structure of the content. While unstructured and semi-structured data often lack schemas, components like Unify are used to structure the data into a common format. That "format" is called the Schema. Datastreamer offers a common schema as a starting point (see the illustrative sketch after this table). | See Unify |
Jobs | Jobs are small tasks that run within the Jobs portion of the platform. They are used to pull data from various sources into the platform through Ingress components. Jobs are able to self-manage, schedule, and adapt to changes. | See the heading "Jobs" |
Data Volume Unit | There is very little standardization in the units of pricing used in the world of data. A "Data Volume Unit" or DVU is a unifying model for multiple different data consumption and enrichment metrics. It makes it easy to add more capabilities to the platform without requiring complex billing calculations. | See the section: Pipeline & Cost Management |
Adapter Partner | A type of Datastreamer partner that has worked with the Platform to offer an official integration of their capabilities as a Component. | Webz Partnership |
Integrated Partner | A type of Datastreamer partner that has worked with the Platform to offer an official integration of their capabilities as a Component. In addition, an Integrated partner has connected their billing systems to Datastreamer's systems for seamless billing and procurement. | Socialgist Partnership |
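To see how several of these terms fit together, the sketch below shows a hypothetical document as it might look after flowing through a pipeline: pulled in by an Ingress, restructured by Unify into a common Schema, and annotated by a Sentiment Enrichment before Egress. The field names are illustrative assumptions only and are not Datastreamer's actual common schema.

```python
# Hypothetical document after passing through a pipeline.
# All field names are illustrative, not Datastreamer's actual schema.
document = {
    "content": {                       # structured by Unify into a common Schema
        "body": "The new release is fantastic!",
        "language": "en",
    },
    "source": {                        # captured at Ingress (e.g. a Twitter (X) Ingress)
        "type": "twitter_ingress",
        "collected_at": "2024-05-01T12:00:00Z",
    },
    "enrichments": {                   # metadata added by an Enrichment Operation
        "sentiment": {"label": "positive", "score": 0.94},
    },
}
```

A Routing Operation could then inspect a field such as the sentiment label to decide which Egress (for example, a Databricks Egress) receives the document.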