Terminology
A glossary of terms that you may encounter in this documentation and the overall platform.
Term | Definition | Example |
---|---|---|
Adapter Partner | A type of Datastreamer partner that has worked with the Platform to offer an official integration of their capabilities as a Component. | Webz Partnership |
Boolean | A simple query syntax that combines terms with operators such as "AND", "OR", and others. | cat OR dog |
Data Volume Unit | There is very little standardization in the pricing units used in the world of data. A "Data Volume Unit" (DVU) is a unifying model for multiple different data consumption and enrichment metrics. It makes it easy to add more capabilities to the platform without having to convert between incompatible pricing units. | See the section: Pipeline & Cost Management |
Egress | The other end of the pipeline from Ingress. Egress is the output of the pipeline, generally into a database, data lake, searchable storage, or another pipeline. | Databricks Egress |
Enrichments | A type of Operation that can be performed on data in a pipeline. An Enrichment is often an NLP- or LLM-based technology that analyzes the data and adds metadata to it. | Sentiment Model |
Ingestion | Closely related to Ingress, Ingestion refers to the act of bringing data into a pipeline through Ingress. | See Ingress |
Ingress | The input to a pipeline. Ingress refers to data entering the pipeline, generally from a data source, data storage, or another pipeline. | Twitter (X) Ingress |
Integrated Partner | A type of Datastreamer partner that has worked with the Platform to offer an official integration of their capabilities as a Component. In addition, an Integrated Partner has connected their billing systems to Datastreamer's systems for seamless billing and procurement. | Socialgist Partnership |
Jobs | Jobs are small tasks that run within the Jobs portion of the platform. They are used to pull data from various sources into the platform through Ingress components. Jobs are able to self-manage, schedule, and adapt to changes. | See the heading "Jobs" |
Lucene | Similar to Boolean, Lucene is a powerful query syntax that is used by many data sources. It allows higher levels of complexity than Boolean. | cat~1 AND NOT dog |
Operations | One of the main categories of Components, Operations allow you to perform actions on the data. These include enrichments, routing components, data augmentations, translations, deduplications, and more. Operations can turn a simple in-out pipeline into one that adds significant value to the source data. | ChatGPT Prompt Execution |
Pipeline Orchestration Platform | A solution designed to manage, coordinate, and streamline the flow of data across multiple sources, processing stages, and destinations. In simpler terms, it is a system that helps companies control the entire journey of their data, from collection to final output, allowing for flexible integration and transformation at each stage. | Datastreamer's platform |
Routing | A type of Operation that directs data based on different criteria. Multiple components in the platform can move data based on content, metadata, filters, and more. | JSON Metadata Routing Component |
Schema | A schema is the structure of the content. Unstructured and semi-structured data often lack a schema, so components like Unify are used to convert the data into a common format. That format is called the Schema. Datastreamer offers a common schema as a starting point. | See Unify |
Transformer | A component or process that converts data from one format to another, such as transforming raw data into the Unify Schema format. | Unify Transformer |
Unify | A unique component within the Platform. Unify is a combination of capabilities that work together to understand the existing state of the data and convert it to a common schema. It is a one-click solution for structuring data from many sources. | Unify Component |
Unify Schema | A standardized data format used across Datastreamer to ensure consistency and compatibility between different data sources. | See Schema |
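To show how several of these terms fit together, here is a minimal plain-Python sketch of a pipeline. Records enter through Ingress, a Transformer maps them to a common schema, an Enrichment adds metadata, Routing picks a destination, and Egress delivers the result. This is an illustration only, not Datastreamer's API. The `Document` fields, the two source record shapes, the keyword-based sentiment model, and the `alerts`/`archive` destinations are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable


# Hypothetical common schema (an assumption, NOT the actual Unify Schema).
@dataclass
class Document:
    id: str
    text: str
    source: str
    metadata: dict = field(default_factory=dict)


def transform(raw: dict) -> Document:
    """Transformer: map source-specific records into the common schema."""
    if raw["source"] == "forum":
        return Document(id=str(raw["post_id"]), text=raw["body"], source="forum")
    if raw["source"] == "news":
        text = raw["headline"] + " " + raw["content"]
        return Document(id=raw["url"], text=text, source="news")
    raise ValueError(f"unknown source: {raw['source']}")


# Toy word lists standing in for a real sentiment model (hypothetical).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def enrich_sentiment(doc: Document) -> Document:
    """Enrichment: analyze the text and add a sentiment label as metadata."""
    words = set(doc.text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        doc.metadata["sentiment"] = "positive"
    elif score < 0:
        doc.metadata["sentiment"] = "negative"
    else:
        doc.metadata["sentiment"] = "neutral"
    return doc


def route(doc: Document) -> str:
    """Routing: choose a destination based on metadata."""
    return "alerts" if doc.metadata["sentiment"] == "negative" else "archive"


def run_pipeline(records: Iterable[dict]) -> dict[str, list[Document]]:
    # In-memory lists stand in for Egress destinations such as databases.
    egress: dict[str, list[Document]] = {"alerts": [], "archive": []}
    for raw in records:                  # Ingress: records enter the pipeline
        doc = transform(raw)             # Transformer: normalize to the schema
        doc = enrich_sentiment(doc)      # Operation: Enrichment
        egress[route(doc)].append(doc)   # Operation: Routing, then Egress
    return egress


sample = [
    {"source": "forum", "post_id": 1, "body": "I love this product"},
    {"source": "news", "url": "https://example.com/a",
     "headline": "Outage", "content": "Users report awful delays"},
]
result = run_pipeline(sample)
print({name: [d.id for d in docs] for name, docs in result.items()})
# prints {'alerts': ['https://example.com/a'], 'archive': ['1']}
```

Because every record passes through the Transformer first, the Enrichment and Routing steps only need to understand one schema, regardless of how many sources feed the pipeline.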