Data Volume Units

A unifying model for multiple different data consumption metrics.

What is a Data Volume Unit (DVU)

The biggest challenge in comparing and estimating consumption costs from unstructured and semi-structured sources lies in the varied methods of estimation. Many of Datastreamer's customers use multiple components in their pipelines, and those components traditionally have very different pricing models.

Datastreamer works with its integrated partners to align each partner's pricing methodology with a common Data Volume Unit (DVU) methodology, so that customers can assess, estimate, and use a unified pricing and measurement approach.

Rule of Thumb: 1 DVU of any supported metric is estimated to align to the size, complexity, and effort to process 100 Twitter posts.

Data Volume Units in Billing

For integrated billing, usage is counted in the native metric of each component and then converted directly to Data Volume Units (DVUs), rounded up to the next whole DVU. Estimation, billing, and pricing tables are built on DVUs.

For example: if 800 bytes equal 1 DVU for a component and 8,500 bytes were processed in a pipeline, 11 DVUs would appear in billing (8,500 / 800 = 10.625, which moves up to the next DVU, 11).
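
The arithmetic in the example above can be sketched in a few lines of Python. This is a minimal illustration, not Datastreamer's billing implementation: the function name `dvus_billed` and its parameters are hypothetical, and the round-up behavior is inferred from the 8,500-byte example rather than from a published specification.

```python
def dvus_billed(measured: int, units_per_dvu: int) -> int:
    """Convert a raw metric amount into whole DVUs, rounding up.

    Assumption: any partial DVU is billed as a full DVU, as the
    800-byte example in this section suggests.
    """
    if units_per_dvu <= 0:
        raise ValueError("units_per_dvu must be positive")
    if measured < 0:
        raise ValueError("measured must not be negative")
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-measured // units_per_dvu)


# 8,500 bytes at 800 bytes per DVU: 10.625 moves up to 11 DVUs.
print(dvus_billed(8_500, 800))
```

Under these assumptions, an exact multiple (for example 1,600 bytes at 800 bytes per DVU) yields 2 DVUs with no extra unit added.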

Supported Metrics

Datastreamer supports the following metrics in conversion to DVUs.

| Metric | Description | Supported |
| --- | --- | --- |
| Tokens (General) | Commonly used in AI products, Tokens are a measurement of elements in text data. | Yes |
| Input Tokens | Some components and AI products separately measure Tokens used in the input of data. | Yes |
| Output Tokens | Some components and AI products separately measure Tokens used in the output of data. | Yes |
| Documents per Second | Count of documents being processed per second. Present in some firehose data sources. | Yes |
| Bytes | Measurement of the size of a document. | Yes |
| Compute Time (milliseconds, hours) | Measurements of the computational resources used to process the information. Common with analysis and some NLP products. | Yes |
| Document Count (results, post count) | Count of documents. Most common in data sources that are not performing ad-hoc data collection. | Yes |
| Requests | Count of requests. Most common in data sources that are performing ad-hoc data collection. | Yes |
| Mentions, Credits | Some providers use a separate credit or mention system. This is custom per provider and source. | Yes |
| CSV Rows | Rows of a CSV document. | Yes |
| Field Count | Count of the number of fields in returned data. Often used in conjunction with other metrics. | Yes |
| Characters | Count of the characters used. | Yes |
| Words | Count of the words used. | Yes |
| PDF Pages | Count of the pages of a PDF document, as extracted from the metadata of the document. In the case of alternate page sizes, US Letter sizing is treated as the default. | Yes |

Common Conversions

This is an illustrative table of basic conversions. Many sources will have more specific details.

Rule of Thumb: 1 DVU of any supported metric is estimated to align to the size, complexity, and effort to process 100 Twitter posts.

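| Metric | Measurement of Metric | DVU Count | Details |
| --- | --- | --- | --- |
| Characters | 28,000 | 1 | Based on English language. |
| Words | 6,000 | 1 | |
| CSV Rows | 100 | 1 | |
| Bytes | 100 KB | 1 | |
| Tokens | 6,000 | 1 | |
| Documents (Social) | 100 | 1 | Based on short-form social content. |
| PDF Page | 1 | 1 | |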
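
For rough estimation, the illustrative conversions above can be held in a small lookup table. The sketch below is hypothetical: the dictionary keys, the `estimate_dvus` name, and the reading of 100 KB as 100,000 bytes are all assumptions made for illustration, and component-specific rates may differ from these basic figures.

```python
# Illustrative units per 1 DVU, copied from the basic conversion table.
# Key names are hypothetical; real sources may use more specific rates.
ILLUSTRATIVE_UNITS_PER_DVU = {
    "characters": 28_000,      # based on English language
    "words": 6_000,
    "csv_rows": 100,
    "bytes": 100_000,          # assumes 100 KB means 100,000 bytes
    "tokens": 6_000,
    "social_documents": 100,   # short-form social content
    "pdf_pages": 1,
}


def estimate_dvus(metric: str, amount: float) -> float:
    """Return an unrounded DVU estimate for an amount of a given metric."""
    if metric not in ILLUSTRATIVE_UNITS_PER_DVU:
        raise KeyError(f"no illustrative conversion for metric: {metric}")
    if amount < 0:
        raise ValueError("amount must not be negative")
    return amount / ILLUSTRATIVE_UNITS_PER_DVU[metric]


# 12,000 words at 6,000 words per DVU is an estimate of 2 DVUs.
print(estimate_dvus("words", 12_000))
```

The estimate is left unrounded so that it can be used for planning. Any rounding applied at billing time is a separate step.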

Component Specific Conversions

Detailed information for specific components is not available in this section.