Data Volume Units
A unifying model for multiple different data consumption metrics.
What is a Data Volume Unit (DVU)
The biggest challenge in comparing and estimating consumption costs for unstructured and semi-structured sources is that each source measures usage differently. Many of Datastreamer's customers use multiple components in their pipelines, and those components traditionally have widely varying pricing models.
Datastreamer works with its integrated partners to align each partner's pricing methodology with a common Data Volume Unit (DVU) methodology, so that customers can assess, estimate, and use a unified pricing and measurement approach.
Rule of Thumb: 1 DVU of any supported metric is estimated to align to the size, complexity, and effort to process 100 Twitter posts.
Data Volume Units in Billing
For integrated billing, usage is counted in the metric of each component and converted directly to DVUs, rounded up to the next whole DVU. Estimation, billing, and pricing tables are built on DVUs.
For example: If 800 bytes is equal to 1 DVU of a component, and 8,500 bytes were processed in a pipeline, 11 DVUs would be present in billing (8,500 / 800 = 10.625, rounded up to 11).
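The billing example can be sketched in a few lines of Python. This is a minimal illustration, not Datastreamer's implementation: the function name `billed_dvus` is hypothetical, and the round-up rule is an assumption inferred from the example, where 10.625 becomes 11.

```python
def billed_dvus(measured: int, units_per_dvu: int) -> int:
    """Convert a measured amount of a component metric into billed DVUs.

    Assumption: partial DVUs are rounded up to the next whole DVU,
    as the 8,500-byte example suggests.
    """
    if units_per_dvu <= 0:
        raise ValueError("units_per_dvu must be positive")
    if measured < 0:
        raise ValueError("measured must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-measured // units_per_dvu)


# 800 bytes per DVU, 8,500 bytes processed
print(billed_dvus(8_500, 800))  # 11
```

An exact multiple bills without rounding: `billed_dvus(1_600, 800)` returns 2.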
Supported Metrics
Datastreamer supports the following metrics in conversion to DVUs.
Metric | Description | Supported |
---|---|---|
Tokens (General) | Commonly used in AI products, Tokens are a measurement of elements in text data. | Yes |
Input Tokens | Some components and AI products separately measure Tokens used in the input of data. | Yes |
Output Tokens | Some components and AI products separately measure Tokens used in the output of data. | Yes |
Documents per Second | Count of documents being processed per second. Present in some firehose data sources. | Yes |
Bytes | Measurement of the size of a document. | Yes |
Compute Time (milliseconds, hours) | Measurements of the computational resources used to process the information. Common with analysis and some NLP products. | Yes |
Document Count (results, post count) | Count of documents. Most common in data sources that are not performing ad-hoc data collection. | Yes |
Requests | Count of requests. Most common in data sources that are performing ad-hoc data collection. | Yes |
Mentions, Credits | Some providers use a separate credit or mention system. This is custom per provider and source. | Yes |
CSV Rows | Rows of a CSV document. | Yes |
Field Count | Count of the number of fields in returned data. Often used in conjunction with other metrics. | Yes |
Characters | Count of the characters used. | Yes |
Words | Count of the words used. | Yes |
PDF Pages | Count of the pages of a PDF document, as extracted from the metadata of the document. In the case of alternate page sizes, US Letter is treated as the default size. | Yes |
Common Conversions
This is an illustrative table of basic conversions. Many sources will have more specific details.
Rule of Thumb: 1 DVU of any supported metric is estimated to align to the size, complexity, and effort to process 100 Twitter posts.
Metric | Measurement of Metric | DVU Count | Details |
---|---|---|---|
Characters | 28,000 | 1 | Based on English language. |
Words | 6,000 | 1 | |
CSV Rows | 100 | 1 | |
Bytes | 100 KB | 1 | |
Tokens | 6,000 | 1 | |
Documents (Social) | 100 | 1 | Based on short-form social content. |
PDF Pages | 1 | 1 | |
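For rough estimation, the illustrative conversions in this table can be expressed as a lookup. The sketch below is hypothetical: the names `UNITS_PER_DVU` and `estimate_dvus` are not part of any Datastreamer API, the metric keys are invented for illustration, and 1 KB is assumed to mean 1,000 bytes. It returns fractional DVUs for estimation only and applies no billing rounding.

```python
# Illustrative units-per-DVU values copied from the table above.
# Real components may use more specific conversions.
UNITS_PER_DVU = {
    "characters": 28_000,     # based on English language
    "words": 6_000,
    "csv_rows": 100,
    "bytes": 100_000,         # assumption: 100 KB = 100,000 bytes
    "tokens": 6_000,
    "documents_social": 100,  # short-form social content
    "pdf_pages": 1,
}


def estimate_dvus(metric: str, amount: float) -> float:
    """Estimate DVUs for an amount of a metric (fractional, unrounded)."""
    try:
        per_dvu = UNITS_PER_DVU[metric]
    except KeyError:
        raise ValueError(f"unsupported metric: {metric!r}") from None
    return amount / per_dvu


# Example: a pipeline that used 12,000 tokens, 250 CSV rows, and 3 PDF pages.
usage = {"tokens": 12_000, "csv_rows": 250, "pdf_pages": 3}
total = sum(estimate_dvus(metric, amount) for metric, amount in usage.items())
print(total)  # 7.5
```

Keeping the conversion factors in one table means a component-specific value can override an entry without changing the estimation logic.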
Component Specific Conversions
Detailed information for specific components is not available in this section.