Data Volume Units

A unifying model for multiple different data consumption metrics.

What is a Data Volume Unit (DVU)?

The biggest challenge in comparing and estimating consumption costs for unstructured and semi-structured data sources is that each source is measured differently. Many of Datastreamer's customers use multiple components in their pipelines, and those components traditionally have widely varied pricing models.

Datastreamer works with its integrated partners to align each partner's pricing methodology with a common Data Volume Unit (DVU) methodology, so that customers can assess, estimate, and use a single unified pricing and measurement approach.

Rule of Thumb: 1 DVU of any supported metric is estimated to align with the size, complexity, and effort required to process 100 Twitter posts.

Data Volume Units in Billing

For integrated billing, Data Volume Units (DVUs) are counted using the component's own metric, converted directly at that component's rate, and rounded up to the next whole DVU. Estimation, billing, and pricing tables are all built on DVUs.

For example: if 200 kilobytes equals 1 DVU for a component and 850 kilobytes were processed in a pipeline, that is 850 / 200 = 4.25, so 5 DVUs would appear in billing.
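
The billing conversion above can be sketched in a few lines of Python. This is a minimal illustration and not Datastreamer's billing implementation: the function name `to_dvus` and its parameters are hypothetical, and the round-up behavior is inferred from the 850 kilobyte example.

```python
import math

def to_dvus(amount: float, amount_per_dvu: float) -> int:
    """Convert a raw metric amount to whole DVUs.

    Rounding up to the next whole DVU is an assumption taken from the
    worked example (850 KB at 200 KB per DVU is billed as 5 DVUs).
    """
    if amount < 0:
        raise ValueError("amount must not be negative")
    if amount_per_dvu <= 0:
        raise ValueError("amount_per_dvu must be positive")
    return math.ceil(amount / amount_per_dvu)

# 850 KB processed, 200 KB per DVU: 850 / 200 = 4.25, rounded up
print(to_dvus(850, 200))  # 5
```

An amount that divides evenly is not rounded further: `to_dvus(400, 200)` gives 2.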

Supported Metrics

Datastreamer supports the following metrics in conversion to DVUs.

| Metric | Description | Supported |
| --- | --- | --- |
| Tokens (General) | Commonly used in AI products, Tokens are a measurement of elements in text data. | Yes |
| Input Tokens | Some components and AI products separately measure Tokens used in the input of data. | Yes |
| Output Tokens | Some components and AI products separately measure Tokens used in the output of data. | Yes |
| Documents per Second | Count of documents being processed per second. Present in some firehose data sources. | Yes |
| Bytes | Measurement of the size of a document. | Yes |
| Compute Time (milliseconds, hours) | Measurements of the computational resources used to process the information. Common with analysis and some NLP products. | Yes |
| Document Count (results, post count) | Count of documents. Most common in data sources that are not performing ad-hoc data collection. | Yes |
| Requests | Count of requests. Most common in data sources that are performing ad-hoc data collection. | Yes |
| Mentions, Credits | Some providers use a separate credit or mention system. This is custom per provider and source. | Yes |
| CSV Rows | Rows of a CSV document. | Yes |
| Field Count | Count of the number of fields in returned data. Often used in conjunction with other metrics. | Yes |
| Characters | Count of the characters used. | Yes |
| Words | Count of the words used. | Yes |
| PDF Pages | Count of the pages of a PDF document, as extracted from the metadata of the document. For alternate page sizes, US Letter is treated as the default. | Yes |

Common Conversions

This is an illustrative table of basic conversions. Many sources will have more specific details.

Rule of Thumb: 1 DVU of any supported metric is estimated to align with the size, complexity, and effort required to process 100 Twitter posts.

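| Metric | Measurement of Metric | DVU Count | Details |
| --- | --- | --- | --- |
| Characters | 28,000 | 1 | Based on English language. |
| Words | 6,000 | 1 | |
| CSV Rows | 100 | 1 | |
| Bytes | 100KB | 1 | |
| Tokens | 6,000 | 1 | |
| Documents (Social) | 100 | 1 | Based on short-form social content. |
| PDF Page | 1 | 1 | |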
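
For quick estimates, the table above can be treated as a lookup of "amount of metric per 1 DVU". The sketch below is illustrative only: the dictionary keys and the `estimate_dvus` function are hypothetical names, and it returns a fractional estimate because the table is described as illustrative and not as a billing rule.

```python
# Amount of each metric that corresponds to 1 DVU, copied from the table above.
COMMON_AMOUNT_PER_DVU = {
    "characters": 28_000,      # based on English language
    "words": 6_000,
    "csv_rows": 100,
    "kilobytes": 100,          # the table lists Bytes as 100KB per DVU
    "tokens": 6_000,
    "social_documents": 100,   # short-form social content
    "pdf_pages": 1,
}

def estimate_dvus(metric: str, amount: float) -> float:
    """Return a fractional DVU estimate for a quantity of a common metric."""
    if metric not in COMMON_AMOUNT_PER_DVU:
        raise KeyError(f"no common conversion listed for {metric!r}")
    return amount / COMMON_AMOUNT_PER_DVU[metric]

print(estimate_dvus("words", 15_000))      # 2.5
print(estimate_dvus("pdf_pages", 12))      # 12.0
```

Specific sources may use different rates, so an estimate like this is a starting point before checking the component's own conversion.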

Component Specific Conversions

Datastreamer supports many third-party components and integrations. This table provides an approximate conversion rate from each third party's pricing metric to 1 Data Volume Unit (DVU).

Please note that this table is approximate.

| Component | Amount of 3rd party metric = 1 DVU | 3rd Party Pricing Metric |
| --- | --- | --- |
| WebSightLine Instagram | 100 | Documents |
| WebSightLine Threads | 100 | Documents |
| WebSightLine File Fetcher | 200 | Kilobytes |
| Data365 Data Sources | 120 | Mentions |
| PrivateAI PII Redaction | 28,000 | Characters |
| AI Classifiers (OpenAI) | 7,000 Input Tokens, 2,000 Output Tokens | Tokens (Various) |
| AI Classifiers (Gemini) | 7,000 Input Tokens, 2,000 Output Tokens | Tokens (Various) |
| CleanDNS Data Sources | 100 | Documents |
| Dark Owl Search APIs | 5 | Documents |
| Socialgist Data Sources | 100 | Documents |
| Average Custom Data Ingestion | 200 | Kilobytes |
| Location Inference Classifier | 100 | Documents |
| PDF Data Integration | 1 | Page |
| Datastreamer NLP Classifiers (Bundle) | 100 | Documents |
| PDF Table Conversion | 1 | Table |
| Bright Data Specialty Source (Bundle) | 20 | Records |
| Bright Data High Result Sources (Bundle) | 150 | Records |
| Vetric High Request Sources (Bundle) | 5 | Requests |
| Vetric Low Request Sources (Bundle) | 10 | Requests |
| Vetric Detailed Request Sources (Bundle) | 1 | Requests |
| Twingly VK | 165 | Documents |
| Twingly Blogs | 70 | Documents |
| Opoint News | 60 | Documents |
| Vital4 Adverse Media | 50 | Documents |
| Vital4 Politically Exposed Persons | 670 | Documents |
| Vital4 Watchlist | 35 | Documents |
| Vital4 Crime | 530 | Documents |
| Cohere Sentiment | x | Tokens |
| ChatGPT Prompt Application | x | Tokens |
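
To estimate a whole pipeline, the per-component rates above can be applied to each component's usage and summed. The following is a rough sketch under stated assumptions, not an official calculator: `pipeline_dvus` and the usage figures are hypothetical, only three rows of the table are copied in, and rounding each component up separately is an assumption (the page does not say whether rounding happens per component or on the total).

```python
import math

# Approximate amount of the third-party metric equal to 1 DVU, copied from
# three rows of the table above. Rows listed as "x" or with separate
# input/output token rates are left out of this sketch.
AMOUNT_PER_DVU = {
    "Twingly Blogs": 70,                    # documents
    "PrivateAI PII Redaction": 28_000,      # characters
    "Location Inference Classifier": 100,   # documents
}

def pipeline_dvus(usage: dict) -> dict:
    """Return whole DVUs per component, rounding each one up (assumed)."""
    return {
        name: math.ceil(amount / AMOUNT_PER_DVU[name])
        for name, amount in usage.items()
    }

# Hypothetical usage for one pipeline run.
usage = {
    "Twingly Blogs": 1_000,                 # documents collected
    "PrivateAI PII Redaction": 500_000,     # characters redacted
    "Location Inference Classifier": 1_000, # documents classified
}

per_component = pipeline_dvus(usage)
print(per_component)
# {'Twingly Blogs': 15, 'PrivateAI PII Redaction': 18, 'Location Inference Classifier': 10}
print(sum(per_component.values()))  # 43
```

Because the table is approximate, a total like this is best read as an estimate for planning and not as a billed amount.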