Release Notes

A log of all changes to the Datastreamer platform and API, and the version in which each change was introduced.


Change Notification

Datastreamer notifies its customers about all API changes through their dedicated account manager team and in our Datastreamer Community Slack Group.

Our updates are versioned according to Semantic Versioning 2.0.0 and are written as a version number of the form MAJOR.MINOR.PATCH.

We increment our code version based on the following:

MAJOR Version: When we make API changes incompatible with previous versions.
MINOR Version: When we add functionality in a backward-compatible manner.
PATCH Version: When we make backward-compatible bug fixes or add minor fields.
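
As a rough illustration of the scheme above, the following Python sketch parses a version string and reports which part changed between two releases. The helper names (`parse_version`, `change_level`) are hypothetical and not part of any Datastreamer API. Treating a missing PATCH as 0 (so that "6.19" reads as 6.19.0) is an assumption made to match the shorter version numbers used in this log.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int = 0


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR' or 'MAJOR.MINOR.PATCH'.

    Assumption: a missing PATCH component is treated as 0.
    """
    parts = [int(p) for p in text.strip().split(".")]
    if len(parts) == 2:
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR[.PATCH], got {text!r}")
    return Version(*parts)


def change_level(old: Version, new: Version) -> str:
    """Return which component differs first: 'major', 'minor', 'patch', or 'none'."""
    if new.major != old.major:
        return "major"  # may include incompatible API changes
    if new.minor != old.minor:
        return "minor"  # backward-compatible functionality
    if new.patch != old.patch:
        return "patch"  # backward-compatible fixes or minor fields
    return "none"
```

Because `Version` is a tuple, versions compare in the expected order, so `parse_version("6.19") > parse_version("6.8.4")` holds even though a plain string comparison would say otherwise.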

Version | Date Released | Notes
6.19 | Jun 14, 2024
✨ New & Improved

- Dynamic Pipeline Manager: We have released many new capabilities to manage your pipeline. Within the Pipeline Builder, there are more customization options and elements to deploy, manage, start, stop, iterate on, and create a pipeline. More details are available at: (link)

- Pipeline Versioning: Within the Pipeline Manager, you can now deploy new versions of an existing pipeline, and the Datastreamer platform will track the versions. This gives you greater flexibility to try new platform capabilities and to test changes.

- Regional Pipeline Deployments: When a Pipeline is deployed within the Datastreamer Platform, the Platform brings all the required elements online in the default region (United States). With Regional Deployment, the components are brought online in the required region.

- Pipeline Analytics and Component Cards: We have also added more information to the Component cards and Pipeline analytics in the platform. When you view a Pipeline, you can see greater insight into the performance of individual components. You can also see forecasted spend on the Billing page. More information here: (link)

- Socialgist Dynamic Pipeline Components: Brand new dynamic pipeline components are now available! You can create collection jobs right in the Pipeline. These jobs run on a recurring basis that you set. Timeframes are selectable from hourly to every 7 days. Once a job is live, it waits the defined timeframe before running again and searching for data since the last run. You can read more about it here: (link)
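
The recurring behaviour described for collection jobs can be pictured with a small Python sketch. This is only an illustration of the timing, not the platform's implementation: the function name `run_windows` is hypothetical, and it assumes the moment the job goes live counts as the first checkpoint, so that each later run searches from the previous checkpoint.

```python
from datetime import datetime, timedelta, timezone

# Bounds taken from the release note: hourly up to every 7 days.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=7)


def run_windows(live_at: datetime, interval: timedelta, runs: int):
    """Return (search_since, run_at) pairs for the next `runs` executions.

    Assumption: going live is the first checkpoint, and every run looks
    for data created since the previous checkpoint.
    """
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 1 hour and 7 days")
    windows = []
    last_run = live_at
    for _ in range(runs):
        next_run = last_run + interval  # wait the defined timeframe
        windows.append((last_run, next_run))
        last_run = next_run
    return windows


if __name__ == "__main__":
    live = datetime(2024, 6, 14, 12, 0, tzinfo=timezone.utc)
    for since, run_at in run_windows(live, timedelta(days=1), 3):
        print(f"run at {run_at.isoformat()} searches since {since.isoformat()}")
```

The windows are contiguous by construction, which is one simple way to avoid both gaps and overlap between consecutive runs.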

🛠 Fixes & Updates

- Documentation Updates: We have added documentation around Pipeline management, Job creation, and Dynamic Pipeline information. More updates to come.

- Hashing: The JSON transform component has received an update that lets you apply hashing to a field.
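
To give a sense of what hashing a single field in a JSON document looks like, here is a minimal Python sketch. It does not reflect how the JSON transform component is configured or which algorithm it uses. The `hash_field` helper, the dotted-path convention, and the choice of SHA-256 are all assumptions made for illustration.

```python
import copy
import hashlib


def hash_field(document: dict, path: str) -> dict:
    """Return a copy of `document` with the value at the dotted `path`
    replaced by its SHA-256 hex digest.

    Assumptions: nested dicts addressed by a dotted path, SHA-256 as the
    hash, and the value converted to a string before hashing.
    """
    result = copy.deepcopy(document)  # leave the caller's document untouched
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node[key]
    value = str(node[leaf]).encode("utf-8")
    node[leaf] = hashlib.sha256(value).hexdigest()
    return result


if __name__ == "__main__":
    doc = {"author": {"name": "alice"}, "content": {"body": "hello"}}
    print(hash_field(doc, "author.name"))
```

Hashing keeps the field usable for joins and deduplication, since equal inputs yield equal digests, while removing the readable value from the output.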
6.18 | May 31, 2024
✨ New & Improved

- Dynamic Pipeline Editor: You can now create, view, edit, and manage your Pipelines directly from the new visual editor. Along with a host of coming features, the Pipeline editor gives you faster releases, expansions, and troubleshooting. Drag and drop the desired components into your Pipeline view, configure the components, and deploy with a single button. Datastreamer will handle everything else in seconds. Building a whole new pipeline takes 5 minutes.

- Location Inference Updates: The Location Inference models have been adapted to work with Conversational and Broadcast-style content, expanding the number of sources covered by 15x. In addition, expanded coverage for the Spanish language models was released.

- CleanDNS Partnership: We are proud to announce that CleanDNS has chosen Datastreamer to deliver and make available their DNS data on our partner network. This brings leading coverage of traffic and data around known malicious domains.

- Google Commit Usage: Did you know that you can use your excess Google Cloud Commit for Datastreamer's platform and select Partner components/sources? You can now!

- Additional Ingress: New components support Ingress to your Pipelines from Elastic, Pubsub, and Google Cloud Storage. More to come!

🛠 Fixes & Updates

- Documentation Updates: We are actively working to "catch up" on all the capabilities released and update our technical and website documentation. Non-stop changes coming to you!

- Other Fixes: We have released fixes to Location Inference operations, Extraction API, Pipeline health monitoring, and more.

💙 Early Access & Misc

- Google Gemini Integration: Datastreamer's team is integrating Google Gemini for advanced document structuring use cases. Do you have a complex use case? We'd love to hear about it.

- Adhoc Usage of Pipelines: Tasks and recurring queries that feed pipelines into your databases and feeds work well. But what if you want to see the content in the pipeline live? What if you want to trigger new adhoc requests via API? We are working on this now and would love your input.
6.17 | April 30, 2024
✨ New & Improved:

- Improved Billing Insights: Brand new dashboards show your commit, platform tier, component usage, and more. In addition to seeing your current platform usage, you can also see spend projections. Accessible at (link).

- Dynamic Pipeline Viewer: We have released over 100 components that you can use to solve any data structuring need. All these powerful pipelines can come together in a fully dynamic and self-managing Dynamic Pipeline. Existing dynamic pipelines are visible at (link).

- Google Marketplace Launch: We are now on Google Marketplace! If you are using Google Cloud as a hosting provider for your platform, this may have financial and procurement benefits for you.

- Socialgist Partnership: We are proud to announce Socialgist as an integration partner. Socialgist brings world-leading coverage of blogs, forums, news, video sites, and reviews.

- Brightdata Partnership: Brightdata's new scraping APIs can be used within Datastreamer. Utilize Datastreamer pipelines to initiate custom scraping jobs of major social, web, and ecommerce platforms.

- Location Inference Updates: Location Inference classifiers for Spanish and English social media have been expanded. Spanish location inference can now identify posts originating in Mexico, Colombia, Spain, Chile, Peru, and Argentina. In addition, English content gained new country inference options: New Zealand, France, Germany, Mexico, and India.

- Dominant Location Classifier: A challenge with News data is often narrowing down the target locations discussed. The new dominant location classifier uses entity and pattern recognition to detect the dominant location within News articles.

🛠 Fixes & Updates

- Documentation Updates: With the addition of so many capabilities to the Datastreamer platform, our documentation has been refreshed under a new taxonomy to support changes and future planned expansion. Available at (link).

💙 Early Access & Misc

- Google Gemini Integration: Datastreamer's team is integrating Google Gemini for advanced document structuring use cases. Do you have a complex use case? We'd love to hear about it.

- Location Inference for TikTok data sources: Interested in adding location inference to your TikTok data, or looking for TikTok components? These capabilities are in early access now!

- Dynamic Pipeline Builder: Looking for a 5-minute way to build complex data pipelines? Datastreamer's team is building edit/create functionality into our Dynamic Pipeline Viewer and would love your input!
6.16 | February 23, 2024
✨ New & Improved:

📤 A wealth of new Egress components have been launched!

- Azure egress component: Send data into your Azure environment with this new component.

- BigQuery egress component: Feed data into BigQuery, AnalyticsHub, and other Google locations.

- Databricks egress component: Deliver streaming data or any dynamic pipeline into Databricks landing zones.

- Azure File Mover: Move files around Azure environments, triggered by Datastreamer pipeline triggers.

- Google Cloud Storage egress component: Egress a platform pipeline directly into Google Cloud Storage.

📁 Do much more with PDFs!

- PDF table detection component: Detect tables within PDFs as a trigger.

- PDF OCR component: Convert non-digital PDFs into JSON content.

- PDF table to schema component: Extract and convert the tables within PDFs into JSON content.

🛠 Fixes & Updates

- Malay language coverage for wsl_instagram and wsl_threads.

- Socialgist sample sources: You can now freely explore Socialgist data in the Datastreamer platform.
6.15 | January 23, 2024
✨ New & Improved:

- Socialgist has now joined Datastreamer as an integrated partner! You can use Socialgist's coverage of Video sites, Blogs, Forums, News, and Review sources. You can check out the Platform Catalog and Socialgist partner profile for more information.

🛠 Fixes & Updates

- Japanese location inference has been updated for Instagram and Threads sources.

💙 Early Access & Misc

- We are working on updates to our billing system and our platform observability system. We'd love to hear your insights and feedback! If this interests you, let us know and we'll connect.
6.14.2 | January 15, 2024
✨ New & Improved:

- Location inference has been added to the metadata for Opoint News, allowing further filtering by country.

- content.location: Location tags (which are author-provided and free text) are now available in wsl_instagram content with the field: X

- instagram.user_id: In addition, wsl_instagram has received a user_id field, which is the Meta ID common across all Meta properties.

- Lastly, wsl_instagram is now able to detect the Turkish language.

💙 Early Access & Misc

- Japanese location inference expansion is in early access and will be released in the next few days.
6.14.1 | December 8, 2023
✨ New & Improved:

- Location Inference has been expanded to classify content in English for the following countries: Brazil (BR), Colombia (CO), Turkey (TR), and Thailand (TH).

- A new processing layer for document content, with OCR and conversion from Tables to JSON, is in early release. Perfect for PDFs, Images, and other forms of media.

🛠 Fixes & Updates

- Documentation improvements and fixes.

- UnifyAI prompts improved.

- Some streaming data sources are now using Datastreamer's new Dynamic Pipelines technology, allowing for more rapid integration, higher flexibility, and greater power. Stay tuned for updates on this expansion!

💙 Early Access & Misc

- Blogs, Forums, Chinese media, and more are coming with a new partnership to be announced next week! 👀
6.14 | November 20, 2023
✨ New & Improved:

- UnifyAI has now been released in a feedback-oriented Alpha version! UnifyAI is a Large Language Model (LLM)-powered agent that automatically generates consistent metadata fields for unstructured text content. Lots more technical details here: (link). You can also check out the dedicated website page for UnifyAI (link). You can try out the UnifyAI Playground on Portal (link). We look forward to receiving feedback, hearing use cases, and gathering more insight to shape our roadmap for the development of UnifyAI. Over the following weeks, we will be adding to UnifyAI heavily.

🛠 Fixes & Updates

- Documentation improvements and fixes.

- Faster processing of Data365 tasks.

💙 Early Access & Misc

- UnifyAI is such a big release that we couldn't put it in this section for 6.14! Stay tuned for many new updates in data providers, UnifyAI's growth, and more.
6.13 | November 10, 2023
✨ New & Improved:

- New Sandbox accounts can now request temporary increases to accounts for further exploration.

- Major behind-the-scenes upgrades for the upcoming 6.14 release.

- Datastreamer is now part of the Google for Startups Cloud Program.

🛠 Fixes & Updates

- Fixes to the processing of large Extractions

- Improvements in task management and data ingestion for Data365 adapters.

- Improvements in Websightline adapter speed and coverage.

- Improvements to Portal account administration.

💙 Early Access & Misc

- 6.14 may be one of our most cutting-edge releases in 2023. Stay tuned for an announcement coming soon!
6.12.6 | October 23, 2023
✨ New & Improved:

- Location Inference is now applied to wsl_threads content by default.

🛠 Fixes & Updates

- We have improved the duplication detection in the Data365 adapter.

- A number of fixes and improvements to Portal have also been released to improve the user experience.
6.12.5 | October 13, 2023
✨ New & Improved:

- Country location inference now supports Spanish for Instagram sources (specifically identifying Mexico and Colombia)!

- WebSightLine adapter for Threads data is out of beta.

🛠 Fixes & Updates

- Updates to the Task Engine to fix some tasks that may fail due to higher concurrent ingestions.

- Updates to pricing and packaging tier laying the groundwork for updates to the portal usage dashboards.

- Fixes released to address a bug that prevented Operations from successfully being applied to Extraction API.

- The account creation process in Portal has been simplified.

💙 Early Access & Misc

- Are you interested in using AI to populate needed metadata fields using the content from within the document? We would love to hear your use case! UnifyAI: coming late 2023.
6.12.4 | September 29, 2023
🛠 Fixes & Updates

- Updates to the Pipeline Builder within Portal to create and add Data365 tasks.

- Streamlined Portal account creation.

- Additions to Location Inference models provide enhanced speed and additional country coverage.

💙 Early Access & Misc

- Do you use Google BigQuery and/or work with large numbers of PDFs/CSVs? We'd love to talk about some items on our roadmap. Please reach out to your account manager if you are interested.
6.12.3 | September 12, 2023
✨ New & Improved:

- A new task engine has been released for sources that use collection tasks to populate data sources. This task engine brings a number of fixes to the Data365 adapters and will be expanded to other task-based adapters in later iterations.

- The location inference classifier for data365_twitter and Twitter official APIs can now locate posts from Thailand, Turkey, and Puerto Rico, in addition to the above 26 countries from the city location inference model.

🛠 Fixes & Updates

- Updates to data365_twitter now populate the post type metadata fields.

- Fixes to the darkowl_search adapter solve issues where queries would time out.

- Multiple fixes for the Data365 adapter task management have been released.

ℹ️ Additional Information on the Task Engine Update Fixes:

- Duplicate content that may have been generated by overlapping tasks is reduced in frequency.

- Multiple tasks that may have caused slow collection times or backlogs can now run concurrently.

- Data collection, especially for larger tasks, will now run more efficiently and faster.

- Fixed task syncing issues, which may have caused some tasks to not be started.

- The documented process to create tasks has not changed. All updates are performed behind the scenes within the platform.
6.12.2 | August 25, 2023
✨ New & Improved:

- Location Inference classifier has received a major update, adding Puerto Rico, more English coverage, and faster updates.

- Many operations and enrichments have received updates to provide compatibility to both Twitter official API adapters and Data365 adapters.

- WebSightLine has released wsl_threads, providing full coverage of the new Threads network.

- WebSightLine will be retiring wsl_twitter due to new changes preventing their ongoing data offering.

🛠 Fixes & Updates

- Multiple fixes for the Data365 adapter have been released.
6.12.1 | August 11, 2023
✨ New & Improved:

- Emoji sentiment classifier has now been released! This brand-new classifier from the Datastreamer team detects, analyzes, and applies a sentiment tag to social media content containing emojis.

🛠 Fixes & Updates

- New metadata fields have been released for the upcoming adapter for wsl_threads

- Location inference has been expanded to support Japanese-language content.

- The Pipeline Builder within Portal has received multiple improvements: easier application of operations, user experience improvements, and more.

- Improvements to file handling have been released.

💙 Early Access & Misc

- Portal dashboards will not yet be using the new pricing or your commit. This only applies to August usage and is being worked on and targeted for a later release.
6.12 | August 1-3, 2023
✨ New & Improved:

- The first release of the Pipeline Builder (Beta) is now available in Portal. Use the Builder to create and experiment with various pipeline configurations and see the output.

- Emoji sentiment classifier is now live and running on select sources. Use this to filter the sentiment of content based on the emojis used within the content. Also available to be used as a post-processing operation.

- You can now upload PDF and JSON files directly into the Pipeline platform for faster data integration.

- New pricing and packaging has been released. Thank you all for your patience and support.

- You can now connect Twitter official APIs directly to Datastreamer. Reach out to your account team for early access!

🛠 Fixes & Updates

- New metadata fields have been added to the Product metadata category.

- Fixes to the Twingly adapters have been released to adapt to the new changes.

💙 Early Access & Misc

- Portal dashboards will not yet be using the new pricing or your commit. This only applies to August usage and is being worked on and targeted for a later release.

- Interested in Threads, Amazon data, or integrating Twitter (X) official APIs? We'd love to see you in the early access! Just reach out to your account manager.
6.11 | July 5, 2023
✨ New & Improved:

- OpenAI prompts (ChatGPT) can now be run on any data sources through your pipelines.

- Data365 is now available as a data partner.

- You can now apply Operations to the Extraction API

- Sample sources are now available for Data365 and WebSightLine data sources.

🛠 Fixes & Updates

- Better removal of anchor text in Unify.

- Fixes to Image URL identification and linking content to other posts in social media threads in Unify.

- Additional metadata fields available in the Datastreamer schema have been added in preparation for upcoming new sources.

- Validation has been added to the Monitored Search API endpoint
6.10 | June 2, 2023
✨ New & Improved:

- ESG Classifier: This new classifier is specially trained to apply ESG labels to News content, but can be used for other long-form content as well. The ESG classifier is a 4-value label. The label value is one of ‘environmental’, ‘social’, or ‘governance’ depending on the ESG topic; or ‘none’ if no ESG topic is detected in the post.

🛠 Fixes & Updates

- Spanish location inference has been expanded for Twitter data sources.

- A new metadata field for "cashtags" has been added.

💙 Early Access & Misc

- We will be at Collision 2023 in Toronto. Meet us there!
6.9 | May 31, 2023
6.9 brings a new Data partner catalog, beta partners, and many more changes!

✨New & Improved:

- Platform Catalog: An updated platform catalog is now available at (link). This new catalog provides more information on sources.

- Sample Sources: Sample sources allow you to use a sample of a premium data source for free, so you can explore, test, and try out a data source at no cost. WebSightLine is our launch partner, and additional partner sources are coming soon.

🛠 Fixes & Updates

- Update: Schema updates provide better social media data matching, specifically impacting Twitter retweet, reply, and quote fields.

- Count API rate limit has expanded.

- Our status page (link) now has per-component subscriptions, allowing you to receive notifications only on the items you care about.

💙 Early Access & Misc

- Data365 sources are now available in early access. Reach out to an account manager if you wish to try these sources.
6.8.4 | May 17, 2023
6.8.4 brings Quality of Life (QOL) updates to a number of areas:

🛠 Fixes & Updates

- Updates to metadata: releases of Offenses and Product categories for the inclusion of additional data partners.

- Release of new v6 -> v5 firehose adapter
- Fixed issue where Monitored Search may not deliver all matched content in high-volume queries.
- Fixed issue where Location Inference may be missing from some data sources.
- Fixed issue where the creation of new data sources would auto-assign a name with no ability to modify.
6.8.3 | March 20, 2023
6.8.3 brings Quality of Life updates to:

🛠 Fixes & Updates

- Count API: Count API now returns data faster and with clearer error messages.

- Portal: You can now view previous months in Portal's billing and dashboard section.
- Location Inference: The Location Inference enrichment model is now able to predict more cities for the United States.
- Partners: DarkOwl groups have been added as filters
- Partners: Connecting compatible data sources has received more supporting documentation.
- Partners: CoHere integration is now available.
6.8.2 | February 7, 2023
🛠 Fixes & Updates

6.8.2 brings multiple bug fixes and optimizations for partner sources.
6.8.1 | January 23, 2023
✨ New & Improved:

Update 6.8.1 brings the release of Datastreamer's own Content Similarity Clustering. It allows you to cluster (group) similar content together from a query. Try it with news content today!
6.8 Amur | January 9, 2023
The 6.8 Amur release brings a wealth of new functionality into the Datastreamer platform:

✨New & Improved:

- Count API: This new API endpoint allows you to view the total number of matching search results for a query.
- Country Inference: Utilize this new classifier to infer the location of a content post. Specially built for social content.

🛠 Fixes & Updates

- Runaway query warning: This update for monitored search notifies the user when they make a high-volume monitored search.
6.7 | October 17, 2022
Run a wealth of classifiers, models, and operations on any data travelling through the pipeline. Enrich, expand, and combine.

✨New & Improved:

- House your own operations inside the Datastreamer pipeline

- Available launch operations:
  - Detect language
  - Google Translate
  - Hard news
  - Private AI - PII Redaction
6.6 | September 1, 2022
✨ New & Improved:

Release of Operations API
6.3 | March 22, 2022
✨ New & Improved:

Launch of Datastreamer Data Partner Network with flagship partner: Opoint Media, and relevant metadata fields.

Release of Named Entity Recognition (NER) and Hard News classifiers, and relevant metadata fields.

Addition of doc_date.
6.2.1 | February 22, 2022
✨ New & Improved:

Addition of Highlighting functionality.
6.2 | January 5, 2022
✨ New & Improved:

Our Location Inference classifier and Aggregation API Endpoints have been released, and the following fields were added to the API as a result:

- enrichment.location_inference.label

- enrichment.location_inference.confidence
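
As an example of how these two fields might be used together, the Python sketch below keeps only documents whose inferred location matches a target label with sufficient confidence. It assumes each document is a nested dictionary in which the field paths `enrichment.location_inference.label` and `enrichment.location_inference.confidence` map to nested keys, that labels are short strings such as "US", and that confidence is a number between 0 and 1. None of this is confirmed by the notes above, and `filter_by_location` is a hypothetical helper.

```python
def filter_by_location(documents, label, min_confidence=0.8):
    """Keep documents whose inferred location equals `label` with
    confidence at or above `min_confidence`.

    Assumptions: nested-dict documents, string labels, confidence in [0, 1].
    Documents without the enrichment are skipped rather than raising.
    """
    matches = []
    for doc in documents:
        inference = doc.get("enrichment", {}).get("location_inference", {})
        if (
            inference.get("label") == label
            and inference.get("confidence", 0.0) >= min_confidence
        ):
            matches.append(doc)
    return matches


if __name__ == "__main__":
    docs = [
        {"id": 1, "enrichment": {"location_inference": {"label": "US", "confidence": 0.93}}},
        {"id": 2, "enrichment": {"location_inference": {"label": "US", "confidence": 0.41}}},
        {"id": 3, "enrichment": {}},
    ]
    print([d["id"] for d in filter_by_location(docs, "US")])
```

Raising or lowering `min_confidence` trades precision against coverage, so the right threshold depends on the use case.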
Up to 6.2 | 2021
🛠 Fixes & Updates

Summarizing the changes to 6.0.X and 6.1.X in 2021

The following fields were added to the API as additional metadata information:

The following fields were added in preparation for the violence classifier release.

The following fields were modified for greater clarity.

The following fields were deprecated:
- enrichment.spam_probability

The following fields were added to the API as additional metadata information:

The Violence Classifier was released.

An update to our metadata has introduced a new field of: author.bio_links