Release Notes

A log of all changes to the Datastreamer platform and API, and the versions associated with each new addition.

📘

Change Notification

Datastreamer notifies its customers of all API changes through their dedicated account management team and in the Datastreamer Community Slack group.

Our updates follow Semantic Versioning 2.0.0, with version numbers written as MAJOR.MINOR.PATCH.

We increment the version number as follows:

- MAJOR version: when we make API changes that are incompatible with previous versions.
- MINOR version: when we add functionality in a backward-compatible manner.
- PATCH version: when we make backward-compatible bug fixes or add minor fields.
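To illustrate how these three rules move a version number, here is a minimal Python sketch. The `bump` helper is hypothetical and not part of any Datastreamer tooling; it also assumes that the two-part numbers used in this log (such as 6.19) carry an implicit PATCH of 0.

```python
from typing import Literal

Change = Literal["major", "minor", "patch"]


def bump(version: str, change: Change) -> str:
    """Return the next version number under Semantic Versioning 2.0.0 rules."""
    parts = [int(p) for p in version.split(".")]
    # Assumption: a two-part number such as "6.19" is read as "6.19.0".
    while len(parts) < 3:
        parts.append(0)
    major, minor, patch = parts[:3]
    if change == "major":
        # Incompatible API change: reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible functionality: reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible bug fix or minor field addition.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump("6.14.1", "patch"))  # 6.14.2
print(bump("6.19", "minor"))    # 6.20.0
print(bump("6.22", "major"))    # 7.0.0
```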

Version | Date Released | Notes
6.20, 6.21, 6.22 | September 20, 2024

✨New & Improved

- Data Collection Jobs: Jobs is a new capability that has been gradually rolled out since early 2024. It lets customers and users easily set collection requirements, on a periodic or one-time basis, via the API or Portal. Jobs use the Dynamic Pipelines' abilities to convert, manage, control, and schedule data collection requirements across more than 50 different data sources. With this formal release, the following capabilities are available:

- Create a Job (available via API and Portal) More Info

- View Job details and list Jobs (available via API and Portal) More Info

- Update and modify Jobs (available via Portal) More Info

- Additional Egress Options: Google Cloud Storage and Snowflake have been added as Egress options for the platform.

- Vetric Partnership and Sources: Vetric is now an integrated partner of Datastreamer, adding over 30 new data sources and endpoints for X (Twitter), Facebook, LinkedIn, and Instagram.

- LLM-powered Classifiers: Using Datastreamer's proprietary LLM wrapper, more than 20 classifiers have been added to the Datastreamer platform catalog. You can apply LLM-powered classifiers directly in the Pipeline and operate them at 5-10% of the cost of using standard LLMs. The Datastreamer LLM wrapper handles real-time pre-processing, content pruning, prompt application, micro-batching, output quality testing, and post-processing, allowing the classifiers to operate without the usual issues of standard LLMs.

- Product Sentiment Classifier: The Product Sentiment Classifier extracts brands from short text and provides a sentiment (positive, negative, or neutral) for each brand, along with a reason. It supports multiple languages, making it adaptable for global markets, and can operate in real-time or batch-processing scenarios. More Info

- WebSightLine File Fetcher: This new feature, powered by WebSightLine, automatically downloads media referenced at JSON paths into client storage. The File Fetcher is often used to pull the images, files, HTTP pages, and other content associated with documents in a Pipeline. It retrieves the files from the provided JSON paths through a proxy network and places them in the specified storage. More Info

- Socialgist TikTok: Socialgist TikTok has been added to the available integration data sources. It provides real-time collection and regular delivery of TikTok data into your pipelines.

- Pipeline Inspector: The Inspector UI lets you test and observe a pipeline by fetching documents directly in the browser. Once the Inspector is deployed in your pipeline, it preserves a cache of new documents flowing through the pipeline. The Inspector can also be used within the Job system in the Portal, letting you create a Job and view results from different stages of the Pipeline as they are received and processed. It is often used for testing, analyzing, and trialing different pipeline components and data sources. More Info
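The File Fetcher's first step, reading file locations from JSON paths, can be pictured with the minimal Python sketch below. It assumes a simple dot-separated path syntax that steps into lists automatically, and the helper names `resolve_path` and `urls_to_fetch` are hypothetical. The actual path syntax the File Fetcher accepts, and the download through WebSightLine's proxy network into storage, are not modeled here.

```python
from typing import Any


def resolve_path(doc: Any, path: str) -> list[Any]:
    """Collect every value found at a dot-separated path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        next_values = []
        for value in current:
            # Expand lists so "media.url" matches each item in a "media" array.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        current = next_values
    # Flatten a final list value (for example, a list of URLs).
    flat = []
    for value in current:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def urls_to_fetch(doc: dict, paths: list[str]) -> list[str]:
    """Gather unique http(s) URLs found at the configured (hypothetical) JSON paths."""
    found: list[str] = []
    for path in paths:
        for value in resolve_path(doc, path):
            is_url = isinstance(value, str) and value.startswith(("http://", "https://"))
            if is_url and value not in found:
                found.append(value)
    return found


# Hypothetical document shape, for illustration only.
doc = {
    "content": {"body": "New product launch", "url": "https://example.com/post/1"},
    "media": [
        {"type": "image", "url": "https://example.com/a.jpg"},
        {"type": "video", "url": "https://example.com/b.mp4"},
    ],
}
print(urls_to_fetch(doc, ["content.url", "media.url"]))
# ['https://example.com/post/1', 'https://example.com/a.jpg', 'https://example.com/b.mp4']
```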

🛠 Fixes & Updates

- Location Inference Updates: The conversational models for English content can now infer the posting location for additional countries: NZ (New Zealand), BR (Brazil), CO (Colombia), PR (Puerto Rico), TR (Turkey), and TH (Thailand). In addition, Arabic-language support has been added, with the ability to detect Egypt and Saudi Arabia. More Info

- JSON Document Router Updates: The JSON Document Router has been upgraded with many more routing capabilities. After adding this component, you can apply filters on routes, use operators in route logic, use multi-routing, preview routes in the Portal pipeline viewer, and use the new "Builder" capability to create and manage routes more easily. More Info
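To picture how filters, operators, and multi-routing fit together in a document router, here is a minimal Python sketch. The route format, the operator names (`eq`, `gt`, `contains`, and so on), and the field names are hypothetical and do not reflect the JSON Document Router's actual configuration. The sketch only shows the idea that each route holds a list of filters that must all match, and that a document goes to every matching route.

```python
import operator
from typing import Any

# Hypothetical operator table; the real component's operator names may differ.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "contains": lambda field, value: value in field,
    "in": lambda field, value: field in value,
}


def get_field(doc: dict, path: str) -> Any:
    """Read a dot-separated field, returning None when any key is missing."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def route_matches(doc: dict, route: dict) -> bool:
    """A route matches only if every one of its filters matches (logical AND)."""
    for field, op, expected in route["filters"]:
        actual = get_field(doc, field)
        if actual is None or not OPERATORS[op](actual, expected):
            return False
    return True


def matching_routes(doc: dict, routes: list[dict]) -> list[str]:
    """Multi-routing: return the names of all matching routes, not just the first."""
    return [route["name"] for route in routes if route_matches(doc, route)]


# Hypothetical routes and document, for illustration only.
routes = [
    {"name": "english", "filters": [("content.lang", "eq", "en")]},
    {"name": "high_engagement", "filters": [("content.likes", "gt", 100)]},
    {"name": "brand_mentions", "filters": [("content.body", "contains", "Acme")]},
]
doc = {"content": {"lang": "en", "likes": 250, "body": "Loving my new phone"}}
print(matching_routes(doc, routes))  # ['english', 'high_engagement']
```

Because `matching_routes` returns every matching route rather than stopping at the first, one document can be delivered to several destinations at once.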

💙 Early Access & Misc

- "Data Volume Unit" Updates: The Data Volume Unit (DVU) is a common metric that simplifies estimation, billing, measurement, and performance tracking across the system. As of this release, our partners use eleven (11) different metrics, and we have updated our platform logic accordingly. As more components are added, the need for a simple conversion to a common metric has grown.

Datastreamer now supports measuring in the following metrics, with conversion to DVUs: tokens (general), input tokens, output tokens, documents, documents per second, bytes, compute time - milliseconds, compute time - seconds, requests, mentions, credit, CSV rows, field count, page count, table count, characters, and words. More Info
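Converting many metrics into one common unit amounts to a weighted sum: each metric's amount is multiplied by a per-metric rate, and the results are added together. The minimal Python sketch below shows that idea. The metric names come from the list above, but the rates in `HYPOTHETICAL_DVU_RATES` are invented placeholders, not Datastreamer's actual DVU conversion rates, and `to_dvus` is a hypothetical helper.

```python
# Placeholder rates: DVUs per unit of each metric. These numbers are invented
# for illustration only and are NOT Datastreamer's published conversion rates.
HYPOTHETICAL_DVU_RATES = {
    "documents": 1.0,
    "input tokens": 0.001,
    "output tokens": 0.004,
    "compute time - seconds": 0.5,
    "page count": 2.0,
}


def to_dvus(usage: dict[str, float], rates: dict[str, float]) -> float:
    """Convert usage reported in mixed metrics into a single DVU total."""
    total = 0.0
    for metric, amount in usage.items():
        if metric not in rates:
            raise KeyError(f"no DVU conversion rate for metric: {metric}")
        # Weighted sum: amount of the metric times its DVU rate.
        total += amount * rates[metric]
    return total


usage = {"documents": 1_000, "input tokens": 50_000, "output tokens": 5_000}
print(to_dvus(usage, HYPOTHETICAL_DVU_RATES))
```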
6.19 | June 14, 2024

✨New & Improved

- Dynamic Pipeline Manager: We have released many new capabilities for managing your pipeline. The Pipeline Builder on portal.datastreamer.io now offers more customization options and elements for creating, deploying, managing, starting, stopping, and iterating on a pipeline. More details are available at: https://docs.datastreamer.io/docs/saving-and-deploying-a-dynamic-pipeline.

- Pipeline Versioning: Within the Pipeline Manager, you can now deploy new versions of an existing pipeline, and the Datastreamer platform tracks the versions. This gives you greater flexibility for testing and trying out new platform capabilities.

- Regional Pipeline Deployments: When a Pipeline is deployed within the Datastreamer Platform, the Platform brings all the required elements online in the default region (United States). With Regional Deployment, the components are brought online in the region you require.

- Pipeline Analytics and Component Cards: We have added more information to the Component cards and Pipeline analytics in the platform. When you view a Pipeline, you get greater insight into the performance of individual components. You can also see forecasted spend on the Billing page. More information here: https://docs.datastreamer.io/docs/pipeline-analytics

- Socialgist Dynamic Pipeline Components: New dynamic pipeline components are now available, letting you create collection jobs directly in the Pipeline. These jobs run on a recurring schedule that you set, with timeframes selectable from hourly to every 7 days. Once a Job is live, it waits the defined timeframe before running again and searches for data since the last run. You can read more about it here: https://docs.datastreamer.io/docs/socialgist-jobs-api-ui
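The recurring schedule described above (wait the configured timeframe, then search for data since the last run) can be sketched in a few lines of Python. This is a minimal illustration, assuming the timeframe is any duration between 1 hour and 7 days and that each run's search window starts at the previous run; `next_run` is a hypothetical helper, not part of the Socialgist or Datastreamer APIs.

```python
from datetime import datetime, timedelta

# Bounds taken from the release note: hourly up to every 7 days.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=7)


def next_run(last_run: datetime, interval: timedelta) -> tuple[datetime, tuple[datetime, datetime]]:
    """Return when the job runs next and the (hypothetical) time window that run searches."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 1 hour and 7 days")
    run_at = last_run + interval
    # Assumption: each run searches from the previous run up to the new run time.
    return run_at, (last_run, run_at)


run_at, window = next_run(datetime(2024, 6, 14, 9, 0), timedelta(hours=6))
print(run_at)  # 2024-06-14 15:00:00
print(window)
```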

🛠 Fixes & Updates

- Documentation Updates: We have added documentation on Pipeline management, Job creation, and Dynamic Pipelines. More updates to come.

- The Hashing JSON transform component has been updated to let you apply hashing to a field.
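As an illustration of what hashing a single field does to a document, here is a minimal Python sketch. It assumes a dot-separated field path and SHA-256 over the field's UTF-8 string value; the component's actual hash algorithm and configuration are not documented here, and `hash_field` is a hypothetical helper.

```python
import copy
import hashlib
from typing import Any


def hash_field(doc: dict, path: str) -> dict:
    """Return a copy of doc with the value at a dot-separated path replaced by its SHA-256 hex digest."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    *parents, leaf = path.split(".")
    target: Any = result
    for key in parents:
        target = target[key]
    if leaf in target and target[leaf] is not None:
        # Assumption: hash the UTF-8 string form of the value with SHA-256.
        value = str(target[leaf]).encode("utf-8")
        target[leaf] = hashlib.sha256(value).hexdigest()
    return result


# Hypothetical document shape, for illustration only.
doc = {"author": {"name": "jane_doe", "id": 42}}
hashed = hash_field(doc, "author.name")
print(len(hashed["author"]["name"]))  # 64
```

Note that an unsalted hash of a low-variety value (such as a username) can be reversed by hashing candidate inputs; a keyed hash such as HMAC (Python's `hmac` module) is a common alternative when that matters.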
6.18 | May 31, 2024

✨New & Improved

- Dynamic Pipeline Editor: You can now create, view, edit, and manage your Pipelines directly from the new visual editor. Along with many upcoming features, the Pipeline editor enables faster releases, expansions, and troubleshooting. Drag and drop the components you need into your Pipeline view, configure them, and deploy with a single button. Datastreamer handles everything else in seconds, so you can build a whole new pipeline in 5 minutes.

- Location Inference Updates: The Location Inference models have been adapted to work with Conversational and Broadcast-style content, expanding the number of sources covered by 15x. In addition, expanded coverage for the Spanish language models has been released.

- CleanDNS Partnership: We are proud to announce that CleanDNS has chosen Datastreamer to deliver and make available their DNS data on our partner network, providing leading coverage of traffic and data around known malicious domains.

- Google Commit Usage: You can now use your excess Google Cloud commit for Datastreamer's platform and select Partner components and sources.

- Additional Ingress: New components support Ingress to your Pipelines from Elastic, Pub/Sub, and Google Cloud Storage. More to come!

🛠 Fixes & Updates

- Documentation Updates: We are actively working to "catch up" on all released capabilities and update our technical and website documentation. Non-stop changes coming to you!

- Other Fixes: We have released fixes to Location Inference operations, Extraction API, Pipeline health monitoring, and more.

💙 Early Access & Misc

- Google Gemini Integration: Datastreamer's team is integrating Google Gemini for advanced document structuring use cases. Do you have a complex use case? We'd love to hear about it.

- Ad Hoc Usage of Pipelines: Tasks and recurring queries are ideal for feeding pipelines that deliver into your databases and feeds. But what if you want to see the content in the pipeline live, or trigger new ad hoc requests via the API? We are working on this now and would love your input.
6.17 | April 30, 2024

✨New & Improved:

- Improved Billing Insights: New dashboards show your commit, platform tier, component usage, and more. In addition to your current platform usage, you can also see spend projections. Accessible at portal.datastreamer.io.

- Dynamic Pipeline Viewer: We have released over 100 components that you can use to address a wide range of data structuring needs. These components come together in a fully dynamic and self-managing Dynamic Pipeline. Existing dynamic pipelines are visible at https://portal.datastreamer.io/pipelines.

- Google Marketplace Launch: We are now on Google Marketplace! If you use Google Cloud as a hosting provider for your platform, this may have financial and procurement benefits for you.

- Socialgist Partnership: We are proud to announce Socialgist as an integration partner. Socialgist brings world-leading coverage of blogs, forums, news, video sites, and reviews.

- Brightdata Partnership: Brightdata's new scraping APIs can be used within Datastreamer. Use Datastreamer pipelines to initiate custom scraping jobs on major social, web, and e-commerce platforms.

- Location Inference Updates: Location Inference classifiers for Spanish and English social media have been expanded. Spanish location inference can now infer posts originating in Mexico, Colombia, Spain, Chile, Peru, and Argentina. In addition, English content inference now covers New Zealand, France, Germany, Mexico, and India.

- Dominant Location Classifier: A common challenge with News data is narrowing down the locations being discussed. The new Dominant Location classifier uses entity and pattern recognition to detect the dominant location within News articles.

🛠 Fixes & Updates

- Documentation Updates: With so many capabilities added to the Datastreamer platform, our documentation has been reorganized under a new taxonomy to support these changes and future planned expansion. Available at docs.datastreamer.io.

💙 Early Access & Misc

- Google Gemini Integration: Datastreamer's team is integrating Google Gemini for advanced document structuring use cases. Do you have a complex use case? We'd love to hear about it.

- Location Inference for TikTok data sources: Interested in adding location inference to your TikTok data, or looking for TikTok components? These capabilities are in early access now!

- Dynamic Pipeline Builder: Looking for a 5-minute way to build complex data pipelines? Datastreamer's team is building edit/create functionality into our Dynamic Pipeline Viewer and would love your input!
6.16 | February 23, 2024

✨New & Improved:

📤 A wealth of new Egress components have been launched!

- Azure egress component: Send data into your Azure environment with this new component.

- BigQuery egress component: Feed data into BigQuery, Analytics Hub, and other Google locations.

- Databricks egress component: Deliver streaming data or any dynamic pipeline into Databricks landing zones.

- Azure File Mover: Move files around Azure environments, triggered by Datastreamer pipeline triggers.

- Google Cloud Storage egress component: Egress a platform pipeline directly into Google Cloud Storage.

📁 Do much more with PDFs!

- PDF table detection component: Detect tables within PDFs as a trigger.

- PDF OCR component: Convert non-digital PDFs into JSON content.

- PDF table to schema component: Extract and convert the tables within PDFs into JSON content.

🛠 Fixes & Updates

- Malay language coverage for wsl_instagram and wsl_threads

- Socialgist sample sources: You can now freely explore Socialgist data in the Datastreamer platform.
6.15 | January 23, 2024

✨New & Improved:

- Socialgist has now joined Datastreamer as an integrated partner! You can use Socialgist's coverage of video sites, blogs, forums, news, and review sources. Check out the Platform Catalog and the Socialgist partner profile for more information.

🛠 Fixes & Updates

- Japanese location inference has been updated for Instagram and Threads sources.

💙 Early Access & Misc

- We are working on updates to our billing system and our platform observability system. We'd love to hear your insights and feedback! If this interests you, let us know and we'll connect.
6.14.2 | January 15, 2024

✨New & Improved:

- Location inference has been added to the metadata for Opoint News, allowing further filtering by country.

- content.location: Location tags (which are author-provided free text) are now available in wsl_instagram content in the content.location field.

- instagram.user_id: In addition, wsl_instagram has received a user_id field containing the Meta ID that is common across all Meta properties.

- Lastly, wsl_instagram can now detect the Turkish language.

💙 Early Access & Misc

- Japanese location inference expansion is in early access and will be released in the next few days.
6.14.1 | December 8, 2023

✨New & Improved:

- Location Inference has been expanded to classify English content for the following countries: Brazil (BR), Colombia (CO), Turkey (TR), and Thailand (TH).

- A new processing layer for document content, with OCR and table-to-JSON conversion, is in early release. It is well suited to PDFs, images, and other forms of media.

🛠 Fixes & Updates

- Documentation improvements and fixes.

- UnifyAI prompts improved.

- Some streaming data sources now use Datastreamer's new Dynamic Pipelines technology, allowing more rapid integration, greater flexibility, and more power. Stay tuned for updates on this expansion!

💙 Early Access & Misc

- Blogs, forums, Chinese media, and more are coming with a new partnership to be announced next week! 👀
6.14 | November 20, 2023

✨New & Improved:

- UnifyAI has been released in a feedback-oriented Alpha version! UnifyAI is a Large Language Model (LLM)-powered agent that automatically generates consistent metadata fields for unstructured text content. More technical details are available here: (link). You can also check out the dedicated UnifyAI page on our website (link) and try out the UnifyAI Playground in Portal (link). We look forward to your feedback and use cases, and to gathering insights that will shape the UnifyAI roadmap. Over the following weeks, we will be adding heavily to UnifyAI.

🛠 Fixes & Updates

- Documentation improvements and fixes.

- Faster processing of Data365 tasks.

💙 Early Access & Misc

- UnifyAI is such a big release that we couldn't put it in this section for 6.14! Stay tuned for many new updates in data providers, UnifyAI's growth, and more.
6.13 | November 10, 2023

✨New & Improved:

- New Sandbox accounts can now request temporary increases to their account limits for further exploration.

- Major behind-the-scenes upgrades for the upcoming 6.14 release.

- Datastreamer is now part of the Google for Startups Cloud Program.

🛠 Fixes & Updates

- Fixes to the processing of large Extractions

- Improvements in task management and data ingestion for Data365 adapters.

- Improvements in WebSightLine adapter speed and coverage.

- Improvements to Portal account administration.

💙 Early Access & Misc

- 6.14 may be one of our most cutting-edge releases of 2023. Stay tuned for an announcement coming soon!
6.12.6 | October 23, 2023

✨New & Improved:

- Location Inference is now applied to wsl_threads content by default.

🛠 Fixes & Updates

- We have improved duplicate detection in the Data365 adapter.

- A number of fixes and improvements to Portal have also been released to improve the user experience.
6.12.5 | October 13, 2023

✨New & Improved:

- Country location inference now supports Spanish for Instagram sources (specifically identifying Mexico and Colombia)!

- The WebSightLine adapter for Threads data is out of beta.

🛠 Fixes & Updates

- Updates to the Task Engine fix tasks that could fail under higher concurrent ingestion.

- Updates to pricing and packaging tiers lay the groundwork for updates to the Portal usage dashboards.

- A fix has been released for a bug that prevented Operations from being applied successfully to the Extraction API.

- The account creation process in Portal has been simplified.

💙 Early Access & Misc

- Are you interested in using AI to populate needed metadata fields using the content from within the document? We would love to hear your use case! UnifyAI: coming late 2023.
6.12.4 | September 29, 2023

🛠 Fixes & Updates

- Updates to the Pipeline Builder within Portal to create and add Data365 tasks.

- Streamlined Portal account creation.

- Additions to Location Inference models provide enhanced speed and additional country coverage.

💙 Early Access & Misc

- Do you use Google BigQuery or work with large amounts of PDFs or CSVs? We'd love to talk about some items on our roadmap. Please reach out to your account manager if you are interested.
6.12.3 | September 12, 2023

✨New & Improved:

- A new task engine has been released for sources that use collection tasks to populate data sources. This task engine brings a number of fixes to the Data365 adapters and will be expanded to other task-based adapters in later iterations.

- The location inference classifier for data365_twitter and Twitter official APIs can now locate posts from Thailand, Turkey, and Puerto Rico, in addition to the more than 26 countries covered by the city location inference model.

🛠 Fixes & Updates

- Updates to data365_twitter now populate the post type metadata fields.

- Fixes to the darkowl_search adapter solve issues where queries would time out.

- Multiple fixes for the Data365 adapter task management have been released.

ℹ️ Additional Information on the Task Engine Update Fixes:

- Duplicate content generated by overlapping tasks now occurs less frequently.

- Multiple tasks that may have caused slow collection times or backlogs can now run concurrently.

- Data collection, especially for larger tasks, will now run more efficiently and faster.

- Fixed task syncing issues, which may have caused some tasks to not be started.

- The documented process for creating tasks has not changed; all updates are performed behind the scenes within the platform.
6.12.2 | August 25, 2023

✨New & Improved:

- Location Inference classifier has received a major update, adding Puerto Rico, more English coverage, and faster updates.

- Many operations and enrichments have been updated to be compatible with both the Twitter official API adapters and the Data365 adapters.

- WebSightLine has released wsl_threads, providing full coverage of the new Threads network.

- WebSightLine will be retiring wsl_twitter due to recent changes that prevent its ongoing data offering.

🛠 Fixes & Updates

- Multiple fixes for the Data365 adapter have been released.
6.12.1 | August 11, 2023

✨New & Improved:

- Emoji sentiment classifier has now been released! This brand-new classifier from the Datastreamer team detects, analyzes, and applies a sentiment tag to social media content containing emojis.

🛠 Fixes & Updates

- New metadata fields have been released for the upcoming adapter for wsl_threads

- Location inference has been expanded to detect Japanese locations in Japanese-language content.

- The Pipeline Builder within Portal has received multiple improvements: easier application of operations, user experience improvements, and more.

- Improvements to file handling have been released.

💙 Early Access & Misc

- Portal dashboards do not yet use the new pricing or your commit. This applies only to August usage and is being worked on for a later release.
6.12 | August 1-3, 2023

✨New & Improved:

- The first release of the Pipeline Builder (Beta) is now available in Portal. Use the Builder to create and experiment with various pipeline configurations and see the output.

- Emoji sentiment classifier is now live and running on select sources. Use this to filter the sentiment of content based on the emojis used within the content. Also available to be used as a post-processing operation.

- You can now upload PDF and JSON files directly into the Pipeline platform for faster data integration.

- New pricing and packaging has been released. Thank you all for your patience and support.

- You can now connect Twitter official APIs directly to Datastreamer. Reach out to your account team for early access!

🛠 Fixes & Updates

- New metadata fields have been added to the Product metadata category.

- Fixes to the Twingly adapters have been released to adapt to recent changes.

💙 Early Access & Misc

- Portal dashboards do not yet use the new pricing or your commit. This applies only to August usage and is being worked on for a later release.

- Interested in Threads, Amazon data, or integrating Twitter (X) official APIs? We'd love to see you in the early access! Just reach out to your account manager.
6.11 | July 5, 2023

✨New & Improved:

- OpenAI prompts (ChatGPT) can now be run on any data source through your pipelines.

- Data365 is now available as a data partner.

- You can now apply Operations to the Extraction API.

- Sample sources are now available for Data365 and WebSightLine data sources.

🛠 Fixes & Updates

- Better removal of anchor text in Unify.

- Fixes to Image URL identification and linking content to other posts in social media threads in Unify.

- Additional metadata fields available in the Datastreamer schema have been added in preparation for upcoming new sources.

- Validation has been added to the Monitored Search API endpoint
6.10 | June 2, 2023

✨New & Improved:

- ESG Classifier: This new classifier is specially trained to apply ESG labels to News content, but can also be used for other long-form content. It produces one of four label values: ‘environmental’, ‘social’, or ‘governance’ depending on the ESG topic, or ‘none’ if no ESG topic is detected in the post.

🛠 Fixes & Updates

- Spanish location inference has been expanded for Twitter data sources.

- A new metadata field for "cashtags" has been added.

💙 Early Access & Misc

- We will be at Collision 2023 in Toronto. Meet us there!
6.9 | May 31, 2023

6.9 brings a new data partner catalog, beta partners, and many more changes!

✨New & Improved:

- Platform Catalog: An updated platform catalog is now available at: https://datastreamer.io/platform-catalog/. This new catalog provides greater information on sources.

- Sample Sources: Sample sources let you use a sample of a premium data source for free, so you can explore, test, and try out a data source at no cost. WebSightLine sources are our launch partner; additional partner sources are coming soon.

🛠 Fixes & Updates

- Update: Schema updates provide better social media data matching, specifically affecting Twitter retweet, reply, and quote fields.

- The Count API rate limit has been expanded.

- Our status page (status.datastreamer.io) now supports per-component subscriptions, so you receive notifications only for the items you care about.

💙 Early Access & Misc

- Data365 sources are now available in early access. Reach out to an account manager if you wish to try these sources.
6.8.4 | May 17, 2023

6.8.4 brings Quality of Life (QOL) updates to a number of areas:

🛠 Fixes & Updates

- Updates to metadata: release of Offenses and Product categories to support the inclusion of additional data partners.

- Release of new v6 -> v5 firehose adapter
- Fixed issue where Monitored Search may not deliver all matched content in high-volume queries.
- Fixed issue where Location Inference may be missing from some data sources.
- Fixed issue where creating a new data source would auto-assign a name that could not be modified.
6.8.3 | March 20, 2023

6.8.3 brings Quality of Life updates to:

🛠 Fixes & Updates

- Count API: Count API now returns data faster and with clearer error messages.

- Portal: You can now view previous months in Portal's billing and dashboard section.
- Location Inference: The Location Inference enrichment model is now able to predict more cities for the United States.
- Partners: DarkOwl groups have been added as filters.
- Partners: Connecting compatible data sources has received more supporting documentation.
- Partners: Cohere integration is now available.
6.8.2 | February 7, 2023

🛠 Fixes & Updates

6.8.2 brings multiple bug fixes and optimizations for partner sources.
6.8.1 | January 23, 2023

✨New & Improved:

Update 6.8.1 brings the release of Datastreamer's own Content Similarity Clustering. It allows you to cluster (group) similar content together from a query. Try it with news content today!
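Datastreamer's clustering method is not described in these notes. To give a feel for what grouping similar content means, the Python sketch below swaps in a deliberately simple, generic technique: bag-of-words cosine similarity with greedy single-pass clustering. It is not Datastreamer's implementation, and the 0.5 threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy single pass: join the first cluster whose seed text is similar enough."""
    vectors = [vectorize(t) for t in texts]
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No sufficiently similar cluster: start a new one seeded by this text.
            clusters.append([i])
    return clusters


headlines = [
    "Central bank raises interest rates",
    "Central bank raises rates again",
    "Storm expected to hit the coast",
]
print(cluster(headlines))  # [[0, 1], [2]]
```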
6.8 Amur | January 9, 2023

The 6.8 Amur release brings a wealth of new functionality to the Datastreamer platform:

✨New & Improved:

- Count API: This new API endpoint allows you to view the total number of matching search results for a query.
- Country Inference: Use this new classifier to infer the location of a content post. It is specially built for social content.

🛠 Fixes & Updates

- Runaway query warning: This Monitored Search update notifies users when they create a high-volume monitored search.
6.7 | October 17, 2022

Run a wealth of classifiers, models, and operations on any data traveling through the pipeline. Enrich, expand, and combine.

✨New & Improved:

- House your own operations inside the Datastreamer pipeline.

- Available launch operations:
  - Detect language
  - Google Translate
  - Concat
  - Map
  - Hard news
  - Intent
  - Category
  - Sentiment
  - Private AI - PII Redaction
6.6 | September 1, 2022

✨New & Improved:

Release of Operations API
6.3 | March 22, 2022

✨New & Improved:

Launch of Datastreamer Data Partner Network with flagship partner: Opoint Media, and relevant metadata fields.

Release of Named Entity Recognition (NER) and Hard News classifiers, and relevant metadata fields.

Addition of doc_date.
6.2.1 | February 22, 2022

✨New & Improved:

Addition of Highlighting functionality.
6.2 | January 5, 2022

✨New & Improved:

Our Location Inference classifier and Aggregation API Endpoints have been released, and the following fields were added to the API as a result:

- enrichment.location_inference.label

- enrichment.location_inference.confidence
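As a hedged example of consuming these fields, the Python sketch below reads the inferred location and keeps it only above a confidence threshold. It assumes that the dotted field names map to nested JSON objects, that `confidence` is a number between 0 and 1, and that `label` is a country code; the sample documents and the `inferred_location` helper are hypothetical.

```python
from typing import Optional


def inferred_location(doc: dict, min_confidence: float = 0.0) -> Optional[str]:
    """Return enrichment.location_inference.label if its confidence meets the threshold."""
    # Assumption: dotted field names correspond to nested JSON objects.
    inference = doc.get("enrichment", {}).get("location_inference", {})
    label = inference.get("label")
    confidence = inference.get("confidence")
    if label is None or confidence is None or confidence < min_confidence:
        return None
    return label


# Hypothetical documents; label values and confidence scale are assumptions.
docs = [
    {"enrichment": {"location_inference": {"label": "US", "confidence": 0.92}}},
    {"enrichment": {"location_inference": {"label": "CA", "confidence": 0.41}}},
    {"content": {"body": "no enrichment on this document"}},
]
print([inferred_location(d, min_confidence=0.5) for d in docs])  # ['US', None, None]
```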
Up to 6.2 | 2021

🛠 Fixes & Updates

Summarizing the changes to 6.0.X and 6.1.X in 2021

The following fields were added to the API as additional metadata information:

- content.favorites
- content.followers
- content.following

The following fields were added in preparation for the violence classifier release:

- enrichment.reported_violence.label
- enrichment.reported_violence.confidence

The following fields were modified for greater clarity:

- twitter.tweettype
- twitter.retweettype

The following field was deprecated:

- enrichment.spam_probability

The following fields were added to the API as additional metadata information:

- instagram.contenttype
- content.mentions

The Violence Classifier was released.

An update to our metadata introduced a new field: author.bio_links