Release Notes

A log of all changes to the Datastreamer platform and API, and the version in which each change was released.

📘 Change Notification

Datastreamer notifies customers about all API changes through their dedicated account management team and in the Datastreamer Community Slack group.

Our updates are versioned according to the taxonomy of Semantic Versioning 2.0.0, and are written as a version number of MAJOR.MINOR.PATCH.

We increment our code version based on the following:

MAJOR Version: When we make API changes incompatible with previous versions.
MINOR Version: When we add functionality in a backward-compatible manner.
PATCH Version: When we make backward-compatible bug fixes or add minor fields.
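
As an illustration of the scheme above, here is a minimal Python sketch that classifies the difference between two version numbers. The helper names `parse_version` and `change_type` are hypothetical and not part of any Datastreamer API, and the sketch assumes that a two-part version such as 6.17 is shorthand for 6.17.0.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH'. Missing trailing parts default to 0 (assumption)."""
    parts = [int(part) for part in text.strip().split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    parts += [0] * (3 - len(parts))
    return Version(*parts)


def change_type(old: str, new: str) -> str:
    """Return which component differs first: 'major', 'minor', 'patch', or 'none'."""
    a, b = parse_version(old), parse_version(new)
    if a.major != b.major:
        return "major"  # incompatible API changes
    if a.minor != b.minor:
        return "minor"  # backward-compatible functionality
    if a.patch != b.patch:
        return "patch"  # backward-compatible fixes or minor fields
    return "none"


print(change_type("6.16", "6.17"))      # minor
print(change_type("6.14.1", "6.14.2"))  # patch
```

A check like this can help decide whether an upgrade needs integration changes: under the rules above, only a "major" result signals an incompatible API change.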

Version | Date Released | Notes
6.17 | April 30, 2024 | ✨ New & Improved:

- Improved Billing Insights: Brand-new dashboards show your commit, platform tier, component usage, and more. In addition to seeing your current platform usage, you can also see spend projections. Accessible at portal.datastreamer.io.
- Dynamic Pipeline Viewer: We have released over 100 components that you can use to solve any data structuring need. All these powerful pipelines can come together in a fully dynamic and self-managing Dynamic Pipeline. Existing dynamic pipelines are visible at https://portal.datastreamer.io/pipelines.
- Google Marketplace Launch: We are now on Google Marketplace! If you use Google Cloud as a hosting provider for your platform, this may have financial and procurement benefits for you.
- Socialgist Partnership: We are proud to announce Socialgist as an integration partner. Socialgist brings world-leading coverage of blogs, forums, news, video sites, and reviews.
- Brightdata Partnership: Brightdata's new scraping APIs can be used within Datastreamer. Use Datastreamer pipelines to initiate custom scraping jobs of major social, web, and ecommerce platforms.
- Location Inference Updates: Location Inference classifiers for Spanish and English social media have been expanded. Spanish location inference can now infer posts originating in Mexico, Colombia, Spain, Chile, Peru, and Argentina. In addition, English content gained new country inference options: New Zealand, France, Germany, Mexico, and India.
- Dominant Location Classifier: A common challenge with News data is narrowing down the locations discussed. The new dominant location classifier uses entity and pattern recognition to detect the dominant location within News articles.

🛠 Fixes & Updates

- Documentation Updates: With the addition of so many capabilities to the Datastreamer platform, our documentation has been refreshed under a new taxonomy to support these changes and planned future expansion. Available at docs.datastreamer.io.

💙 Early Access & Misc

- Google Gemini Integration: Datastreamer's team is integrating Google Gemini for advanced document structuring use cases. Do you have a complex use case? We'd love to hear about it.
- Location Inference for TikTok data sources: Interested in adding location inference to your TikTok data, or looking for TikTok components? These capabilities are in early access now!
- Dynamic Pipeline Builder: Looking for a 5-minute way to build complex data pipelines? Datastreamer's team is building edit/create functionality into our Dynamic Pipeline Viewer and would love your input!
6.16 | February 23, 2024 | ✨ New & Improved:

📤 A wealth of new Egress components have been launched!
- Azure egress component: Send data into your Azure environment with this new component.
- BigQuery egress component: Feed data into BigQuery, AnalyticsHub, and other Google locations.
- Databricks egress component: Deliver streaming data or any dynamic pipeline into Databricks landing zones.
- Azure File Mover: Move files around Azure environments, triggered by Datastreamer pipeline triggers.
- Google Cloud Storage egress component: Egress a platform pipeline directly into Google Cloud Storage.

📁 Do much more with PDFs!
- PDF table detection component: Detect tables within PDFs as a trigger.
- PDF OCR component: Convert non-digital PDFs into JSON content.
- PDF table to schema component: Extract and convert the tables within PDFs into JSON content.

🛠 Fixes & Updates
- Malay language coverage for wsl_instagram and wsl_threads.
- Socialgist sample sources: You can now freely explore Socialgist data in the Datastreamer platform.
6.15 | January 23, 2024 | ✨ New & Improved:

- Socialgist has now joined Datastreamer as an integrated partner! You can use Socialgist's coverage of Video sites, Blogs, Forums, News, and Review sources. You can check out the Platform Catalog and Socialgist partner profile for more information.

🛠 Fixes & Updates

- Japanese location inference has been updated for Instagram and Threads sources.

💙 Early Access & Misc

- We are working on updates to our billing system and also our platform observability system. We'd love to hear your insights and feedback! If this is something interesting for you, let us know and we'll connect.
6.14.2 | January 15, 2024 | ✨ New & Improved:

- Location inference has been added to the metadata for Opoint News, allowing further filtering by country.
- content.location: Location tags (which are author-provided and free text) are now available in wsl_instagram content with the field: X
- instagram.user_id: In addition, wsl_instagram has received a user_id field that is the Meta ID common across all Meta properties.
- Lastly, wsl_instagram is now able to detect the Turkish language.

💙 Early Access & Misc
- Japanese location inference expansion is in early access and will be released in the next few days.
6.14.1 | December 8, 2023 | ✨ New & Improved:

- Location Inference has been expanded to classify content in English for the following countries: Brazil (BR), Colombia (CO), Turkey (TR), and Thailand (TH).
- A new processing layer for document content with OCR and Conversions from Tables to JSON is in early release. Perfect for PDFs, Images, and other forms of media.

🛠 Fixes & Updates

- Documentation improvements and fixes.
- UnifyAI prompts improved.
- Some streaming data sources are now using Datastreamer's new Dynamic Pipelines technology, allowing for more rapid integration, higher flexibility, and greater power. Stay tuned for updates on this expansion!

💙 Early Access & Misc

- Blogs, Forums, Chinese media, and more are coming with a new partnership to be announced next week! 👀
6.14 | November 20, 2023 | ✨ New & Improved:
- UnifyAI has now been released in a feedback-oriented Alpha version! UnifyAI is a Large Language Model (LLM)-powered agent that automatically generates consistent metadata fields for unstructured text content. Lots more technical details here: (link). You can also check out the dedicated website page for UnifyAI (link).

You can try out UnifyAI Playground on Portal (link).

We look forward to receiving feedback, hearing use cases, and gathering more insight to shape our roadmap for the development of UnifyAI. Over the coming weeks, we will be adding to UnifyAI heavily.

🛠 Fixes & Updates
- Documentation improvements and fixes.
- Faster processing of Data365 tasks.

💙 Early Access & Misc
- UnifyAI is such a big release that we couldn't put it in this section for 6.14! Stay tuned for many new updates in data providers, UnifyAI's growth, and more.
6.13 | November 10, 2023 | ✨ New & Improved:
- New Sandbox accounts can now request temporary increases to accounts for further exploration.
- Major behind-the-scenes upgrades for the upcoming 6.14 release.
- Datastreamer is now part of the Google for Startups Cloud Program.

🛠 Fixes & Updates
- Fixes to the processing of large Extractions.
- Improvements in task management and data ingestion for Data365 adapters.
- Improvements in Websightline adapter speed and coverage.
- Improvements to Portal account administration.

💙 Early Access & Misc
- 6.14 may be one of our most cutting-edge releases in 2023. Stay tuned for an announcement coming soon!
6.12.6 | October 23, 2023 | ✨ New & Improved:
- Location Inference is now applied to wsl_threads content by default.

🛠 Fixes & Updates
- We have improved the duplication detection in the Data365 adapter.
- A number of fixes and improvements to Portal have also been released to improve the user experience.
6.12.5 | October 13, 2023 | ✨ New & Improved:
- Country location inference now supports Spanish for Instagram sources (specifically identifying Mexico and Colombia)!
- WebSightLine adapter for Threads data is out of beta.

🛠 Fixes & Updates
- Updates to the Task Engine to fix some tasks that may fail due to higher concurrent ingestions.
- Updates to pricing and packaging tier laying the groundwork for updates to the portal usage dashboards.
- Fixes released to address a bug that prevented Operations from successfully being applied to Extraction API.
- The account creation process in Portal has been simplified.

💙 Early Access & Misc
- Are you interested in using AI to populate needed metadata fields using the content from within the document? We would love to hear your use case! UnifyAI: coming late 2023.
6.12.4 | September 29, 2023 | 🛠 Fixes & Updates
- Updates to the Pipeline Builder within Portal to create and add Data365 tasks.
- Streamlined Portal account creation.
- Additions to Location Inference models provide enhanced speed and additional country coverage.

💙 Early Access & Misc
- Do you use Google BigQuery and/or work with large volumes of PDFs/CSVs? We'd love to talk through some items on our roadmap. Please reach out to your account manager if you are interested.
6.12.3 | September 12, 2023 | ✨ New & Improved:

- A new task engine has been released for sources that use collection tasks to populate data sources. This task engine brings a number of fixes to the Data365 adapters and will be expanded to other task-based adapters in later iterations.
- The location inference classifier for data365_twitter and Twitter official APIs can now locate posts from Thailand, Turkey, and Puerto Rico, in addition to the more than 26 countries covered by the city location inference model.

🛠 Fixes & Updates

- An update to data365_twitter now populates the post type metadata fields.
- Fixes to the darkowl_search adapter solve issues where queries would time out.
- Multiple fixes for the Data365 adapter task management have been released.

ℹ️ Additional Information on the Task Engine Update Fixes:

- Duplicate content that may have been generated by overlapping tasks is reduced in frequency.
- Multiple tasks that may have caused slow collection times or backlogs can now run concurrently.
- Data collection, especially for larger tasks, will now run more efficiently and faster.
- Fixed task syncing issues, which may have caused some tasks to not be started.
- The documented process for creating tasks is unchanged; all updates are performed behind the scenes within the platform.
6.12.2 | August 25, 2023 | ✨ New & Improved:

- Location Inference classifier has received a major update, adding Puerto Rico, more English coverage, and faster updates.
- Many operations and enrichments have received updates to provide compatibility to both Twitter official API adapters and Data365 adapters.
- WebSightLine has released wsl_threads, providing full coverage of the new Threads network.
- WebSightLine will be retiring wsl_twitter due to new changes preventing their ongoing data offering.

🛠 Fixes & Updates
- Multiple fixes for the Data365 adapter have been released.
6.12.1 | August 11, 2023 | ✨ New & Improved:

- Emoji sentiment classifier has now been released! This brand-new classifier from the Datastreamer team detects, analyzes, and applies a sentiment tag to social media content containing emojis.

🛠 Fixes & Updates

- New metadata fields have been released for the upcoming adapter for wsl_threads.
- Location inference has been expanded to detect Japanese locations in Japanese-language content.
- The Pipeline Builder within Portal has received multiple improvements: easier application of operations, user experience improvements, and more.
- Improvements to file handling have been released.

💙 Early Access & Misc

- Portal dashboards will not yet be using the new pricing or your commit. This only applies to August usage and is being worked on and targeted for a later release.
6.12 | August 1-3, 2023 | ✨ New & Improved:

- The first release of the Pipeline Builder (Beta) is now available in Portal. Use the Builder to create and experiment with various pipeline configurations and see the output.
- Emoji sentiment classifier is now live and running on select sources. Use this to filter the sentiment of content based on the emojis used within the content. Also available to be used as a post-processing operation.
- You can now upload PDF and JSON files directly into the Pipeline platform for faster data integration.
- New pricing and packaging has been released. Thank you all for your patience and support.
- You can now connect Twitter official APIs directly to Datastreamer. Reach out to your account team for early access!

🛠 Fixes & Updates

- New metadata fields have been added to the Product metadata category.
- Fixes to the Twingly adapters have been released to adapt to the new changes.

💙 Early Access & Misc

- Portal dashboards will not yet be using the new pricing or your commit. This only applies to August usage and is being worked on and targeted for a later release.
- Interested in Threads, Amazon data, or integrating Twitter (X) official APIs? We'd love to see you in the early access! Just reach out to your account manager.
6.11 | July 5, 2023 | ✨ New & Improved:

- OpenAI prompts (ChatGPT) can now be run on any data sources through your pipelines.
- Data365 is now available as a data partner.
- You can now apply Operations to the Extraction API.
- Sample sources are now available for Data365 and WebSightLine data sources.

🛠 Fixes & Updates

- Better removal of anchor text in Unify.
- Fixes to Image URL identification and linking content to other posts in social media threads in Unify.
- Additional metadata fields available in the Datastreamer schema have been added in preparation for upcoming new sources.
- Validation has been added to the Monitored Search API endpoint.
6.10 | June 2, 2023 | ✨ New & Improved:

- ESG Classifier: This new classifier is specially trained to apply ESG labels to News content, but can be used for other long-form content as well. The ESG classifier outputs a 4-value label. The label value is one of ‘environmental’, ‘social’, or ‘governance’ depending on the ESG topic, or ‘none’ if no ESG topic is detected in the post.
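
To make the four-value label concrete, the following Python sketch tallies ESG labels across a batch of posts and rejects any value outside the four listed above. The function name `tally_esg_labels` is hypothetical, and the sketch assumes the labels have already been read out of the documents as plain strings; the release note does not state the field name that carries them.

```python
from collections import Counter
from typing import Iterable

# The four label values named in the release note.
ESG_LABELS = ("environmental", "social", "governance", "none")


def tally_esg_labels(labels: Iterable[str]) -> Counter:
    """Count occurrences of each ESG label, rejecting unknown values."""
    counts: Counter = Counter()
    for label in labels:
        if label not in ESG_LABELS:
            raise ValueError(f"unexpected ESG label: {label!r}")
        counts[label] += 1
    return counts


# Hypothetical labels taken from five posts.
sample = ["environmental", "none", "social", "none", "environmental"]
print(tally_esg_labels(sample)["environmental"])  # 2
```

Treating ‘none’ as an explicit value rather than a missing field keeps the tally total equal to the number of posts processed.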

🛠 Fixes & Updates

- Spanish location inference has been expanded for Twitter data sources.
- A new metadata field for "cashtags" has been added.

💙 Early Access & Misc

- We will be at Collision 2023 in Toronto. Meet us there!
6.9 | May 31, 2023 | 6.9 brings a new Data partner catalog, beta partners, and many more changes!

✨ New & Improved:

- Platform Catalog: An updated platform catalog is now available at: https://datastreamer.io/platform-catalog/. This new catalog provides greater information on sources.
- Sample Sources: Sample sources let you use a sample of a premium data source for free, so you can explore, test, and try out a data source at no cost. WebSightLine is our launch partner; additional partner sources are coming soon.

🛠 Fixes & Updates

- Update: Schema updates provide better social media data matching, specifically affecting Twitter retweet, reply, and quote fields.
- The Count API rate limit has been increased.
- Our status page (status.datastreamer.io) now has per-component subscriptions, allowing you to receive notifications only on the items you care about.

💙 Early Access & Misc

- Data365 sources are now available in early access. Reach out to an account manager if you wish to try these sources.
6.8.4 | May 17, 2023 | 6.8.4 brings Quality of Life (QoL) updates to a number of areas:

🛠 Fixes & Updates

- Updates to metadata: releases of Offenses and Product categories for the inclusion of additional data partners.
- Release of new v6 -> v5 firehose adapter
- Fixed issue where Monitored Search may not deliver all matched content in high-volume queries.
- Fixed issue where Location Inference may be missing from some data sources.
- Fixed issue where the creation of new data sources would auto-assign a name with no ability to modify.
6.8.3 | March 20, 2023 | 6.8.3 brings Quality of Life updates to:

🛠 Fixes & Updates

- Count API: Count API now returns data faster and with clearer error messages.
- Portal: You can now view previous months in Portal's billing and dashboard section.
- Location Inference: The Location Inference enrichment model is now able to predict more cities for the United States.
- Partners: DarkOwl groups have been added as filters
- Partners: Connecting compatible data sources has received more supporting documentation.
- Partners: CoHere integration is now available.
6.8.2 | February 7, 2023 | 🛠 Fixes & Updates

6.8.2 brings multiple bug fixes and optimizations for partner sources.
6.8.1 | January 23, 2023 | ✨ New & Improved:

Update 6.8.1 brings the release of Datastreamer's own Content Similarity Clustering. It allows you to cluster (group) similar content together from a query. Try it with news content today!
6.8 Amur | January 9, 2023 | The 6.8 Amur release brings a wealth of new functionality into the Datastreamer platform:

✨ New & Improved:

- Count API: This new API endpoint allows you to view the total number of matching search results for a query.
- Country Inference: Utilize this new classifier to infer the location of a content post. Specially built for social content.

🛠 Fixes & Updates

- Runaway query warning: This update for monitored search notifies the user when they create a high-volume monitored search.
6.7 | October 17, 2022 | Run a wealth of classifiers, models, and operations on any data travelling through the pipeline. Enrich, expand, and combine.

✨ New & Improved:

- House your own operations inside the Datastreamer pipeline.
- Available launch operations:
  - Detect language
  - Google Translate
  - Concat
  - Map
  - Hard news
  - Intent
  - Category
  - Sentiment
  - Private AI - PII Redaction
6.6 | September 1, 2022 | ✨ New & Improved:

Release of Operations API
6.3 | March 22, 2022 | ✨ New & Improved:

Launch of Datastreamer Data Partner Network with flagship partner: Opoint Media, and relevant metadata fields.

Release of Named Entity Recognition (NER) and Hard News classifiers, and relevant metadata fields.

Addition of doc_date.
6.2.1 | February 22, 2022 | ✨ New & Improved:

Addition of Highlighting functionality.
6.2 | January 5, 2022 | ✨ New & Improved:

Our Location Inference classifier and Aggregation API Endpoints have been released, and the following fields were added to the API as a result:

- enrichment.location_inference.label
- enrichment.location_inference.confidence
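
As a rough sketch of how these two fields might be consumed, the Python below reads the label and confidence from a document and keeps the label only when the confidence clears a threshold. It assumes the document is a nested dictionary in which both fields sit under the same `enrichment.location_inference` object and that the confidence is a number; the helper names, the 0.5 threshold, and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'a.b.c' through nested dicts; None if absent."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def inferred_location(doc: Dict[str, Any], min_confidence: float = 0.5) -> Optional[str]:
    """Return the inferred location label, or None if missing or low-confidence."""
    label = get_path(doc, "enrichment.location_inference.label")
    confidence = get_path(doc, "enrichment.location_inference.confidence")
    if label is None or confidence is None or confidence < min_confidence:
        return None
    return label


# Hypothetical document; real label values and score ranges may differ.
doc = {"enrichment": {"location_inference": {"label": "CA", "confidence": 0.9}}}
print(inferred_location(doc))  # CA
```

Returning None for both missing and low-confidence results lets downstream filters treat "no usable location" as a single case.
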
Up to 6.2 | 2021 | 🛠 Fixes & Updates

A summary of the changes in 6.0.X and 6.1.X during 2021.

The following fields were added to the API as additional metadata information:
_content.favorites
_content.followers
_content.following

The following fields were added in preparation for the violence classifier release.
enrichment.reported_violence.label
enrichment.reported_violence.confidence

The following fields were modified for greater clarity.
twitter.tweettype
twitter.retweettype

The following fields were deprecated:
_ enrichment.spam_probability

The following fields were added to the API as additional metadata information:
instagram.contenttype
content.mentions

The Violence Classifier was released.

An update to our metadata has introduced a new field of: author.bio_links
