Fixed an issue that could prevent the Datastreamer Language Detection component from running in some situations.

A bug was discovered that caused metadata required for upload to be lost. This impacted the upload of data into some cloud storage egresses.

The team responded quickly, and all affected pipelines have received the fix. No content was impacted or lost, thanks to the intelligent component caching.

When a Job fails due to a 3rd-party error, the Job's status now shows the error details provided by that 3rd-party API. This replaces the undescriptive "Failed" messages previously shown in these cases.

This initial improvement ensures that "Bad Request" error messages from the Vetric APIs are now displayed when a Job fails.
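For illustration, the pattern works along these lines. This is a hypothetical sketch, not the Platform's internals: the update_status helper, the "message" field on the error body, and the function names are all stand-ins.

```python
# Hypothetical sketch: surface a 3rd-party API's error detail in a Job's
# status instead of a bare "Failed". Names here are illustrative stand-ins.
import requests


def update_status(job_id: str, state: str, detail: str) -> None:
    """Stand-in for the Platform's internal Job status update."""
    print(f"Job {job_id}: {state} - {detail}")


def run_ingest(job_id: str, url: str) -> None:
    resp = requests.get(url, timeout=30)
    if not resp.ok:
        try:
            # Assumes the upstream error body carries a "message" field.
            detail = resp.json().get("message", resp.reason)
        except ValueError:
            detail = resp.reason  # non-JSON error body
        update_status(job_id, "Failed", f"HTTP {resp.status_code}: {detail}")
        return
    update_status(job_id, "Completed", "ok")
```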

When using provided components to ingress data through Jobs, the Platform now validates the queries. If a query is invalid, the user is notified and cannot save the Job. This helps ensure that data collection failures are proactively addressed, saving time and spend!

This release brings Lucene query validation to:

  • WebSightLine Instagram
  • WebSightLine Threads
  • Socialgist TikTok
  • Opoint News

Thank you to our customers and users for this feedback!
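To illustrate the kind of check this adds (the Platform's validator itself runs server-side and is not public), here is a minimal sketch using the open-source luqum Lucene parser; the library choice and function name are assumptions for illustration, not the Platform's implementation.

```python
# Illustrative sketch only: validate a Lucene query before saving a Job.
# Uses the open-source `luqum` parser (pip install luqum); this is an
# assumption for illustration, not necessarily what the Platform uses.
from luqum.parser import parser

try:
    from luqum.exceptions import ParseError  # luqum >= 0.11
except ImportError:  # older luqum releases kept it in luqum.parser
    from luqum.parser import ParseError


def validate_lucene_query(query: str) -> str | None:
    """Return None if the query parses, else a human-readable reason."""
    try:
        parser.parse(query)
        return None
    except ParseError as exc:
        return str(exc)


# An unbalanced parenthesis is caught before any collection spend occurs.
error = validate_lucene_query('(content:"data pipelines" AND lang:en')
if error:
    print(f"Job not saved, query is invalid: {error}")
```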

Our Datascience team has released a number of Recipes to provide starting templates for Custom Functions. You can copy/modify/tweak and use any of these Recipes within the Custom Functions capability released earlier this month.

These Recipes give you a quick way to add many in-Pipeline capabilities, such as detecting urgency, analyzing sentiment, cleaning text, extracting links from content, and much more!

You can view all Recipes here: docs.datastreamer.io/recipes
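For a flavor of what a Recipe-style function looks like, here is a minimal link-extraction sketch. The transform(document) entry point and the "content" field are illustrative assumptions for this sketch; the Recipes at the link above show the real templates.

```python
# Illustrative Recipe-style Custom Function: extract links from a document.
# The transform(document) signature and "content" field are assumptions
# for this sketch; see docs.datastreamer.io/recipes for real templates.
import re

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def transform(document: dict) -> dict:
    """Add any URLs found in the document's content as a new field."""
    content = document.get("content") or ""
    document["extracted_links"] = URL_PATTERN.findall(content)
    return document


doc = {"content": "Read more at https://docs.datastreamer.io/recipes today."}
print(transform(doc)["extracted_links"])
# ['https://docs.datastreamer.io/recipes']
```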

Within the Jobs screen in the Portal, you now have the ability to bulk cancel and delete Jobs. If you have created a Job that you wish to cancel, you can cancel it while it is running, and the Platform will handle cancelling it with the integrated source as quickly as possible. It's as close to an "undo" as possible!

The Custom Function component allows you to write Python functions to manipulate and transform documents within your pipeline. This flexibility enables you to perform a wide range of tasks, from simple data cleaning to merging documents.

Whether you're filtering documents, enriching data, or transforming formats, the Custom Function component empowers you to customize your pipeline to meet your specific needs.

Full Documentation
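As a concrete sketch of that idea, the following shows the kind of per-document Python a Custom Function can hold; the process(documents) entry point is an assumption for illustration, and the full documentation above describes the component's actual contract.

```python
# Illustrative only: filter and transform documents with plain Python.
# The process(documents) entry point is an assumption for this sketch;
# the full documentation describes the component's actual contract.
def process(documents: list[dict]) -> list[dict]:
    kept = []
    for doc in documents:
        if doc.get("language") != "en":
            continue  # filtering: drop non-English documents
        doc["title"] = (doc.get("title") or "").strip().title()  # transforming
        kept.append(doc)
    return kept


docs = [
    {"language": "en", "title": "  breaking news  "},
    {"language": "fr", "title": "dernières nouvelles"},
]
print(process(docs))  # [{'language': 'en', 'title': 'Breaking News'}]
```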

We have released updates and refactoring of the underlying models that convert tables in PDFs into JSON.
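The models themselves are internal; as a rough illustration of the target shape (table rows mapped to JSON records), here is a sketch using the open-source pdfplumber library, which is an assumption for illustration, not the released models.

```python
# Illustrative only: Datastreamer's table-extraction models are internal.
# This sketch shows the target shape (PDF table rows -> JSON records)
# using the open-source pdfplumber library (pip install pdfplumber).
import json

import pdfplumber

with pdfplumber.open("report.pdf") as pdf:  # "report.pdf" is a placeholder
    table = pdf.pages[0].extract_table()  # rows as lists of cell strings

header, *rows = table  # assumes the first row is the header
records = [dict(zip(header, row)) for row in rows]
print(json.dumps(records, indent=2))
```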