This new API helps customers better estimate and measure Job usage for billing and monitoring.

This API retrieves the DVU count for executed jobs. Results can be grouped by either DataSource or JobId (the default), controlled by the group_by parameter.

The documentation for the Jobs DVU Count API is available here: https://docs.datastreamer.io/docs/jobs-dvu-count-api#/
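As a minimal sketch of how a request might be assembled, the snippet below builds a DVU-count URL with the group_by parameter. The host name and endpoint path here are illustrative assumptions, not the documented values; check the API reference linked above for the actual contract and authentication:

```python
from urllib.parse import urlencode, urlunsplit

# NOTE: host and endpoint path are illustrative placeholders --
# verify both against the Jobs DVU Count API documentation.
BASE_HOST = "api.datastreamer.io"

def build_dvu_count_url(group_by: str = "JobId") -> str:
    """Build a DVU-count request URL; results group by JobId (the
    default) or DataSource via the group_by query parameter."""
    query = urlencode({"group_by": group_by})
    return urlunsplit(("https", BASE_HOST, "/jobs/dvu-count", query, ""))

print(build_dvu_count_url("DataSource"))
# https://api.datastreamer.io/jobs/dvu-count?group_by=DataSource
```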

With the release of Datastreamer's MCP server, you can now integrate your AI features closer to your data pipelines!

If you have not read the "CTO Brief" on Datastreamer's strategy of being the agent interface for social data, you can access it here: https://datastreamer.io/agent-interface-for-social-data-the-cto-edition/

With this update, you can now access the Datastreamer MCP server and the first of our AI tools, "Create Job", which lets you create data collection jobs with natural language. This gives your new AI features the ability to easily access the data they need.

Read more about the Create Job tool here: https://docs.datastreamer.io/docs/job-creation-agent#/

Get started connecting your MCP client here: https://docs.datastreamer.io/docs/mcp-server-setup-guide#/
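The exact configuration depends on your MCP client. As an illustrative sketch only, a JSON-based client might register the server along these lines; the endpoint URL, field names, and auth scheme below are placeholders, not Datastreamer's published values, so follow the setup guide above for your client's actual format:

```json
{
  "mcpServers": {
    "datastreamer": {
      "url": "https://<your-datastreamer-mcp-endpoint>",
      "headers": {
        "Authorization": "Bearer <your-datastreamer-api-key>"
      }
    }
  }
}
```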

When you create a Periodic Job, the default behaviour is to collect any new content matching your query since the last run time. Some customers, who track changes in a piece of content's metrics over time, have requested greater flexibility in the time range a run covers.

With the "Query Start Time Adjustment" option, available in all Periodic Jobs, you can now shift the query start time backwards by a set number of seconds. For example: search hourly, but cover the previous two hours. This is best used with updatable storage (like Searchable Storage) to ensure updated fields are registered.
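To illustrate the arithmetic (a sketch of the windowing logic, not Datastreamer's actual API), an hourly job with a 3600-second backwards adjustment starts each query window one hour before the last run, so each run covers the previous two hours in total:

```python
from datetime import datetime, timedelta, timezone

def adjusted_query_start(last_run: datetime, adjustment_seconds: int) -> datetime:
    """Shift the query start time backwards by adjustment_seconds,
    as the "Query Start Time Adjustment" option does."""
    return last_run - timedelta(seconds=adjustment_seconds)

# Hourly job: last run at 12:00 UTC, next run at 13:00 UTC.
last_run = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)

# With a 3600 s adjustment, the 13:00 run queries from 11:00 onwards,
# covering two hours instead of the default one.
start = adjusted_query_start(last_run, 3600)
print(start.isoformat())  # 2024-06-01T11:00:00+00:00
```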

  • Greater support for custom data sources: https://docs.datastreamer.io/update/docs/private-data-sources#/
  • Improved Pipeline validation, with clearer error messages when running misconfigured pipelines.
  • Removed empty and unnecessary fields from Pipeline export files.
  • Fixed issues with query builder when using select Socialgist sources.
  • Improved chunking when using the GenAI Translation component.

Within Datastreamer, you can now integrate any Apify actor into your pipelines!

About Apify: Apify provides a marketplace of scrapers designed for large-scale web scraping, data extraction, and browser automation. This marketplace of community-made scrapers lets you run 5,000+ different tools, known as "Actors".

The new Apify Actor Integration component allows you to use any of the Actors from Apify within your pipeline.

To get started, check out this Setup Guide: Apify Integration Setup.

Not sure where to start? The Registry offers a selection of the 5,000+ Apify Actors: