Reddit

The Reddit source collects posts and comments from Reddit as part of a Data Stream.

Datastreamer selects from available providers automatically based on your Job configuration. You do not need to manage provider accounts or credentials to use this source.

Configuring a Job

When creating a Job for the Reddit source, you define what content to collect. Common configuration options include:

Keywords / query: terms to search for across Reddit
Subreddit targets: specific communities to monitor
Date range: the time window to collect data from
Content type: posts, comments, or both
Document limit: maximum number of documents per Job run
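
The options above can be pictured as a small configuration object. The sketch below is a hypothetical model only. The class name `RedditJobConfig`, its field names, and the rule that at least one of a query or subreddit targets must be set are assumptions for illustration, not Datastreamer's actual Job parameters. See Creating Jobs for the real names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Hypothetical model of the Reddit Job options. Field names are illustrative,
# not the actual Datastreamer Job schema.
VALID_CONTENT_TYPES = {"posts", "comments", "both"}


@dataclass
class RedditJobConfig:
    query: str = ""                                      # keywords to search for
    subreddits: List[str] = field(default_factory=list)  # communities to monitor
    start: Optional[datetime] = None                     # start of the date range
    end: Optional[datetime] = None                       # end of the date range
    content_type: str = "both"                           # "posts", "comments", or "both"
    document_limit: Optional[int] = None                 # max documents per Job run

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        # Assumption: a Job needs a keyword query, subreddit targets, or both.
        if not self.query.strip() and not self.subreddits:
            problems.append("set a keyword query, subreddit targets, or both")
        if self.start and self.end and self.start >= self.end:
            problems.append("date range start must be before end")
        if self.content_type not in VALID_CONTENT_TYPES:
            problems.append("content_type must be posts, comments, or both")
        if self.document_limit is not None and self.document_limit <= 0:
            problems.append("document_limit must be a positive integer")
        return problems


cfg = RedditJobConfig(query="electric vehicles", subreddits=["cars"],
                      content_type="posts", document_limit=500)
print(cfg.validate())
```

Returning a list of problems, rather than raising on the first one, lets every misconfiguration be reported in a single pass.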

Refer to the Job creation documentation for full configuration details: Creating Jobs

What is Collected

Each document returned from the Reddit source represents a post or comment. Fields are mapped to the Datastreamer unified schema and include content text, author metadata, engagement metrics (upvotes, comments, awards), post date, and subreddit information.

Platform-specific fields are available under the reddit and subreddit schema namespaces. See the Schema Reference for field details.
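
A consumer of the Data Stream typically reads the unified fields and then drills into the `reddit` and `subreddit` namespaces. The sketch below uses a made-up document. Every key in it (`content.body`, `author.name`, `reddit.score`, `subreddit.name`, and so on) is a hypothetical placeholder, so check the Schema Reference for the real field names before relying on it.

```python
from typing import Any, Dict


def get_path(doc: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'reddit.score' through nested dicts."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def summarize(doc: Dict[str, Any]) -> Dict[str, Any]:
    # All paths below are hypothetical examples, not the documented schema.
    return {
        "text": get_path(doc, "content.body", ""),
        "author": get_path(doc, "author.name"),
        "upvotes": get_path(doc, "reddit.score", 0),
        "subreddit": get_path(doc, "subreddit.name"),
    }


sample = {
    "content": {"body": "Has anyone tried the new firmware?"},
    "author": {"name": "example_user"},
    "reddit": {"score": 42, "num_comments": 7},
    "subreddit": {"name": "example"},
}
print(summarize(sample))
```

Reading through a default-returning helper keeps the consumer tolerant of documents where a platform-specific field is absent, for example a comment that lacks post-only metadata.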

Troubleshooting

Job fails or returns no data

Check that the subreddit or keyword query is valid
Verify the date range contains data
Review Job logs for specific errors
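
The first two checks above can be run before launching a Job. The sketch below is a hypothetical pre-flight helper. The function name and parameters are assumptions, and the subreddit-name pattern (letters, digits, underscores, at most 21 characters) is a loose sanity check, not an authoritative Reddit rule. It does not replace reviewing the Job logs.

```python
import re
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: subreddit names use letters, digits, and underscores and are at
# most 21 characters. This is a loose check, not an official Reddit rule.
SUBREDDIT_RE = re.compile(r"^[A-Za-z0-9_]{1,21}$")


def no_data_checks(query: str, subreddits: List[str], start: datetime,
                   end: datetime, now: Optional[datetime] = None) -> List[str]:
    """Return likely reasons a Reddit Job would fail or return nothing.

    All datetimes should be timezone-aware so they compare with `now`.
    """
    now = now or datetime.now(timezone.utc)
    issues = []
    if not query.strip() and not subreddits:
        issues.append("query and subreddit targets are both empty")
    for name in subreddits:
        if not SUBREDDIT_RE.match(name):
            issues.append(f"subreddit name looks malformed: {name!r}")
    if start >= end:
        issues.append("date range is empty or reversed")
    elif start > now:
        issues.append("date range starts in the future")
    return issues


utc = timezone.utc
print(no_data_checks("ev", ["cars"], datetime(2024, 1, 1, tzinfo=utc),
                     datetime(2024, 2, 1, tzinfo=utc)))
```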

Unexpectedly high document counts

Scope the query to specific subreddits rather than all of Reddit
Set a document limit on the Job
Review the Data Volume Units (DVU) pricing page
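
The first two suggestions above can be turned into a quick check before a broad Job is launched. This is a hypothetical helper: the function name, its parameters, and the `max_reasonable_limit` threshold are assumptions chosen for illustration. Actual DVU consumption is defined on the pricing page, not by this code.

```python
from typing import List, Optional


def volume_warnings(subreddits: List[str], document_limit: Optional[int],
                    max_reasonable_limit: int = 10_000) -> List[str]:
    """Flag Job settings that tend to produce unexpectedly many documents.

    max_reasonable_limit is an arbitrary illustrative threshold, not a
    Datastreamer default.
    """
    warnings = []
    if not subreddits:
        warnings.append("no subreddit targets: the query runs across all of Reddit")
    if document_limit is None:
        warnings.append("no document limit: nothing caps documents per run")
    elif document_limit > max_reasonable_limit:
        warnings.append(f"document limit {document_limit} exceeds {max_reasonable_limit}")
    return warnings


print(volume_warnings([], None))
```

Treating the threshold as a parameter keeps the helper honest about its purpose: it flags configurations worth a second look and makes no claim about what a run will cost.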

Provider switch noted in Job logs

This is expected behavior. If a provider is unavailable, the Job is routed to an alternative automatically. No action is required.

Related

Sources Overview
Creating Jobs
Data Volume Units
Direct Integrations: use these if you need to work with a specific provider directly
