The Reddit source collects posts and comments from Reddit as part of a Data Stream.
Datastreamer selects from available providers automatically based on your Job configuration. You do not need to manage provider accounts or credentials to use this source.
Configuring a Job
When creating a Job for the Reddit source, you define what content to collect. Common configuration options include:
- Keywords / query: terms to search for across Reddit
- Subreddit targets: specific communities to monitor
- Date range: the time window to collect data from
- Content type: posts, comments, or both
- Document limit: maximum number of documents per Job run
See Creating Jobs for full configuration details.
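As a rough illustration, the options above could be gathered into one structure before a Job is submitted. This is a sketch only: the `RedditJobConfig` class, its field names, and the payload shape are all assumptions made for this example and are not the Datastreamer API. The Job creation documentation describes the real request format.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional


@dataclass
class RedditJobConfig:
    """Hypothetical container for the options listed above (not the real API)."""

    query: str = ""                                   # keywords to search for
    subreddits: List[str] = field(default_factory=list)  # communities to monitor
    start_date: Optional[date] = None                 # start of the collection window
    end_date: Optional[date] = None                   # end of the collection window
    content_types: List[str] = field(
        default_factory=lambda: ["posts", "comments"]  # posts, comments, or both
    )
    document_limit: Optional[int] = None              # max documents per Job run

    def to_payload(self) -> dict:
        """Convert to a plain dict with dates rendered as ISO 8601 strings."""
        payload = asdict(self)
        for key in ("start_date", "end_date"):
            if payload[key] is not None:
                payload[key] = payload[key].isoformat()
        return payload


# Example values are invented for illustration.
config = RedditJobConfig(
    query="electric vehicles",
    subreddits=["cars", "electricvehicles"],
    start_date=date(2024, 1, 1),
    end_date=date(2024, 1, 31),
    content_types=["posts"],
    document_limit=1000,
)
payload = config.to_payload()
```

Keeping the subreddit list and document limit explicit in one place makes it easier to review the scope of a Job before it runs.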
What is Collected
Each document returned from the Reddit source represents a single post or comment. Fields are mapped to the Datastreamer unified schema and include content text, author metadata, engagement metrics (upvotes, comment counts, awards), post date, and subreddit information.
Platform-specific fields are available under the reddit and subreddit schema namespaces. See the Schema Reference for field details.
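To show how namespaced fields might be read defensively, here is a minimal sketch. Only the `reddit` and `subreddit` namespace names come from this page. The sample document, the field names inside each namespace, and the `get_nested` helper are hypothetical; the Schema Reference lists the actual fields.

```python
from typing import Any, Dict

# Hypothetical document: every field name below is an assumption,
# not the real Datastreamer schema.
sample_document: Dict[str, Any] = {
    "content": {"body": "Has anyone tried the new firmware?"},
    "author": {"name": "example_user"},
    "reddit": {"upvotes": 42},
    "subreddit": {"name": "homeautomation"},
}


def get_nested(document: Dict[str, Any], namespace: str, key: str, default: Any = None) -> Any:
    """Read document[namespace][key], returning default if either level is missing."""
    section = document.get(namespace)
    if not isinstance(section, dict):
        return default
    return section.get(key, default)


upvotes = get_nested(sample_document, "reddit", "upvotes", 0)
community = get_nested(sample_document, "subreddit", "name", "unknown")
awards = get_nested(sample_document, "reddit", "awards", 0)  # absent here, falls back to 0
```

Falling back to a default avoids errors when a field is absent, for example when a comment lacks a field that only posts carry.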
Troubleshooting
Job fails or returns no data
- Check that the subreddit or keyword query is valid
- Verify that the date range covers a period in which matching content exists
- Review Job logs for specific errors
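The first two checks can be approximated locally before a Job is created. The sketch below is illustrative only: `preflight_problems` is a hypothetical helper, and the subreddit name pattern (3 to 21 letters, digits, or underscores, not starting with an underscore) reflects commonly cited Reddit naming rules and is treated here as an assumption. It cannot confirm that a community actually exists or that it has content in the chosen window.

```python
import re
from datetime import date
from typing import List, Optional

# Assumed naming rule: 3-21 characters, letters/digits/underscores,
# not starting with an underscore.
SUBREDDIT_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]{2,20}")


def preflight_problems(
    query: str,
    subreddits: List[str],
    start_date: date,
    end_date: date,
    today: Optional[date] = None,
) -> List[str]:
    """Return human-readable problems found in a planned Job's inputs."""
    today = today or date.today()
    problems: List[str] = []

    if not query.strip() and not subreddits:
        problems.append("no keyword query or subreddit target set")

    for name in subreddits:
        # Accept both "cars" and "r/cars" spellings.
        cleaned = name[2:] if name.lower().startswith("r/") else name
        if not SUBREDDIT_NAME.fullmatch(cleaned):
            problems.append(f"subreddit name looks invalid: {name!r}")

    if start_date > end_date:
        problems.append("date range starts after it ends")
    if start_date > today:
        problems.append("date range lies entirely in the future")

    return problems


issues = preflight_problems(
    query="",
    subreddits=["r/py thon"],
    start_date=date(2024, 2, 1),
    end_date=date(2024, 1, 1),
    today=date(2024, 6, 1),
)
```

An empty list means none of these local checks failed; the Job logs remain the authoritative place to look for specific errors.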
Unexpectedly high document counts
- Scope the query to specific subreddits rather than all of Reddit
- Set a document limit on the Job
- Review the Data Volume Units (DVU) pricing page to see how document volume affects cost
Provider switch noted in Job logs
- This is expected behavior. If a provider is unavailable, the Job is routed to an alternative automatically. No action is required.
Related
- Sources Overview
- Creating Jobs
- Data Volume Units
- Direct Integrations: use if you need a specific provider directly
