Reddit

The Reddit source collects posts and comments from Reddit as part of a Data Stream.


Note: Datastreamer selects from available providers automatically based on your Job configuration. You do not need to manage provider accounts or credentials to use this source.

Configuring a Job

When creating a Job for the Reddit source, you define what content to collect. Common configuration options include:

  • Keywords / query: terms to search for across Reddit
  • Subreddit targets: specific communities to monitor
  • Date range: the time window to collect data from
  • Content type: posts, comments, or both
  • Document limit: maximum number of documents per Job run
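
As a rough illustration of how these options fit together, the sketch below models a Reddit Job configuration as a small Python dataclass and serializes it to a dictionary. Every name in it (RedditJobConfig, query, subreddits, start_date, end_date, content_types, document_limit, to_payload) is hypothetical and chosen only to mirror the list above. It is not the Datastreamer API or its actual field names, so refer to the Job creation documentation for the real format.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional


@dataclass
class RedditJobConfig:
    """Hypothetical container mirroring the options listed above.

    Field names are assumptions for illustration, not Datastreamer's schema.
    """
    query: str                                   # keywords / query
    subreddits: List[str] = field(default_factory=list)  # subreddit targets
    start_date: Optional[date] = None            # start of the date range
    end_date: Optional[date] = None              # end of the date range
    content_types: List[str] = field(
        default_factory=lambda: ["posts", "comments"]
    )                                            # posts, comments, or both
    document_limit: Optional[int] = None         # max documents per Job run

    def to_payload(self) -> dict:
        """Return a plain dict with dates rendered as ISO 8601 strings."""
        payload = asdict(self)
        for key in ("start_date", "end_date"):
            if payload[key] is not None:
                payload[key] = payload[key].isoformat()
        return payload


# Example: posts only, scoped to two communities, capped at 5000 documents.
config = RedditJobConfig(
    query="electric vehicles",
    subreddits=["cars", "electricvehicles"],
    start_date=date(2024, 1, 1),
    end_date=date(2024, 1, 31),
    content_types=["posts"],
    document_limit=5000,
)
payload = config.to_payload()
```

Keeping the options in one typed structure makes it easier to see at a glance whether a Job is scoped (subreddits, date range, document limit) before it runs.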

Refer to the Job creation documentation for full configuration details: Creating Jobs


What is Collected

Each document returned from the Reddit source represents a post or comment. Fields are mapped to the Datastreamer unified schema and include content text, author metadata, engagement metrics (upvotes, comments, awards), post date, and subreddit information.

Platform-specific fields are available under the reddit and subreddit schema namespaces. See the Schema Reference for field details.


Troubleshooting

Job fails or returns no data

  • Check that the subreddit or keyword query is valid
  • Verify the date range contains data
  • Review Job logs for specific errors
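
The first two checks can be run locally before a Job is submitted. The following is a minimal sketch, assuming the query, subreddit names, and date range are held as plain Python values. The function name preflight_problems and its parameters are hypothetical, and the checks only catch obviously malformed input. They cannot confirm that a subreddit exists or that matching content was posted in the window.

```python
from datetime import date
from typing import List, Optional


def preflight_problems(
    query: str,
    subreddits: List[str],
    start: date,
    end: date,
    today: Optional[date] = None,
) -> List[str]:
    """Return human-readable problems found in the inputs (empty if none)."""
    today = today or date.today()
    problems: List[str] = []

    # A Job with neither a query nor a subreddit has nothing to collect.
    if not query.strip() and not subreddits:
        problems.append("no keyword query or subreddit target given")

    # Flag names that are blank or contain whitespace.
    for name in subreddits:
        cleaned = name.strip()
        if not cleaned or any(ch.isspace() for ch in cleaned):
            problems.append(f"subreddit name {name!r} is blank or contains whitespace")

    # A reversed or future-only date range cannot contain data.
    if start > end:
        problems.append("date range start is after its end")
    if start > today:
        problems.append("date range begins in the future")

    return problems


# Example: one malformed subreddit name and a reversed date range.
issues = preflight_problems(
    "electric vehicles",
    ["my sub"],
    date(2024, 2, 1),
    date(2024, 1, 1),
    today=date(2024, 6, 1),
)
```

If the returned list is empty and the Job still returns no data, the Job logs remain the place to look for the specific error.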

Unexpectedly high document counts

  • Scope the query to specific subreddits rather than all of Reddit
  • Set a document limit on the Job
  • Review the DVU pricing page to understand the cost impact of high document volumes

Provider switch noted in Job logs

  • This is expected behavior. If a provider is unavailable, the Job is routed to an alternative automatically. No action is required.
