Document Batcher

The Document Batcher component groups incoming documents into batches and emits each batch once a defined threshold is reached. This is useful when downstream components or APIs work best with a specific number or size of documents at a time.


Configuration Options

Setting            Description

Split Type         Defines how batching is evaluated. Choose from:
                     • By Count (default): Batch by number of documents
                     • By Size: Batch by byte size of the combined documents

Split Threshold    Maximum number of documents or total byte size before a batch is emitted.

Use these settings to control how many documents are grouped together and when a batch is released downstream.
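
To make the two split types concrete, here is a minimal Python sketch of the batching logic. It is illustrative only and not the component's actual implementation: the function name batch_documents and its parameters split_type and split_threshold are hypothetical stand-ins for the settings above. It also assumes that documents are strings measured in UTF-8 bytes, that a By Size batch is closed before a document would push it over the threshold, that a single oversized document is emitted on its own, and that any remaining partial batch is flushed at the end of the input.

```python
from typing import Iterable, Iterator, List


def batch_documents(
    docs: Iterable[str],
    split_type: str = "count",   # hypothetical stand-in for Split Type
    split_threshold: int = 100,  # hypothetical stand-in for Split Threshold
) -> Iterator[List[str]]:
    """Yield lists of documents, closing each batch at the threshold."""
    if split_type not in ("count", "size"):
        raise ValueError("split_type must be 'count' or 'size'")
    if split_threshold < 1:
        raise ValueError("split_threshold must be at least 1")

    batch: List[str] = []
    batch_bytes = 0
    for doc in docs:
        doc_bytes = len(doc.encode("utf-8"))
        # By Size (assumed behavior): close the current batch before adding
        # a document that would push it past the threshold.
        if split_type == "size" and batch and batch_bytes + doc_bytes > split_threshold:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
        # By Count: close the batch once it holds split_threshold documents.
        if split_type == "count" and len(batch) >= split_threshold:
            yield batch
            batch, batch_bytes = [], 0
    # Assumption: leftover documents are released as a final partial batch.
    if batch:
        yield batch


docs = ["aaaa", "bb", "cccccc", "d", "eee"]
by_count = list(batch_documents(docs, "count", 2))
by_size = list(batch_documents(docs, "size", 6))
print(by_count)  # [['aaaa', 'bb'], ['cccccc', 'd'], ['eee']]
print(by_size)   # [['aaaa', 'bb'], ['cccccc'], ['d', 'eee']]
```

With the same five documents, By Count with a threshold of 2 groups them in pairs, while By Size with a threshold of 6 bytes produces batches of uneven length because each batch is bounded by combined size rather than by document count.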


When to Use the Document Batcher

Use the Document Batcher when your pipeline or integration expects documents in fixed-size chunks.


Summary

  • Batches documents either by count or by size.
  • Customizable split threshold to define batch limits.
  • Ideal for downstream systems that prefer or require grouped input.

FAQ

What does the Document Batcher do?

It collects incoming documents and emits them in batches once a specified threshold is met.

Can I batch by data size instead of document count?

Yes. Set Split Type to By Size to use byte size as the threshold.