Lucene Document Filter

What is the Lucene Document Filter?

The Lucene Document Filter is a pipeline component that filters incoming documents based on a Lucene query. Only documents that match the query are passed forward; all others are discarded.

Use this when you need precise filtering using keyword search, ranges, boolean logic, or exact matches, similar to search engine queries.

How It Works

Configuration: One Lucene query.
Behavior: Discards all documents that don’t match the query.
Output: Only the matching documents continue down the pipeline.

No transformation is done; the component purely filters based on document content.
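
The behavior above can be sketched in a few lines of Python. This is only an illustration of the pass-or-discard contract, not the component's actual implementation: `filter_documents` and `author_is_jules` are hypothetical names, and the predicate stands in for a compiled Lucene query.

```python
from typing import Callable, Iterable, Iterator

Document = dict  # a pipeline document, modeled here as a plain dict


def filter_documents(
    docs: Iterable[Document],
    matches: Callable[[Document], bool],
) -> Iterator[Document]:
    """Yield only the documents for which the query predicate is true."""
    for doc in docs:
        if matches(doc):
            yield doc  # passed through unchanged, no transformation
        # non-matching documents are dropped: nothing is yielded for them


# Hypothetical stand-in for a compiled query such as author:Jules
def author_is_jules(doc: Document) -> bool:
    return doc.get("author") == "Jules"


incoming = [
    {"author": "Jules", "quote": "The path of the righteous man is beset on all sides..."},
    {"author": "Vincent", "quote": "That's a tasty burger."},
]
outgoing = list(filter_documents(incoming, author_is_jules))
```

Here `outgoing` holds only the first document, and it is the same object that came in, which mirrors the "filter only" contract.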

Example: Filtering Pulp Fiction Quotes

Lucene Query

author:"Jules" AND quote:"path of the righteous man"

Sample Document (Passed Through)

{
  "author": "Jules",
  "quote": "The path of the righteous man is beset on all sides..."
}

Sample Document (Discarded)

{
  "author": "Vincent",
  "quote": "That's a tasty burger."
}

In this example, only quotes from Jules that include the phrase "path of the righteous man" will pass through.
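
To make the matching logic concrete, here is a rough Python emulation of the example query. It assumes a simple analyzer that lowercases text and splits on non-alphanumeric characters. Real Lucene analysis depends on how each field is configured, so treat `tokens`, `contains_phrase`, and `matches_example_query` as hypothetical helpers rather than Lucene APIs.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (a rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's tokens appear consecutively, in order, in the text."""
    words, wanted = tokens(text), tokens(phrase)
    n = len(wanted)
    return n > 0 and any(
        words[i:i + n] == wanted for i in range(len(words) - n + 1)
    )


def matches_example_query(doc: dict) -> bool:
    # Emulates: author:"Jules" AND quote:"path of the righteous man"
    return contains_phrase(doc.get("author", ""), "Jules") and contains_phrase(
        doc.get("quote", ""), "path of the righteous man"
    )


jules = {
    "author": "Jules",
    "quote": "The path of the righteous man is beset on all sides...",
}
vincent = {"author": "Vincent", "quote": "That's a tasty burger."}

results = [matches_example_query(jules), matches_example_query(vincent)]
```

Both clauses must hold for a document to pass, so a quote by Jules that lacks the phrase would be discarded as well.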

Query Syntax Tips

Lucene supports advanced filtering patterns:

Pattern	Example
Field match	`author:Jules`
Phrase match	`quote:"tasty burger"`
Boolean logic	`author:Jules AND quote:path`
Wildcards	`author:Jul*`
Range query	`year:[1990 TO 2000]`

Use these to build filters that are both expressive and efficient.
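
As a rough guide to what the wildcard and range patterns mean, the sketch below emulates them in plain Python. It is an approximation under stated assumptions: `wildcard_match` and `in_range` are hypothetical helpers, the `year` field on the sample document is invented for illustration, and `fnmatchcase` also accepts `[seq]` character classes that Lucene wildcards do not.

```python
from fnmatch import fnmatchcase


def wildcard_match(value: str, pattern: str) -> bool:
    """Emulate a wildcard term such as Jul* (* = any run of characters, ? = one)."""
    return fnmatchcase(value, pattern)


def in_range(value: int, low: int, high: int) -> bool:
    """Emulate an inclusive range such as year:[1990 TO 2000]."""
    return low <= value <= high


# Hypothetical document; the year field is assumed for this illustration
doc = {"author": "Jules", "year": 1994}

checks = {
    "wildcard": wildcard_match(doc["author"], "Jul*"),
    "range": in_range(doc["year"], 1990, 2000),
    "upper_bound": in_range(2000, 1990, 2000),   # square brackets include the bounds
    "out_of_range": in_range(2001, 1990, 2000),
}
```

Square brackets in a Lucene range include both endpoints, which is why the upper bound of 2000 still matches.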

Summary

The Lucene Document Filter allows you to apply search-style logic to filter pipeline documents.
It's ideal for text-based conditions where the JSON router logic isn’t flexible enough.
Documents that do not match the query are dropped from the pipeline; they are not routed past the filter.

FAQ

What is a Lucene Document Filter?

A pipeline component that filters documents based on a Lucene query and discards all non-matching ones.

Does it modify the document?

No. It only filters; it doesn’t transform or enrich the data.

Can I use phrase and boolean searches?

Yes! You can use phrases, wildcards, field matches, ranges, and boolean logic.