Lucene Document Filter

What is the Lucene Document Filter?

The Lucene Document Filter is a pipeline component that filters incoming documents based on a Lucene query. Only documents that match the query are passed forward; all others are discarded.

Use this when you need precise filtering using keyword search, ranges, boolean logic, or exact matches, similar to search engine queries.

How It Works

  • Configuration: One Lucene query.
  • Behavior: Discards all documents that don’t match the query.
  • Output: Only the matching documents continue down the pipeline.

No transformation is performed; the component only filters documents based on their content.
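The behavior described above can be sketched in a few lines of Python. This is an illustration only, not the component's actual implementation: `lucene_document_filter`, the `matches` callback, and the sample documents are hypothetical, and the callback merely stands in for evaluating the configured Lucene query.

```python
from typing import Callable, Iterable, Iterator

Document = dict


def lucene_document_filter(
    documents: Iterable[Document],
    matches: Callable[[Document], bool],
) -> Iterator[Document]:
    """Yield only the documents for which `matches` returns True.

    `matches` is a hypothetical stand-in for evaluating the configured
    Lucene query. Non-matching documents are dropped; matching documents
    are passed on unchanged.
    """
    for doc in documents:
        if matches(doc):
            yield doc  # the same object, with no transformation applied


# Hypothetical incoming documents.
incoming = [
    {"id": 1, "author": "Jules"},
    {"id": 2, "author": "Vincent"},
]

# Stand-in predicate for a query such as author:Jules.
passed = list(lucene_document_filter(incoming, lambda d: d.get("author") == "Jules"))
```

Here `passed` holds only the first document, and it is the very same object that entered the filter, which mirrors the "filter only, no transformation" behavior.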

Example: Filtering Pulp Fiction Quotes

Lucene Query

author:"Jules" AND quote:"path of the righteous man"

Sample Document (Passed Through)

{
  "author": "Jules",
  "quote": "The path of the righteous man is beset on all sides..."
}

Sample Document (Discarded)

{
  "author": "Vincent",
  "quote": "That's a tasty burger."
}

In this example, only quotes from Jules that include the phrase "path of the righteous man" will pass through.
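To make the matching logic concrete, the following Python sketch approximates the example query by hand. It assumes the fields are analyzed roughly like a standard analyzer would (lowercased and split into word tokens); the real result depends on how each field is analyzed in your setup. The helper names `tokens`, `contains_phrase`, and `matches_example_query` are hypothetical.

```python
import re


def tokens(text: str) -> list:
    """Lowercase the text and split it into word tokens (rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """Return True if the phrase's tokens appear consecutively and in order."""
    words, target = tokens(text), tokens(phrase)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches_example_query(doc: dict) -> bool:
    """Approximate: author:"Jules" AND quote:"path of the righteous man"."""
    return (
        contains_phrase(doc.get("author", ""), "Jules")
        and contains_phrase(doc.get("quote", ""), "path of the righteous man")
    )


jules = {
    "author": "Jules",
    "quote": "The path of the righteous man is beset on all sides...",
}
vincent = {"author": "Vincent", "quote": "That's a tasty burger."}

results = [matches_example_query(jules), matches_example_query(vincent)]
```

Both conditions must hold because of the AND operator: a quote from Jules without the phrase, or the same phrase attributed to another author, would be discarded.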

Query Syntax Tips

Lucene supports advanced filtering patterns:

Pattern          Example
Field match      author:Jules
Phrase match     quote:"tasty burger"
Boolean logic    author:Jules AND quote:path
Wildcards        author:Jul*
Range query      year:[1990 TO 2000]

These patterns can be combined, for example with AND, to build more specific filters.
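As a rough guide to what the wildcard and range patterns mean, the sketch below mimics them in plain Python. It is an approximation under stated assumptions, not how Lucene evaluates queries: matching here is case-sensitive and runs against the whole field value, whereas Lucene matches against indexed terms. The helpers `wildcard_match` and `in_range` and the sample document are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(doc: dict, field: str, pattern: str) -> bool:
    """Approximate author:Jul* where `*` matches any run of characters.

    Note: fnmatch also understands [seq] character classes, which are not
    part of Lucene wildcard syntax, so keep patterns to `*` and `?`.
    """
    return fnmatchcase(str(doc.get(field, "")), pattern)


def in_range(doc: dict, field: str, low: int, high: int) -> bool:
    """Approximate year:[1990 TO 2000]; square brackets include both bounds."""
    value = doc.get(field)
    return value is not None and low <= value <= high


# Hypothetical document with a numeric `year` field.
doc = {"author": "Jules", "year": 1994}

# Approximates: author:Jul* AND year:[1990 TO 2000]
keep = wildcard_match(doc, "author", "Jul*") and in_range(doc, "year", 1990, 2000)
```

In Lucene query syntax, curly braces such as year:{1990 TO 2000} make the bounds exclusive, which would correspond to strict comparisons in this sketch.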

Summary

  • The Lucene Document Filter allows you to apply search-style logic to filter pipeline documents.
  • It's ideal for text-based conditions where the JSON router logic isn’t flexible enough.
  • Documents that do not match the query are dropped, not just bypassed.

FAQ

What is a Lucene Document Filter?

A pipeline component that filters documents based on a Lucene query and discards all non-matching ones.

Does it modify the document?

No. It only filters; it doesn’t transform or enrich the data.

Can I use phrase and boolean searches?

Yes! You can use phrases, wildcards, field matches, ranges, and boolean logic.