Lucene Document Filter
What is the Lucene Document Filter?
The Lucene Document Filter is a pipeline component that filters incoming documents against a Lucene query. Only documents that match the query are passed forward; all others are discarded.
Use it when you need precise filtering by keyword, range, boolean logic, or exact match, in the style of a search engine query.
How It Works
- Configuration: One Lucene query.
- Behavior: Discards all documents that don’t match the query.
- Output: Only the matching documents continue down the pipeline.
The filter performs no transformation; matching documents pass through unchanged.
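The behavior described above can be pictured with a minimal Python sketch. It is only an illustration, not the component's implementation: `filter_documents` and the predicate passed to it are hypothetical names, and the lambda stands in for a real Lucene query such as author:Jules.

```python
from typing import Callable, Dict, Iterable, Iterator

Document = Dict[str, object]


def filter_documents(
    documents: Iterable[Document],
    matches: Callable[[Document], bool],
) -> Iterator[Document]:
    """Yield only the documents the predicate accepts; the rest are dropped."""
    for doc in documents:
        if matches(doc):
            yield doc  # passed through as-is, no fields added or changed


docs = [
    {"author": "Jules", "quote": "The path of the righteous man is beset on all sides..."},
    {"author": "Vincent", "quote": "That's a tasty burger."},
]

# Hypothetical stand-in for the Lucene query author:Jules
kept = list(filter_documents(docs, lambda d: d.get("author") == "Jules"))
```

The generator never copies or edits a document, which mirrors the "filter only" contract: whatever comes out is the same object that went in.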
Example: Filtering Pulp Fiction Quotes
Lucene Query
author:"Jules" AND quote:"path of the righteous man"
Sample Document (Passed Through)
{
"author": "Jules",
"quote": "The path of the righteous man is beset on all sides..."
}
Sample Document (Discarded)
{
"author": "Vincent",
"quote": "That's a tasty burger."
}
In this example, only quotes from Jules that include the phrase "path of the righteous man" will pass through.
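To make the matching rule concrete, here is a rough Python approximation of the example query. It assumes the fields are lowercased and split on non-alphanumeric characters, which is a simplification; the real result depends on the analyzer the component uses. The helper names `tokens`, `contains_phrase`, and `matches_example_query` are invented for this sketch.

```python
import re


def tokens(text: str) -> list:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(field_value: str, phrase: str) -> bool:
    """True if the phrase's tokens appear consecutively in the field's tokens."""
    haystack, needle = tokens(field_value), tokens(phrase)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def matches_example_query(doc: dict) -> bool:
    # Approximates: author:"Jules" AND quote:"path of the righteous man"
    return contains_phrase(doc.get("author", ""), "Jules") and contains_phrase(
        doc.get("quote", ""), "path of the righteous man"
    )


jules = {"author": "Jules", "quote": "The path of the righteous man is beset on all sides..."}
vincent = {"author": "Vincent", "quote": "That's a tasty burger."}
```

Under these assumptions, `matches_example_query(jules)` is true and `matches_example_query(vincent)` is false, matching the pass-through and discard outcomes shown above.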
Query Syntax Tips
Lucene supports advanced filtering patterns:
| Pattern | Example |
|---|---|
| Field match | author:Jules |
| Phrase match | quote:"tasty burger" |
| Boolean logic | author:Jules AND quote:path |
| Wildcards | author:Jul* |
| Range query | year:[1990 TO 2000] |
Combine these patterns to express precise filter conditions in a single query.
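As a rough guide to what the wildcard and range patterns accept, the following Python sketch mimics their semantics on plain values. It is not how Lucene evaluates queries, and the function names and the sample document are made up for illustration. One known difference: Python's `fnmatchcase` also treats `[...]` as a character class, which Lucene wildcards do not.

```python
from fnmatch import fnmatchcase


def wildcard_match(value: str, pattern: str) -> bool:
    """Approximate a wildcard term such as author:Jul* (* = any run, ? = one char)."""
    return fnmatchcase(value, pattern)


def in_range(value: int, low: int, high: int) -> bool:
    """Approximate an inclusive range such as year:[1990 TO 2000]."""
    return low <= value <= high


# Hypothetical document; the year value is sample data for this sketch only.
doc = {"author": "Jules", "year": 1994}

# Approximates: author:Jul* AND year:[1990 TO 2000]
passes = wildcard_match(doc["author"], "Jul*") and in_range(doc["year"], 1990, 2000)
```

Square brackets in a Lucene range include both endpoints, so a document with year 2000 would also satisfy the range in this sketch.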
Summary
- The Lucene Document Filter allows you to apply search-style logic to filter pipeline documents.
- It's ideal for text-based conditions where the JSON router logic isn’t flexible enough.
- Documents that do not match the query are dropped, not just bypassed.
FAQ
What is a Lucene Document Filter?
A pipeline component that filters documents based on a Lucene query and discards all non-matching ones.
Does it modify the document?
No. It only filters; it doesn’t transform or enrich the data.
Can I use phrase and boolean searches?
Yes! You can use phrases, wildcards, field matches, ranges, and boolean logic.