Advanced Job Search
The Find Jobs REST endpoint lets you search job history using powerful Lucene queries.
Job Search Endpoint
The endpoint accepts an HTTP POST request with a JSON payload that contains the search information:
{
"query": {
"sort": [
{
"field": string,
"order": string
}
],
"from": int,
"size": int,
"track_total_hits": bool,
"query": string
}
}
Where:
Field | Description | Default |
---|---|---|
sort | An optional list of fields to sort by. Currently only a single field is supported. | |
sort.field | The field to sort by. Can be any field listed in the Lucene Search section | "updated" |
sort.order | The sort order; either "ASC" or "DESC" | "DESC" |
from | The starting position for the search when using pagination | 0 |
size | The number of jobs to return | 100 |
track_total_hits | If set to true, the response also includes the total number of jobs that match the query. | false |
query | The Lucene query | |
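As a minimal sketch, the payload described in the table could be assembled in Python as follows. The function name `build_search_payload` and its parameter names are illustrative assumptions, not part of the API; only the JSON keys and default values come from the table above.

```python
import json


def build_search_payload(query, sort_field="updated", sort_order="DESC",
                         start=0, size=100, track_total_hits=False):
    """Build the JSON body for a search request using the documented defaults.

    Hypothetical helper: only the resulting JSON keys are taken from the docs.
    """
    if sort_order not in ("ASC", "DESC"):
        raise ValueError("sort_order must be 'ASC' or 'DESC'")
    return {
        "query": {
            # Only a single sort field is currently supported.
            "sort": [{"field": sort_field, "order": sort_order}],
            "from": start,
            "size": size,
            "track_total_hits": track_total_hits,
            "query": query,
        }
    }


payload = build_search_payload("label:Periodic-Table-Project-X", size=25)
print(json.dumps(payload, indent=2))
```

Keeping the defaults in one place makes it harder to send a payload that silently differs from the documented behaviour.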
The payload can be sent to the endpoint as follows:
curl --request POST \
--url https://api.platform.datastreamer.io/api/pipelines/jobs/search \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'apikey: *' \
--data '{
"query": {
"sort": [
{
"field": "updated",
"order": "DESC"
}
],
"from": 0,
"size": 100,
"track_total_hits": false,
"query": "label:Periodic-Table-Project-X"
}
}'
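The same request can be sketched in Python using only the standard library. This is an illustration under assumptions rather than an official client: the function names `make_search_request` and `search_jobs` are hypothetical, and the API key is a placeholder you would supply yourself.

```python
import json
import urllib.request

SEARCH_URL = "https://api.platform.datastreamer.io/api/pipelines/jobs/search"


def make_search_request(api_key, lucene_query, url=SEARCH_URL):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "query": {
            "sort": [{"field": "updated", "order": "DESC"}],
            "from": 0,
            "size": 100,
            "track_total_hits": False,
            "query": lucene_query,
        }
    }
    # urllib normalises the capitalisation of header names when storing them;
    # HTTP header names are case-insensitive, so this does not change the request.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "apikey": api_key,
        },
        method="POST",
    )


def search_jobs(api_key, lucene_query):
    """Send the request and decode the JSON response (performs a network call)."""
    request = make_search_request(api_key, lucene_query)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload easy to inspect before any call is made.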
Results
The API returns a JSON response containing details of the jobs, such as job IDs, statuses, and timestamps. For example:
{
"type": "job",
"organization_id": "xxxxxx",
"pipeline_id": "xxxxxx",
"step_id": "xxxxxx",
"job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
"state": "succeeded",
"job_name": "574930t5-b7af-4601-bfe4-47d77997fe8d",
"data_source": "brightdata_youtube_posts",
"job_type": "oneTime",
"priority": "normal",
"label": "Periodic-Table-Project-X",
"preferred_start_time": "2024-12-13T12:50:26.496878Z",
"query_from": "2024-12-01T00:00:00Z",
"query_to": "2024-12-12T00:00:00Z",
"query": {
"search_type": "keywords",
"keywords": [
"Periodic Table"
]
},
"work_time_offset": 0,
"start_time_offset": 0,
"max_documents": 100,
"created": "2024-12-13T12:50:07.708852Z",
"updated": "2024-12-13T12:50:26.665749Z",
"schedule_time": "2024-12-13T12:50:26.665749Z",
"document_count": 99
}
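As an illustration of working with a returned job record, the sketch below parses the timestamps and reports how many documents were collected. The `job` dictionary is a trimmed copy of the sample above, and `parse_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse a timestamp such as '2024-12-13T12:50:07.708852Z' into an aware datetime."""
    # Replace the trailing 'Z' so the string also parses on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Trimmed copy of the sample job record shown above.
job = {
    "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
    "state": "succeeded",
    "created": "2024-12-13T12:50:07.708852Z",
    "updated": "2024-12-13T12:50:26.665749Z",
    "max_documents": 100,
    "document_count": 99,
}

# Time between the job being created and last updated.
elapsed = parse_timestamp(job["updated"]) - parse_timestamp(job["created"])
print(
    f"{job['job_id']} {job['state']}: "
    f"{job['document_count']}/{job['max_documents']} documents, "
    f"{elapsed.total_seconds():.1f}s between created and updated"
)
```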
Work Item Search Endpoint
For each Job, multiple Work Items can be automatically created to manage the data retrieval from Jobs within dynamic pipelines. The Work Items endpoint accepts an HTTP POST request with a JSON payload that contains search information, using the same model as the Job endpoint.
The payload can be sent to the endpoint as follows:
curl --request POST \
--url https://api.platform.datastreamer.io/api/pipelines/work-items/search \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'apikey: *' \
--data '{
"query": {
"sort": [
{
"field": "updated",
"order": "DESC"
}
],
"from": 0,
"size": 100,
"track_total_hits": false,
"query": "job_id:574930t5-b7af-4601-bfe4-47d77997fe8d"
}
}'
Results
The API returns a JSON response containing details of the work items, such as job IDs, statuses, and timestamps. For example:
{
"start": 0,
"count": 1,
"records": [
{
"type": "work_item",
"organization_id": "xxxxxx",
"pipeline_id": "xxxxxx",
"step_id": "xxxxxx",
"work_item_id": "5thus54p-3999-4b75-83dc-dc16a24fe166",
"job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
"data_source": "brightdata_youtube_posts",
"provider_task_id": null,
"parent_work_item_id": null,
"state": "completed",
"preferred_start_time": "2024-12-13T12:55:41.692496Z",
"document_count": 99,
"max_documents": 100,
"failure_count": 0,
"created": "2024-12-13T12:50:26.665749Z",
"updated": "2024-12-13T12:55:41.692496Z",
"actioned": "2024-12-13T12:55:40.690468Z",
"action_id": "8857dbd9-0c36-4b36-b46e-bd100b1255e7",
"query_from": "2024-12-01T00:00:00Z",
"query_to": "2024-12-12T00:00:00Z",
"query": {
"search_type": "keywords",
"keywords": [
"Periodic Table"
]
},
"label": "Periodic-Table-Project-X"
},
...
]
}
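The `from` and `size` parameters can be combined to page through results. The sketch below is one possible approach, built on assumptions: `fetch_page` is a hypothetical callable that performs the search for a given `from` and `size` and returns the decoded response, and a page holding fewer than `size` records is taken to mean there are no more results. The `records` key comes from the sample response above.

```python
from collections import Counter


def iter_records(fetch_page, size=100):
    """Yield records one page at a time.

    fetch_page(start, size) is a hypothetical callable returning a response
    dict shaped like the sample above, with a "records" list.
    """
    start = 0
    while True:
        response = fetch_page(start, size)
        records = response.get("records", [])
        yield from records
        # Assumption: a short (or empty) page means the results are exhausted.
        if len(records) < size:
            break
        start += size


def summarize(records):
    """Count the records by state and total their document counts."""
    records = list(records)
    return {
        "work_items": len(records),
        "documents": sum(r.get("document_count") or 0 for r in records),
        "states": dict(Counter(r.get("state") for r in records)),
    }
```

Passing the fetch function in as an argument keeps the paging logic independent of how the HTTP call is made, which also makes it easy to exercise with canned responses.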
Lucene Search
Lucene queries are described on this page: Apache Lucene - Query Parser Syntax
Keywords such as AND, OR, NOT and TO must be in upper case.
Not all Lucene features are supported (for example, boosting and the ~ operator).
The fields available for searching are:
Field | Type | Description |
---|---|---|
pipeline_id | string | Matches based on the job pipeline ID |
step_id | string | Matches based on the job step ID within the pipeline (when you have multiple ingresses in your pipeline) |
job_id | string | Matches based on the job id |
job_name | string | Matches based on the job name |
work_item_id | string | Matches based on the job work_item_id that is available once the job is first scheduled |
data_source | string | Matches jobs for the specified data source (e.g. wsl_instagram) |
job_type | string | Either OneTime or Periodic (this field is not case sensitive) |
state or status | string | Matches any one of the following values: NotReady, Ready, Scheduled, Running, Succeeded, Failed, Deleted, Disabled (this field is not case sensitive) |
label | string | Match based on the user input job label |
tags | string | Match based on the user tags. This is a wildcard search by default |
created | date | Match based on the date the job was created |
updated | date | Match based on the date the job was last updated |
preferred_start_time | date | Match based on the date the job was scheduled to start |
query_from | date | Match based on the start of the date range for the query |
query_to | date | Match based on the end of the date range for the query |
query | JSON | Match based on the JSON representation of the query. Exact details depend on the data source. |
max_documents | integer | Match based on the maximum number of documents a single execution of the job can return |
document_count | integer | Match based on the cumulative total of documents found when executing this job |
Notes:
- If no field is specified, then `query`, `label` and `tags` will be searched.
- All string fields can match on the wildcard character `*`.
- All string fields can be quoted with `"..."` for a match on the entire string without wildcards.
- `*` can be escaped with `\` in an unquoted string if you need to find the `*` character.
- `null` can be used to find jobs where the value is not set. You can also use the negation character `!` to find non-null values, e.g. `label: !null`
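The quoting and escaping rules in these notes can be sketched as small Python helpers. All function names here are hypothetical. The notes do not say how to handle a double quote inside a quoted value, so this sketch rejects such values rather than guessing.

```python
def escape_wildcard(value):
    """Escape literal * characters for use in an unquoted term."""
    return value.replace("*", "\\*")


def quote_exact(value):
    """Wrap a value in double quotes to match the entire string without wildcards."""
    if '"' in value:
        # Escaping of embedded quotes is not covered by the notes above.
        raise ValueError("values containing a double quote are not handled")
    return f'"{value}"'


def field_clause(field, value, exact=False):
    """Build a 'field: value' clause; use exact=True for values containing spaces."""
    term = quote_exact(value) if exact else escape_wildcard(value)
    return f"{field}: {term}"


def is_set(field):
    """Match records where the field has a value, as in 'label: !null'."""
    return f"{field}: !null"


def all_of(*clauses):
    """Join clauses with AND, which must be upper case."""
    return " AND ".join(clauses)


print(all_of(field_clause("state", "succeeded"), is_set("label")))
print(field_clause("label", "Periodic-Table-Project-X", exact=True))
```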
Date Format
Dates can be entered as follows:
- UTC time in ISO8601 Date/Time format (`YYYY-MM-DDTHH:MM:SS`)
- Date in ISO8601 Date format (`YYYY-MM-DD`)
- Offset expression: `[+|-]<number><unit>` (e.g. `-1h`), where the unit is one of:

Value | Unit |
---|---|
s | Seconds |
mi | Minutes |
h | Hours |
d | Days |
w | Weeks |
m | Months |
y | Years |

- Fixed expression:

Value | Description |
---|---|
now | The current date |
noon | The current evaluated date at 12:00:00 |
midnight | The current evaluated date at 00:00:00 |
eod | The current evaluated date at 1 tick before midnight |
fom | Midnight on the first of the current evaluated month |
eom | 1 tick before midnight on the last day of the current evaluated month |
sow | 00:00:00 on the preceding Sunday for the current evaluated date |
eow | 1 tick before midnight on the next Saturday after the current evaluated date |
today | Sets the current evaluated date to today's date. Time of day is unchanged |
yesterday | Sets the current evaluated date to yesterday’s date. Time of day is unchanged |
tomorrow | Sets the current evaluated date to tomorrow’s date. Time of day is unchanged |
at “hh:mm” | Sets the current time to hh:mm |
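To illustrate how offset and fixed expressions combine into a date range, here is a minimal Python sketch. The helper names `offset` and `date_range` are assumptions made for this example; the unit codes are the ones listed for offset expressions.

```python
# Unit codes accepted in offset expressions.
OFFSET_UNITS = {"s", "mi", "h", "d", "w", "m", "y"}


def offset(amount, unit):
    """Format an offset expression such as -1h or +2w (the sign is always written)."""
    if unit not in OFFSET_UNITS:
        raise ValueError(f"unknown unit {unit!r}; expected one of {sorted(OFFSET_UNITS)}")
    return f"{amount:+d}{unit}"


def date_range(field, start, end):
    """Build a range clause on a date field; TO must be upper case."""
    return f"{field}: [{start} TO {end}]"


# Jobs updated between 3 months and 1 month ago.
print(date_range("updated", offset(-3, "m"), offset(-1, "m")))

# Jobs created in the previous calendar month: a fixed expression followed by an offset.
print(date_range("created", "fom" + offset(-1, "m"), "eom" + offset(-1, "m")))
```

Validating the unit up front avoids sending a query with a mistyped code, such as `min` instead of `mi`.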
Examples
- `state: succeeded AND updated: [-3m TO -1m]`
  Return all jobs that completed successfully between 1 and 3 months ago.
- `cat OR dog AND data_source: wsl_instagram`
  Return all jobs searching for cat or dog on the wsl_instagram data source.
- `label: tiger AND created: [fom-1m TO eom-1m]`
  Return all jobs with label tiger created in the previous calendar month.
- `fish AND chips AND [sow-1w TO eow-1w]`
  Return all jobs searching for fish and chips for the previous week (Sunday 00:00:00 to Saturday 23:59:59.999).