Advanced Job Search

The Find Jobs REST endpoint allows job history to be searched using powerful Lucene queries.

Job Search Endpoint

The endpoint accepts an HTTP POST request with a JSON payload that contains the search information:

{
  "query": {
    "sort": [
      {
        "field": string,
        "order": string
      }
    ],
    "from": int,
    "size": int,
    "track_total_hits": bool,
    "query": string
  }
}

Where:

sort                An optional list of fields to sort by. Currently only a single field is supported.
sort.field          The field to sort by. Can be any of the fields listed in the Lucene Search section below. Default: "updated"
sort.order          The sort order; either "ASC" or "DESC". Default: "DESC"
from                The starting position for the search when using pagination. Default: 0
size                The number of jobs to return. Default: 100
track_total_hits    If set, the query also returns the total number of jobs that match. Default: false
query               The Lucene query.

The payload can be sent to the endpoint as follows:

curl --request POST \
  --url https://api.platform.datastreamer.io/api/pipelines/jobs/search \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'apikey: *' \
  --data '{
  "query": {
    "sort": [
      {
        "field": "updated",
        "order": "DESC"
       }
    ],
    "from": 0,
    "size": 100,
    "track_total_hits": false,
    "query": "label:Periodic-Table-Project-X"
  }
}'
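
The same request can also be made from code. The following is a minimal sketch using Python and the requests library with the payload from the curl example above; the DATASTREAMER_API_KEY environment variable is an assumption, so substitute however you manage your API key.

import os
import requests

JOBS_SEARCH_URL = "https://api.platform.datastreamer.io/api/pipelines/jobs/search"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Hypothetical environment variable; use whatever mechanism holds your API key.
    "apikey": os.environ["DATASTREAMER_API_KEY"],
}

payload = {
    "query": {
        "sort": [{"field": "updated", "order": "DESC"}],
        "from": 0,
        "size": 100,
        "track_total_hits": False,
        "query": "label:Periodic-Table-Project-X",
    }
}

# Post the search payload and print the matching jobs.
response = requests.post(JOBS_SEARCH_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())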

Results

The API returns a JSON response containing details of the matching jobs, such as job IDs, statuses, and timestamps. For example:

{
  "type": "job",
  "organization_id": "xxxxxx",
  "pipeline_id": "xxxxxx",
  "step_id": "xxxxxx",
  "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
  "state": "succeeded",
  "job_name": "574930t5-b7af-4601-bfe4-47d77997fe8d",
  "data_source": "brightdata_youtube_posts",
  "job_type": "oneTime",
  "priority": "normal",
  "label": "Periodic-Table-Project-X",
  "preferred_start_time": "2024-12-13T12:50:26.496878Z",
  "query_from": "2024-12-01T00:00:00Z",
  "query_to": "2024-12-12T00:00:00Z",
  "query": {
    "search_type": "keywords",
    "keywords": [
      "Periodic Table"
    ]
  },
  "work_time_offset": 0,
  "start_time_offset": 0,
  "max_documents": 100,
  "created": "2024-12-13T12:50:07.708852Z",
  "updated": "2024-12-13T12:50:26.665749Z",
  "schedule_time": "2024-12-13T12:50:26.665749Z",
  "document_count": 99
}

Work Item Search Endpoint

For each Job, multiple Work Items can be created automatically to manage data retrieval for Jobs within dynamic pipelines. The Work Items endpoint accepts an HTTP POST request with a JSON payload that contains the search information, using the same model as the Job endpoint.

The payload can be sent to the endpoint as follows:

curl --request POST \
  --url https://api.platform.datastreamer.io/api/pipelines/work-items/search \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'apikey: *' \
  --data '{
  "query": {
    "sort": [
      {
        "field": "updated",
        "order": "DESC"
       }
    ],
    "from": 0,
    "size": 100,
    "track_total_hits": false,
    "query": "job_id:574930t5-b7af-4601-bfe4-47d77997fe8d"
  }
}'

Results

The API returns a JSON response containing details of the matching work items, such as job IDs, statuses, and timestamps. For example:

{
    "start": 0,
    "count": 1,
    "records": [
        {
            "type": "work_item",
            "organization_id": "xxxxxx",
            "pipeline_id": "xxxxxx",
            "step_id": "xxxxxx",
            "work_item_id": "5thus54p-3999-4b75-83dc-dc16a24fe166",
            "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
            "data_source": "brightdata_youtube_posts",
            "provider_task_id": null,
            "parent_work_item_id": null,
            "state": "completed",
            "preferred_start_time": "2024-12-13T12:55:41.692496Z",
            "document_count": 99,
            "max_documents": 100,
            "failure_count": 0,
            "created": "2024-12-13T12:50:26.665749Z",
            "updated": "2024-12-13T12:55:41.692496Z",
            "actioned": "2024-12-13T12:55:40.690468Z",
            "action_id": "8857dbd9-0c36-4b36-b46e-bd100b1255e7",
            "query_from": "2024-12-01T00:00:00Z",
            "query_to": "2024-12-12T00:00:00Z",
            "query": {
                "search_type": "keywords",
                "keywords": [
                    "Periodic Table"
                ]
            },
            "label": "Periodic-Table-Project-X"
        },
        ...
    ]
}
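
When a job produces more work items than fit in a single page, the from and size parameters can be used to page through the records array. The following is a minimal sketch, again using Python and the requests library with an assumed DATASTREAMER_API_KEY environment variable; stopping when a page comes back smaller than the requested size is an assumption, not documented behaviour.

import os
import requests

WORK_ITEMS_SEARCH_URL = "https://api.platform.datastreamer.io/api/pipelines/work-items/search"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "apikey": os.environ["DATASTREAMER_API_KEY"],  # hypothetical environment variable
}

page_size = 100
offset = 0
work_items = []

while True:
    payload = {
        "query": {
            "sort": [{"field": "updated", "order": "DESC"}],
            "from": offset,
            "size": page_size,
            "track_total_hits": False,
            "query": "job_id:574930t5-b7af-4601-bfe4-47d77997fe8d",
        }
    }
    response = requests.post(WORK_ITEMS_SEARCH_URL, headers=headers, json=payload)
    response.raise_for_status()
    records = response.json().get("records", [])
    work_items.extend(records)
    # Assumption: a page smaller than the requested size means there are no more records.
    if len(records) < page_size:
        break
    offset += page_size

print(f"Fetched {len(work_items)} work items")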

Lucene Search

Lucene queries are described on this page: Apache Lucene - Query Parser Syntax

Keywords such as AND, OR, NOT and TO must be in upper case.

Not all Lucene features are supported (for example, boosting and the ~ operator).

The fields available for searching are:

Field                  Type      Description

pipeline_id            string    Matches based on the job pipeline ID.
step_id                string    Matches based on the job step ID within the pipeline (when you have multiple ingresses in your pipeline).
job_id                 string    Matches based on the job ID.
job_name               string    Matches based on the job name.
work_item_id           string    Matches based on the job work_item_id, which is available once the job is first scheduled.
data_source            string    Matches jobs for the specified data source (e.g. wsl_instagram).
job_type               string    Either OneTime or Periodic (this field is not case sensitive).
state or status        string    Matches any one of the following values: NotReady, Ready, Scheduled, Running, Succeeded, Failed, Deleted, Disabled (this field is not case sensitive).
label                  string    Match based on the user-supplied job label.
tags                   string    Match based on the user-supplied tags. This is a wildcard search by default.
created                date      Match based on the date the job was created.
updated                date      Match based on the date the job was last updated.
preferred_start_time   date      Match based on the date the job was scheduled to start.
query_from             date      Match based on the start of the date range for the query.
query_to               date      Match based on the end of the date range for the query.
query                  JSON      Match based on the JSON representation of the query. Exact details depend on the data source.
max_documents          integer   Match based on the maximum number of documents a single execution of the job can return.
document_count         integer   Match based on the cumulative total of documents found when executing this job.

Notes:

  1. If no field is specified, then query, label and tags will be searched.
  2. All string fields can match on the wildcard character *.
  3. All string fields can be quoted with "..." for a match on the entire string without wildcards.
  4. * can be escaped with \ in an unquoted string if you need to find the * character.
  5. null can be used to find jobs where the value is not set. You can also use the negation character ! to find non-null values, e.g. label: !null.
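
The following sketch illustrates these rules with a few example query strings as Python literals; the specific label and job name values are invented for illustration.

# Example Lucene query strings illustrating the notes above.
queries = [
    "label:Periodic*",                     # wildcard match on a string field
    'label:"Periodic-Table-Project-X"',    # quoted for an exact match, no wildcards
    "job_name:daily\\*report",             # \* matches a literal * in an unquoted string (hypothetical name)
    "label:null",                          # jobs where the label is not set
    "label:!null",                         # jobs where the label is set to any value
]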

Date Format

Dates can be entered as follows:

  1. UTC time in ISO8601 Date/Time format (YYYY-MM-DDTHH:MM:SS)

  2. Date in ISO8601 Date format (YYYY-MM-DD)

  3. Offset expression: [+|-]<number><unit> (e.g. -1h).

    Value   Unit
    s       Seconds
    mi      Minutes
    h       Hours
    d       Days
    w       Weeks
    m       Months
    y       Years

  4. Fixed expression

    Value         Description
    now           The current date
    noon          The current evaluated date at 12:00:00
    midnight      The current evaluated date at 00:00:00
    eod           The current evaluated date at 1 tick before midnight
    fom           Midnight on the first of the current evaluated month
    eom           The current evaluated date set to 1 tick before midnight on the last day of the month
    sow           00:00:00 on the preceding Sunday for the current evaluated date
    eow           1 tick before midnight on the next Saturday after the current evaluated date
    today         Sets the current evaluated date to today's date. Time of day is unchanged
    yesterday     Sets the current evaluated date to yesterday's date. Time of day is unchanged
    tomorrow      Sets the current evaluated date to tomorrow's date. Time of day is unchanged
    at "hh:mm"    Sets the current time to hh:mm

Examples

  1. state: succeeded AND updated: [-3m TO -1m] Return all jobs that completed successfully between 1 and 3 months ago.
  2. cat OR dog AND data_source: wsl_instagram Return all jobs searching for cat or dog on the wsl_instagram data source.
  3. label: tiger AND created: [fom-1m TO eom-1m] Return all jobs with label tiger created in the previous calendar month
  4. fish AND chips AND [sow-1w TO eow-1w] Return all jobs searching for fish and chips for the previous week (Sunday 00:00:00 to Saturday 23:59:59.999)