Advanced Job Search

The Find Jobs REST endpoint allows job history to be searched using powerful Lucene queries.

Job Search Endpoint

The endpoint accepts an HTTP POST request with a JSON payload that contains the search information:

{
  "query": {
    "sort": [
      {
        "field": string,
        "order": string
      }
    ],
    "from": int,
    "size": int,
    "track_total_hits": bool,
    "query": string
  }
}

Where:

sort                An optional list of fields to sort by. Currently only a single field is supported.
sort.field          The field to sort by. Can be any of the fields listed in the Lucene Search section below. Default: "updated"
sort.order          The sort order; either "ASC" or "DESC". Default: "DESC"
from                The starting position for the search when using pagination. Default: 0
size                The number of jobs to return. Default: 100
track_total_hits    If set, the query also returns the total number of jobs that match. Default: false
query               The Lucene query.

The payload can be sent to the endpoint as follows:

curl --request POST \
  --url https://api.platform.datastreamer.io/api/pipelines/jobs/search \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'apikey: *' \
  --data '{
  "query": {
    "sort": [
      {
        "field": "updated",
        "order": "DESC"
       }
    ],
    "from": 0,
    "size": 100,
    "track_total_hits": false,
    "query": "label:Periodic-Table-Project-X"
  }
}'
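
The same request can also be made from code. The following is a minimal sketch using Python and the requests library with the payload from the curl example above; the DATASTREAMER_API_KEY environment variable is an assumption, so substitute however you manage your API key.

import os
import requests

JOBS_SEARCH_URL = "https://api.platform.datastreamer.io/api/pipelines/jobs/search"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Hypothetical environment variable; use whatever mechanism holds your API key.
    "apikey": os.environ["DATASTREAMER_API_KEY"],
}

payload = {
    "query": {
        "sort": [{"field": "updated", "order": "DESC"}],
        "from": 0,
        "size": 100,
        "track_total_hits": False,
        "query": "label:Periodic-Table-Project-X",
    }
}

# Post the search payload and print the matching jobs.
response = requests.post(JOBS_SEARCH_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())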

Results

The API returns a JSON response containing details of the matching jobs, such as job IDs, statuses, and timestamps. For example:

{
  "type": "job",
  "organization_id": "xxxxxx",
  "pipeline_id": "xxxxxx",
  "step_id": "xxxxxx",
  "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
  "state": "succeeded",
  "job_name": "574930t5-b7af-4601-bfe4-47d77997fe8d",
  "data_source": "brightdata_youtube_posts",
  "job_type": "oneTime",
  "priority": "normal",
  "label": "Periodic-Table-Project-X",
  "preferred_start_time": "2024-12-13T12:50:26.496878Z",
  "query_from": "2024-12-01T00:00:00Z",
  "query_to": "2024-12-12T00:00:00Z",
  "query": {
    "search_type": "keywords",
    "keywords": [
      "Periodic Table"
    ]
  },
  "work_time_offset": 0,
  "start_time_offset": 0,
  "max_documents": 100,
  "created": "2024-12-13T12:50:07.708852Z",
  "updated": "2024-12-13T12:50:26.665749Z",
  "schedule_time": "2024-12-13T12:50:26.665749Z",
  "document_count": 99
}

Work Item Search Endpoint

For each Job, multiple Work Items can be created automatically to manage data retrieval for Jobs within dynamic pipelines. The Work Items endpoint accepts an HTTP POST request with a JSON payload that contains the search information, using the same model as the Job endpoint.

The payload can be sent to the endpoint as follows:

curl --request POST \
  --url https://api.platform.datastreamer.io/api/pipelines/work-items/search \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'apikey: *' \
  --data '{
  "query": {
    "sort": [
      {
        "field": "updated",
        "order": "DESC"
       }
    ],
    "from": 0,
    "size": 100,
    "track_total_hits": false,
    "query": "job_id:574930t5-b7af-4601-bfe4-47d77997fe8d"
  }
}'

Results

The API returns a JSON response containing details of the matching work items, such as job IDs, statuses, and timestamps. For example:

{
    "start": 0,
    "count": 1,
    "records": [
        {
            "type": "work_item",
            "organization_id": "xxxxxx",
            "pipeline_id": "xxxxxx",
            "step_id": "xxxxxx",
            "work_item_id": "5thus54p-3999-4b75-83dc-dc16a24fe166",
            "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
            "data_source": "brightdata_youtube_posts",
            "provider_task_id": null,
            "parent_work_item_id": null,
            "state": "completed",
            "preferred_start_time": "2024-12-13T12:55:41.692496Z",
            "document_count": 99,
            "max_documents": 100,
            "failure_count": 0,
            "created": "2024-12-13T12:50:26.665749Z",
            "updated": "2024-12-13T12:55:41.692496Z",
            "actioned": "2024-12-13T12:55:40.690468Z",
            "action_id": "8857dbd9-0c36-4b36-b46e-bd100b1255e7",
            "query_from": "2024-12-01T00:00:00Z",
            "query_to": "2024-12-12T00:00:00Z",
            "query": {
                "search_type": "keywords",
                "keywords": [
                    "Periodic Table"
                ]
            },
            "label": "Periodic-Table-Project-X"
        },
        ...
    ]
}
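
When a job produces more work items than fit in a single page, the from and size parameters can be used to page through the records array. The following is a minimal sketch, again using Python and the requests library with an assumed DATASTREAMER_API_KEY environment variable; stopping when a page comes back smaller than the requested size is an assumption, not documented behaviour.

import os
import requests

WORK_ITEMS_SEARCH_URL = "https://api.platform.datastreamer.io/api/pipelines/work-items/search"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "apikey": os.environ["DATASTREAMER_API_KEY"],  # hypothetical environment variable
}

page_size = 100
offset = 0
work_items = []

while True:
    payload = {
        "query": {
            "sort": [{"field": "updated", "order": "DESC"}],
            "from": offset,
            "size": page_size,
            "track_total_hits": False,
            "query": "job_id:574930t5-b7af-4601-bfe4-47d77997fe8d",
        }
    }
    response = requests.post(WORK_ITEMS_SEARCH_URL, headers=headers, json=payload)
    response.raise_for_status()
    records = response.json().get("records", [])
    work_items.extend(records)
    # Assumption: a page smaller than the requested size means there are no more records.
    if len(records) < page_size:
        break
    offset += page_size

print(f"Fetched {len(work_items)} work items")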

Lucene Search

Lucene queries are described on this page: Apache Lucene - Query Parser Syntax

Keywords such as AND, OR, NOT and TO must be in upper case.

Not all Lucene features are supported (for example, boosting and the ~ operator).

The fields available for searching are:

Field                  Type      Description

pipeline_id            string    Matches based on the job pipeline ID.
step_id                string    Matches based on the job step ID within the pipeline (when you have multiple ingresses in your pipeline).
job_id                 string    Matches based on the job ID.
job_name               string    Matches based on the job name.
work_item_id           string    Matches based on the job work_item_id, which is available once the job is first scheduled.
data_source            string    Matches jobs for the specified data source (e.g. wsl_instagram).
job_type               string    Either OneTime or Periodic (this field is not case sensitive).
state or status        string    Matches any one of the following values: NotReady, Ready, Scheduled, Running, Succeeded, Failed, Deleted, Disabled (this field is not case sensitive).
label                  string    Match based on the user-supplied job label.
tags                   string    Match based on the user-supplied tags. This is a wildcard search by default.
created                date      Match based on the date the job was created.
updated                date      Match based on the date the job was last updated.
preferred_start_time   date      Match based on the date the job was scheduled to start.
query_from             date      Match based on the start of the date range for the query.
query_to               date      Match based on the end of the date range for the query.
query                  JSON      Match based on the JSON representation of the query. Exact details depend on the data source.
max_documents          integer   Match based on the maximum number of documents a single execution of the job can return.
document_count         integer   Match based on the cumulative total of documents found when executing this job.

Notes:

  1. If no field is specified, then query, label and tags will be searched.
  2. All string fields can match on the wildcard character *.
  3. All string fields can be quoted with "..." for a match on the entire string without wildcards.
  4. * can be escaped with \ in an unquoted string if you need to find the * character.
  5. null can be used to find jobs where the value is not set. You can also use the negation character ! to find non-null values, e.g. label: !null.
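
The following sketch illustrates these rules with a few example query strings as Python literals; the specific label and job name values are invented for illustration.

# Example Lucene query strings illustrating the notes above.
queries = [
    "label:Periodic*",                     # wildcard match on a string field
    'label:"Periodic-Table-Project-X"',    # quoted for an exact match, no wildcards
    "job_name:daily\\*report",             # \* matches a literal * in an unquoted string (hypothetical name)
    "label:null",                          # jobs where the label is not set
    "label:!null",                         # jobs where the label is set to any value
]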

Date Format

Dates can be entered as follows:

  1. UTC time in ISO8601 Date/Time format (YYYY-MM-DDTHH:MM:SS)

  2. Date in ISO8601 Date format (YYYY-MM-DD)

  3. Offset expression: [+|-]<number><unit> (e.g. -1h).

    Value   Unit
    s       Seconds
    mi      Minutes
    h       Hours
    d       Days
    w       Weeks
    m       Months
    y       Years

  4. Fixed expression

    Value         Description
    now           The current date
    noon          The current evaluated date at 12:00:00
    midnight      The current evaluated date at 00:00:00
    eod           The current evaluated date at 1 tick before midnight
    fom           Midnight on the first of the current evaluated month
    eom           The current evaluated date set to 1 tick before midnight on the last day of the month
    sow           00:00:00 on the preceding Sunday for the current evaluated date
    eow           1 tick before midnight on the next Saturday after the current evaluated date
    today         Sets the current evaluated date to today's date. Time of day is unchanged
    yesterday     Sets the current evaluated date to yesterday's date. Time of day is unchanged
    tomorrow      Sets the current evaluated date to tomorrow's date. Time of day is unchanged
    at "hh:mm"    Sets the current time to hh:mm

Examples

  1. state: succeeded AND updated: [-3m TO -1m] Return all jobs that completed successfully between 1 and 3 months ago.
  2. cat OR dog AND data_source: wsl_instagram Return all jobs searching for cat or dog on the wsl_instagram data source.
  3. label: tiger AND created: [fom-1m TO eom-1m] Return all jobs with label tiger created in the previous calendar month
  4. fish AND chips AND [sow-1w TO eow-1w] Return all jobs searching for fish and chips for the previous week (Sunday 00:00:00 to Saturday 23:59:59.999)