Advanced Job Search

The Find Jobs REST endpoint lets you search your job history using powerful Lucene queries.

Job Search Endpoint

The endpoint accepts an HTTP POST request with a JSON payload that contains the search parameters:

{
  "query": {
    "sort": [
      {
        "field": string,
        "order": string
      }
    ],
    "from": int,
    "size": int,
    "track_total_hits": bool,
    "query": string
  }
}

Where:

| Field | Description | Default |
| --- | --- | --- |
| sort | An optional list of fields to sort by. Currently only a single field is supported. | |
| sort.field | The field to sort by. Can be any of the searchable fields listed under Lucene Search. | "updated" |
| sort.order | The sort order; either "ASC" or "DESC" | "DESC" |
| from | The starting position for the search when using pagination | 0 |
| size | The number of jobs to return | 100 |
| track_total_hits | If set, the query also returns the total number of jobs that match. | false |
| query | The Lucene query | |
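
As a rough illustration of how from and size combine for pagination, the following Python sketch generates one payload per page. It assumes that pages are requested by advancing from in steps of size; the helper name page_payloads and the max_jobs cap are hypothetical and not part of the API.

```python
from typing import Dict, Iterator


def page_payloads(lucene_query: str, size: int = 100, max_jobs: int = 300) -> Iterator[Dict]:
    """Yield one search payload per page (hypothetical helper, not part of the API)."""
    # Assumption: each page starts where the previous one ended.
    for start in range(0, max_jobs, size):
        yield {
            "query": {
                "sort": [{"field": "updated", "order": "DESC"}],
                "from": start,
                "size": size,
                # Ask for the total number of matching jobs as well.
                "track_total_hits": True,
                "query": lucene_query,
            }
        }


pages = list(page_payloads("label:Periodic-Table-Project-X", size=100, max_jobs=250))
print([page["query"]["from"] for page in pages])
```

Each payload can then be posted to the endpoint in turn; in practice you would stop as soon as a page comes back with fewer than size jobs.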

The payload can be sent to the endpoint as follows:

curl --request POST \
  --url https://api.platform.datastreamer.io/api/pipelines/jobs/search \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'apikey: *' \
  --data '{
  "query": {
    "sort": [
      {
        "field": "updated",
        "order": "DESC"
       }
    ],
    "from": 0,
    "size": 100,
    "track_total_hits": false,
    "query": "label:Periodic-Table-Project-X"
  }
}'
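
For readers who prefer a script to the command line, the same request might be assembled in Python with the standard library as sketched below. The function name build_search_request is hypothetical, and the request is only constructed here, not sent; the URL, headers and payload mirror the curl command above.

```python
import json
import urllib.request

SEARCH_URL = "https://api.platform.datastreamer.io/api/pipelines/jobs/search"


def build_search_request(lucene_query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request; hypothetical helper name."""
    payload = {
        "query": {
            "sort": [{"field": "updated", "order": "DESC"}],
            "from": 0,
            "size": 100,
            "track_total_hits": False,
            "query": lucene_query,
        }
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "apikey": api_key,
        },
        method="POST",
    )


request = build_search_request("label:Periodic-Table-Project-X", api_key="*")

# To send it, something like the following should work:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```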

Results

The API returns a JSON response containing details of the jobs, such as job IDs, states, and timestamps. For example:

{
  "type": "job",
  "organization_id": "xxxxxx",
  "pipeline_id": "xxxxxx",
  "step_id": "xxxxxx",
  "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
  "state": "succeeded",
  "job_name": "574930t5-b7af-4601-bfe4-47d77997fe8d",
  "data_source": "brightdata_youtube_posts",
  "job_type": "oneTime",
  "priority": "normal",
  "label": "Periodic-Table-Project-X",
  "preferred_start_time": "2024-12-13T12:50:26.496878Z",
  "query_from": "2024-12-01T00:00:00Z",
  "query_to": "2024-12-12T00:00:00Z",
  "query": {
    "search_type": "keywords",
    "keywords": [
      "Periodic Table"
    ]
  },
  "work_time_offset": 0,
  "start_time_offset": 0,
  "max_documents": 100,
  "created": "2024-12-13T12:50:07.708852Z",
  "updated": "2024-12-13T12:50:26.665749Z",
  "schedule_time": "2024-12-13T12:50:26.665749Z",
  "document_count": 99
}

Work Item Search Endpoint

For each Job, multiple Work Items can be created automatically to manage data retrieval within dynamic pipelines. The Work Items endpoint accepts an HTTP POST request with a JSON payload that uses the same search model as the Job endpoint.

The payload can be sent to the endpoint as follows:

curl --request POST \
  --url https://api.platform.datastreamer.io/api/pipelines/work-items/search \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'apikey: *' \
  --data '{
  "query": {
    "sort": [
      {
        "field": "updated",
        "order": "DESC"
       }
    ],
    "from": 0,
    "size": 100,
    "track_total_hits": false,
    "query": "job_id:574930t5-b7af-4601-bfe4-47d77997fe8d"
  }
}'

Results

The API returns a JSON response containing details of the work items, such as work item IDs, job IDs, states, and timestamps. For example:

{
    "start": 0,
    "count": 1,
    "records": [
        {
            "type": "work_item",
            "organization_id": "xxxxxx",
            "pipeline_id": "xxxxxx",
            "step_id": "xxxxxx",
            "work_item_id": "5thus54p-3999-4b75-83dc-dc16a24fe166",
            "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
            "data_source": "brightdata_youtube_posts",
            "provider_task_id": null,
            "parent_work_item_id": null,
            "state": "completed",
            "preferred_start_time": "2024-12-13T12:55:41.692496Z",
            "document_count": 99,
            "max_documents": 100,
            "failure_count": 0,
            "created": "2024-12-13T12:50:26.665749Z",
            "updated": "2024-12-13T12:55:41.692496Z",
            "actioned": "2024-12-13T12:55:40.690468Z",
            "action_id": "8857dbd9-0c36-4b36-b46e-bd100b1255e7",
            "query_from": "2024-12-01T00:00:00Z",
            "query_to": "2024-12-12T00:00:00Z",
            "query": {
                "search_type": "keywords",
                "keywords": [
                    "Periodic Table"
                ]
            },
            "label": "Periodic-Table-Project-X"
        },
        ...
    ]
}
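
Once a response like the one above has been decoded, its records list can be summarised with a few lines of Python. The sketch below uses a trimmed copy of the sample response; the helper name summarise_work_items is hypothetical, and it assumes every record carries the state and document_count fields shown in the example.

```python
from collections import Counter

# Trimmed copy of the sample response above (most fields omitted for brevity).
sample_response = {
    "start": 0,
    "count": 1,
    "records": [
        {
            "type": "work_item",
            "work_item_id": "5thus54p-3999-4b75-83dc-dc16a24fe166",
            "job_id": "574930t5-b7af-4601-bfe4-47d77997fe8d",
            "state": "completed",
            "document_count": 99,
            "max_documents": 100,
            "failure_count": 0,
        }
    ],
}


def summarise_work_items(response: dict) -> dict:
    """Count work items per state and total their documents (hypothetical helper)."""
    records = response.get("records", [])
    return {
        "states": dict(Counter(record["state"] for record in records)),
        # Treat a missing or null document_count as zero.
        "documents": sum(record.get("document_count") or 0 for record in records),
    }


summary = summarise_work_items(sample_response)
print(summary)
```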

Lucene Search

Lucene query syntax is described in the Apache Lucene Query Parser Syntax documentation.

Keywords such as AND, OR, NOT and TO must be in upper case.

Not all Lucene features are supported; for example, boosting and the ~ operator are unavailable.

The fields available for searching are:

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | Matches based on the job pipeline ID |
| step_id | string | Matches based on the job step ID within the pipeline (useful when you have multiple ingresses in your pipeline) |
| job_id | string | Matches based on the job ID |
| job_name | string | Matches based on the job name |
| work_item_id | string | Matches based on the work item ID, which is available once the job is first scheduled |
| data_source | string | Matches jobs for the specified data source (e.g. wsl_instagram) |
| job_type | string | Either OneTime or Periodic (this field is not case sensitive) |
| state or status | string | Matches any one of the following values: NotReady, Ready, Scheduled, Running, Succeeded, Failed, Deleted, Disabled (this field is not case sensitive) |
| label | string | Matches based on the user-supplied job label |
| tags | string | Matches based on the user tags. This is a wildcard search by default |
| created | date | Matches based on the date the job was created |
| updated | date | Matches based on the date the job was last updated |
| preferred_start_time | date | Matches based on the date the job was scheduled to start |
| query_from | date | Matches based on the start of the date range for the query |
| query_to | date | Matches based on the end of the date range for the query |
| query | JSON | Matches based on the JSON representation of the query. Exact details depend on the data source. |
| max_documents | integer | Matches based on the maximum number of documents a single execution of the job can return |
| document_count | integer | Matches based on the cumulative total of documents found when executing this job |

Notes:

  1. If no field is specified, the query, label and tags fields are searched.
  2. All string fields can match on the wildcard character *.
  3. All string fields can be quoted with "..." to match the entire string without wildcards.
  4. In an unquoted string, * can be escaped with \ if you need to find the literal * character.
  5. null can be used to find jobs where the value is not set. You can also use the negation character ! to find non-null values, e.g. label: !null
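
The quoting, escaping and null rules in the notes above can be wrapped in small helpers when queries are built programmatically. The Python sketch below is one possible approach; the function names are hypothetical, and it assumes that values passed for exact matching contain no double quotes, since the notes do not describe how to escape them.

```python
def escape_wildcard(text: str) -> str:
    """Escape * so it is matched literally in an unquoted string (note 4)."""
    return text.replace("*", "\\*")


def field_clause(field: str, value, exact: bool = False) -> str:
    """Build a single 'field: value' clause (hypothetical helper)."""
    if value is None:
        # Note 5: null finds jobs where the value is not set.
        return f"{field}: null"
    text = str(value)
    if exact:
        # Note 3: quoting matches the entire string without wildcards.
        if '"' in text:
            raise ValueError("double quotes in exact values are not handled by this sketch")
        return f'{field}: "{text}"'
    # Note 2: unquoted values may contain the * wildcard.
    return f"{field}: {text}"


def not_null(field: str) -> str:
    """Note 5: the negation character ! finds non-null values."""
    return f"{field}: !null"


print(field_clause("label", "Periodic*"))
print(field_clause("label", "Periodic-Table-Project-X", exact=True))
print(field_clause("label", escape_wildcard("5*")))
print(not_null("label"))
```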

Date Format

Dates can be entered as follows:

  1. UTC time in ISO 8601 date/time format (YYYY-MM-DDTHH:MM:SS)

  2. Date in ISO 8601 date format (YYYY-MM-DD)

  3. Offset expression: [+|-]<number><unit> (e.g. -1h).

    | Value | Unit |
    | --- | --- |
    | s | Seconds |
    | mi | Minutes |
    | h | Hours |
    | d | Days |
    | w | Weeks |
    | m | Months |
    | y | Years |

  4. Fixed expression

    | Value | Description |
    | --- | --- |
    | now | The current date |
    | noon | The current evaluated date at 12:00:00 |
    | midnight | The current evaluated date at 00:00:00 |
    | eod | The current evaluated date at 1 tick before midnight |
    | fom | Midnight on the first of the current evaluated month |
    | eom | The current evaluated date set to 1 tick before midnight on the last day of the month |
    | sow | 00:00:00 on the preceding Sunday for the current evaluated date |
    | eow | 1 tick before midnight on the next Saturday after the current evaluated date |
    | today | Sets the current evaluated date to today's date. Time of day is unchanged |
    | yesterday | Sets the current evaluated date to yesterday’s date. Time of day is unchanged |
    | tomorrow | Sets the current evaluated date to tomorrow’s date. Time of day is unchanged |
    | at “hh:mm” | Sets the current time to hh:mm |
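
To check an offset expression on the client before sending a query, a small parser along the following lines could be used. This Python sketch assumes the sign is mandatory and that the number is a whole number, which is one reading of the [+|-]<number><unit> pattern above; the names OFFSET_RE and parse_offset are hypothetical.

```python
import re

# Unit codes from the offset table above.
UNITS = {
    "s": "seconds",
    "mi": "minutes",
    "h": "hours",
    "d": "days",
    "w": "weeks",
    "m": "months",
    "y": "years",
}

# Assumption: a leading + or - is required and the number is an integer.
OFFSET_RE = re.compile(r"^([+-])(\d+)(mi|s|h|d|w|m|y)$")


def parse_offset(expression: str):
    """Split an offset such as -1h into a signed amount and a unit name."""
    match = OFFSET_RE.match(expression)
    if match is None:
        raise ValueError(f"not a valid offset expression: {expression!r}")
    sign, number, unit = match.groups()
    amount = int(number) if sign == "+" else -int(number)
    return amount, UNITS[unit]


print(parse_offset("-1h"))
print(parse_offset("+15mi"))
print(parse_offset("-3m"))
```

Note that mi (minutes) and m (months) are distinct units, so -3m means three months ago rather than three minutes ago.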

Examples

  1. state: succeeded AND updated: [-3m TO -1m]
     Returns all succeeded jobs that were last updated between 1 and 3 months ago.
  2. cat OR dog AND data_source: wsl_instagram
     Returns all jobs searching for cat or dog on the wsl_instagram data source.
  3. label: tiger AND created: [fom-1m TO eom-1m]
     Returns all jobs with the label tiger that were created in the previous calendar month.
  4. fish AND chips AND [sow-1w TO eow-1w]
     Returns all jobs searching for fish and chips for the previous week (Sunday 00:00:00 to Saturday 23:59:59.999).
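
Queries like these can also be composed in code, which helps keep keywords such as AND and TO in upper case. The Python sketch below rebuilds the third example and places it in a search payload; the helper names date_range and all_of are hypothetical, and the payload omits the optional sort list.

```python
import json


def date_range(field: str, start: str, end: str) -> str:
    """Build an inclusive range clause; TO must be upper case."""
    return f"{field}: [{start} TO {end}]"


def all_of(*clauses: str) -> str:
    """Join clauses with the upper-case AND keyword."""
    return " AND ".join(clauses)


# Jobs labelled tiger that were created in the previous calendar month.
query = all_of("label: tiger", date_range("created", "fom-1m", "eom-1m"))

payload = {
    "query": {
        "from": 0,
        "size": 100,
        "track_total_hits": False,
        "query": query,
    }
}

print(json.dumps(payload, indent=2))
```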