Extraction API

Background & Prerequisite

Before conducting an Extraction, it is suggested to use the Search API and the Count API to test and optimize your queries.

Search API: Use the Search API to confirm that the query resolves over a shorter timeframe and returns the desired response.
Count API: Billing for Extraction is based on the volume of extracted content. Use the Count API to see the total number of results before launching an Extraction.

🚧
Available Use
Extraction is only available for Stream Integrated partner sources or your own data sources that use Datastreamer for storage.

Endpoint Details

Endpoint	Type	Description	Status
/extraction/query	`POST`	Launches an extraction that delivers all results matching the provided parameters in a compressed file.	Live
/extraction/jobs	`GET`	Lists all the jobs in progress, failed or completed in the last 30 days.	Live
/extraction/jobs/{jobId}/progress	`GET`	View the progress of any specific job.	Live
/extraction/jobs/{jobId}/cancel	`POST`	Cancel the extraction job.	Live
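
The endpoints above can be wrapped in small helpers that return the HTTP method and URL for each call. This is a minimal sketch: `BASE_URL` is a placeholder rather than a documented host, and the function names are hypothetical.

```python
from urllib.parse import quote

# Placeholder host (an assumption): replace with the base URL for your account.
BASE_URL = "https://api.example.com"


def _url(path: str) -> str:
    """Join the base URL and an endpoint path."""
    return BASE_URL.rstrip("/") + path


def start_extraction() -> tuple:
    """Method and URL for launching an extraction."""
    return "POST", _url("/extraction/query")


def list_jobs() -> tuple:
    """Method and URL for listing jobs from the last 30 days."""
    return "GET", _url("/extraction/jobs")


def job_progress(job_id: str) -> tuple:
    """Method and URL for viewing the progress of one job."""
    return "GET", _url(f"/extraction/jobs/{quote(job_id, safe='')}/progress")


def cancel_job(job_id: str) -> tuple:
    """Method and URL for cancelling one job."""
    return "POST", _url(f"/extraction/jobs/{quote(job_id, safe='')}/cancel")
```

The job ID is percent-encoded before it is placed in the path, so an unexpected character in an ID cannot change which endpoint is called.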

Request Object Details

A sample request body looks like this:

{
  "query": {
    "query": "New York",
    "data_sources": [
         "wsl_twitter"
    ]
  }
}

The data_sources and query fields in the JSON request body filter the results to your requirements. Queries in the query parameter use Apache Lucene based syntax. The format and content of the query are the same as in the Search API.
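
As an illustration, the sample body can be built and wrapped in a `POST` request with the Python standard library. This is a sketch under stated assumptions: `EXTRACTION_URL` is a placeholder host, the helper names are hypothetical, and authentication is not covered in this section, so any required headers must be passed in by the caller.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Placeholder URL (an assumption): only the /extraction/query path is documented.
EXTRACTION_URL = "https://api.example.com/extraction/query"


def build_extraction_body(lucene_query: str, data_sources: List[str]) -> dict:
    """Return a request body in the same shape as the sample above."""
    return {"query": {"query": lucene_query, "data_sources": list(data_sources)}}


def build_request(body: dict,
                  extra_headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Wrap the body in a POST request without sending it."""
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. whatever auth header your account uses
    return urllib.request.Request(
        EXTRACTION_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


body = build_extraction_body("New York", ["wsl_twitter"])
request = build_request(body)
```

Passing `request` to `urllib.request.urlopen` would submit it. Nothing is sent here, which makes the body easy to inspect before launching a billable extraction.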

📘
Ignored Fields
The "from" and "size" fields are handled automatically and are ignored if present in the request.
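
When reusing a Search API body for Extraction, the ignored fields can be dropped on the client side so the request shows only what takes effect. A minimal sketch: the helper name is hypothetical, and the location of "from" and "size" (top level or inside the "query" object) is an assumption, so both places are checked.

```python
import copy

IGNORED_FIELDS = ("from", "size")


def strip_ignored_fields(body: dict) -> dict:
    """Return a copy of the body without "from" and "size".

    Assumption: the fields may appear at the top level or inside "query".
    The input dictionary is left unchanged.
    """
    cleaned = copy.deepcopy(body)
    inner = cleaned.get("query")
    for field in IGNORED_FIELDS:
        cleaned.pop(field, None)
        if isinstance(inner, dict):
            inner.pop(field, None)
    return cleaned


search_body = {
    "query": {
        "query": "New York",
        "data_sources": ["wsl_twitter"],
        "from": 0,
        "size": 100,
    }
}
extraction_body = strip_ignored_fields(search_body)
```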

Response Details

Based on the data_sources and query provided in the request body, the Extraction API returns all matching documents available at the specified sources.

Unlike other APIs, the Extraction API places the results in compressed files on accessible storage. You can access the results in zipped JSON format under the job ID, which is given when the query is launched.

🚧
Bucket Accessibility
If you do not have a Cloud Storage Bucket established for you, conducting an extraction job will create one. It is suggested to reach out to Datastreamer to set up a Cloud Storage Bucket prior to conducting any Extraction jobs.

Possible job statuses

Unknown - default state - you should not see this
InProgress - job is currently being executed
Complete - job has completed successfully
Failed - job has failed after retry
NotStarted - job has yet to be executed
Abandoned - job has been cancelled
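
A client can poll the progress endpoint until a job reaches a final status. The sketch below treats Complete, Failed, and Abandoned as final, which is an inference from the descriptions above and not a documented guarantee. Because the response shape of the progress endpoint is not described in this section, the status lookup is injected as a hypothetical `fetch_status` callable.

```python
import time
from typing import Callable

# Assumption: these three statuses are final, the remaining ones may still change.
TERMINAL_STATUSES = {"Complete", "Failed", "Abandoned"}
KNOWN_STATUSES = TERMINAL_STATUSES | {"Unknown", "InProgress", "NotStarted"}


def wait_for_job(fetch_status: Callable[[], str],
                 interval_seconds: float = 30.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a final status and return that status.

    fetch_status is supplied by the caller and returns the current status
    string, for example by calling /extraction/jobs/{jobId}/progress.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected job status: {status!r}")
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Injecting `sleep` and `fetch_status` keeps the loop testable without network access or real waiting.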
