Monitored Search API

Monitored Search & Webhook

Monitored Search is designed for building monitoring features and products. The API lets you register searches for arbitrary text strings, combine terms with complex boolean logic, and apply filters and other advanced features. As matching results appear, they are delivered as ordinary JSON documents to the endpoint you specify.

Background & Prerequisites

  • You must have a valid ApiKey from Datastreamer
  • It is helpful to be familiar with our Search API
  • You will need a publicly accessible RESTful endpoint

Important Tip

You can use a tool such as ngrok to expose your local endpoint to our webhook during development. The code examples below use an ngrok.io URL.

Compatible Sources

Monitored Search is available for Stream Integrated sources and enrichments only.

Endpoint Details

Endpoint | Type | Description
Get Webhook | GET | Retrieves the webhooks available to the developer.
Set Webhook | PUT | Sets up a new webhook from the parameters supplied in the request body.
Get Webhook Status | GET | Retrieves the status of the webhook configured against the Datastreamer API.
Set Webhook Status | PUT | Enables or disables the webhook within the Datastreamer API.
Send Test Payload to Webhook | POST | Sends a test payload to the webhook so the developer can check it.

Getting Started

1. Set your Webhook

Replace #ApiKey# with your Datastreamer ApiKey and make sure the url value points to your own endpoint. If your endpoint requires special headers for its own processing, add them under headers.

curl --location --request PUT 'https://api.platform.datastreamer.io/api/webhook' \
     --header 'apiKey: #ApiKey#' \
     --header 'Content-Type: application/json' \
     --data-raw '{
         "status": true,
         "url": "https://35c0-184-148-13-151.ngrok.io/api/test",
         "method": "POST",
         "signature_required": false,
         "signature_secret": "",
         "status_code_check": "200",
         "headers": {
             "apiKey": "someApiKey",
             "someExtraHeader": "12345",
             "someExtraHeader2": "bbb"
         }
     }'

If saved successfully, you should get 200 as status and a response like the following:

{
    "webhook": {
        "status": true,
        "url": "https://35c0-184-148-13-151.ngrok.io/api/test",
        "method": "POST",
        "signature_required": false,
        "signature_secret": "",
        "status_code_check": "200",
        "headers": {
            "apiKey": "someApiKey",
            "someExtraHeader": "12345",
            "someExtraHeader2": "bbb"
        }
    },
    "errors": null
}
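
For readers who prefer to script this step, the same request could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library; the function name build_set_webhook_request and the API_BASE constant are illustrative and not part of the Datastreamer API, and the field values simply mirror the curl example above.

```python
import json
import urllib.request

# Base URL copied from the curl example; the constant name is illustrative.
API_BASE = "https://api.platform.datastreamer.io/api"


def build_set_webhook_request(api_key, url, extra_headers=None):
    """Build (but do not send) the PUT request that registers a webhook.

    Hypothetical helper: the body fields mirror the curl example.
    """
    body = {
        "status": True,
        "url": url,
        "method": "POST",
        "signature_required": False,
        "signature_secret": "",
        "status_code_check": "200",
        # Headers your own endpoint expects on every delivery.
        "headers": dict(extra_headers or {}),
    }
    return urllib.request.Request(
        f"{API_BASE}/webhook",
        data=json.dumps(body).encode("utf-8"),
        headers={"apiKey": api_key, "Content-Type": "application/json"},
        method="PUT",
    )


request = build_set_webhook_request(
    "#ApiKey#",
    "https://35c0-184-148-13-151.ngrok.io/api/test",
    {"someExtraHeader": "12345"},
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the body before anything reaches the API.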

2. Test your Webhook

Use the following command to check if your webhook is working.

(Replace #ApiKey# with your Datastreamer provided key)

curl --location --request POST 'https://api.platform.datastreamer.io/api/webhook/test' \
--header 'apiKey: #ApiKey#'

The response will be similar to the following:

{
    "status_code": 200,
    "response": null,
    "error_message": "Some text"
}

You should see your endpoint receiving some results.

      headers:
      Content-Type=application/json; charset=utf-8
      Accept-Encoding=gzip
      Host=35c0-184-148-13-151.ngrok.io
      traceparent=00-9374828738f68a4995aa3f5f9fcae8f6-4a6adb223e24964e-00
      Content-Length=1098
      Apikey=someApiKey
      Someextraheader=12345
      Someextraheader2=bbb
      X-Forwarded-For=35.239.102.169
      X-Forwarded-Proto=https
      body:
      {"Source":"Percolation","Event":"Your Test Monitored Search","Data":[{"id":"1634393989898500400-artemis","internal":{"provider_document_id":"1634393989898500400-artemis","last_updated":"2022-08-03T13:32:20.1818086Z"},"data_source":"wsl_twitter","source":{"link":"https://twitter.com/TheTJHelm/status/1449378030141550592"},"content":{"body":"@drvolts To be 100% honest I blame Obama. This should have been done in 2009 with 60 Democratic senators","found":"2021-10-16T14:19:49Z","published":"2021-10-16T14:13:45Z","favorites":34647,"followers":3133,"following":4976,"mentions":["drvolts"]},"author":{"name":"TJ Helmstetter","bio":"<p>Progressive communicator. Views own etc etc.</p>","location":"Washington, DC","profile_image_source":"https://pbs.twimg.com/profile_images/1147277618003230721/d28piXtp_normal.jpg","gender":"UNKNOWN","url":"https://twitter.com/TheTJHelm","handle":"TheTJHelm"},"enrichment":{"sentiment":"NEUTRAL","language":"en"},"twitter":{"tweet_type":"POST","retweet_type":"NONE","post_identifier":"1449378030141550592","user_verified":"False","user_id":"41660626"}}]}

Congratulations! Your webhook is up and running.
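
On the receiving side, the body shown above can be unpacked with a few lines of Python. The sketch below is one possible way to do it, not a prescribed handler: summarize_payload is a hypothetical name, and the keys it reads (Event, Data, id, data_source, source.link) are taken from the sample test payload, so real payloads from other sources may carry different fields.

```python
import json


def summarize_payload(raw_body):
    """Return the event name and one (id, data_source, link) tuple per document.

    Hypothetical helper; key names are taken from the sample test payload.
    """
    payload = json.loads(raw_body)
    documents = payload.get("Data") or []  # tolerate a missing or null list
    rows = [
        (
            doc.get("id"),
            doc.get("data_source"),
            (doc.get("source") or {}).get("link"),
        )
        for doc in documents
    ]
    return payload.get("Event"), rows
```

Calling summarize_payload on the request body inside your endpoint gives a compact view of what arrived without assuming every optional field is present.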

3. Create your first Monitored Search

Now it's time to create your first Monitored Search.

Creating the query

The "search_query" object within Monitored Search follows the same syntax as the Search API. Because only new results are processed through Monitored Search, it is recommended to avoid date filters.

curl --location --request POST 'https://api.platform.datastreamer.io/api/monitored-search' \
--header 'apiKey: #ApiKey#' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "my-first-monitored-search",
    "created_by": "me",
    "search_query":{
        "data_sources": ["wsl_twitter"],
        "query": "content.body: Queen",
        "size": 100
    }
}'

You should get a response like the following:

{
    "monitor_id": "1140f158-0907-48ef-87c6-0e810e2c5252",
    "client": "datastreamer",
    "created_on": "2022-09-20T15:50:04.3414056Z",
    "name": "my-first-monitored-search",
    "search_query": {
        "size": 100,
        "query": "content.body: Queen",
        "data_sources": [
            "wsl_twitter"
        ],
        "highlight_pre_text": "<highlight>",
        "highlight_post_text": "</highlight>"
    }
}

Save the monitor_id value somewhere; you will need it later to delete the monitored search.
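
If you create monitored searches from code, the request body and the monitor_id lookup could be handled along these lines. This is a sketch under assumptions: build_monitored_search and extract_monitor_id are hypothetical helper names, and the field names are copied from the request and response shown above.

```python
import json


def build_monitored_search(name, query, data_sources, created_by="me", size=100):
    """Return the request body for creating a monitored search.

    Hypothetical helper; field names follow the curl example.
    """
    if not data_sources:
        raise ValueError("at least one data source is required")
    return {
        "name": name,
        "created_by": created_by,
        "search_query": {
            "data_sources": list(data_sources),
            "query": query,
            "size": size,
        },
    }


def extract_monitor_id(response_text):
    """Pull monitor_id out of the JSON returned when the search is created."""
    return json.loads(response_text)["monitor_id"]


body = build_monitored_search(
    "my-first-monitored-search", "content.body: Queen", ["wsl_twitter"]
)
print(json.dumps(body, indent=4))
```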

4. Observe the results

Your endpoint should regularly receive all matching live content.

5. Delete the monitored search

Use the following command to delete the monitored search once you are done. (Replace #YourMonitorId# with the ID you saved at the end of step 3 and #ApiKey# with your Datastreamer ApiKey.)

curl --location --request DELETE 'https://api.platform.datastreamer.io/api/monitored-search/#YourMonitorId#' \
--header 'apiKey: #ApiKey#'

It might take a minute or two for incoming notifications to stop, since many documents are probably already in the pipeline.

The deletion must use the monitor ID, not the name of the monitored search.
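
Because the delete call takes the ID rather than the name, a small guard can catch the mix-up before a request is sent. The sketch below assumes monitor IDs are always UUIDs, which is inferred only from the example responses in this guide and is not confirmed anywhere else; delete_url is a hypothetical helper name.

```python
import uuid

# Base URL copied from the curl examples; the constant name is illustrative.
API_BASE = "https://api.platform.datastreamer.io/api"


def delete_url(monitor_id):
    """Build the DELETE URL, rejecting values that do not look like a monitor ID.

    Assumption: monitor IDs are UUIDs, as in the example responses.
    """
    try:
        parsed = uuid.UUID(monitor_id)
    except ValueError:
        raise ValueError(
            f"{monitor_id!r} does not look like a monitor ID; "
            "pass the monitor_id value, not the search name"
        ) from None
    return f"{API_BASE}/monitored-search/{parsed}"
```

Passing a name such as my-first-monitored-search raises a ValueError instead of producing a request that cannot succeed.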

Other operations

Get a monitored search

Use the following command to view your Monitored Search (replace #ApiKey# with your Datastreamer ApiKey and #MonitoredSearchName# with the name of the search):

curl --location --request GET 'https://api.platform.datastreamer.io/api/monitored-search/#MonitoredSearchName#' \
--header 'apiKey: #ApiKey#'

You should see the definition of that monitored search, similar to the following:

{
    "monitor_id": "6fa07101-e7e2-4165-9dd0-239d193e660f",
    "client": "datastreamer",
    "created_on": "2022-09-20T16:02:59.261918",
    "name": "my-first-monitored-search",
    "search_query": {
        "size": 100,
        "query": "content.body: Queen",
        "data_sources": [
            "wsl_twitter"
        ],
        "highlight_pre_text": "<highlight>",
        "highlight_post_text": "</highlight>"
    },
    "status": "Active"
}

Update monitored search

You can update your monitored search using exactly the same syntax as step 3. As long as the name is the same, the new definition overwrites the older one, though it takes a minute or two to take effect (the same as a delete).

Enable/Disable your Webhook

You can get the status of your Webhook via the following command:

curl --location --request GET 'https://api.platform.datastreamer.io/api/webhook/status' \
--header 'apiKey: #ApiKey#'

Similarly, you can enable or disable your webhook with the following command. Set enabled to either true or false, not both; the example below enables the webhook:

curl --location --request PUT 'https://api.platform.datastreamer.io/api/webhook/status' \
--header 'apiKey: #ApiKey#' \
--header 'Content-Type: application/json' \
--data-raw '{
    "enabled": true
}'

High Volume Query Warning:

When a monitored search is created, the expected daily amount is estimated from the historical average of the last 7 days. If the expected amount exceeds 150,000 per day (measured for each source individually), a warning is returned:

This query is considered a high volume query, and a warning has been provided.
To continue please add the following preceding or after your "search_query" section:
"high_volume_acknowledgement": true.

[source] warning exceeded XXXX > 150000

This warning is to notify developers of potentially large result sets, and is provided for your convenience. The estimate is based on the average of the last 7 days, and while it is not a prediction of the future, it's a good indication that volume will be high moving forward.

To accept and bypass the warning, add "high_volume_acknowledgement": "true" either before or after the "search_query" section. An example is available below:

{
  "search_query": {
    "query": "content.body: cats",
    "data_sources": [
      "wsl_twitter"
    ]
  },
  "name": "test",
  "high_volume_acknowledgement": "true"
}
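
To make the threshold rule concrete, the sketch below applies the same arithmetic on the client side: it averages seven daily counts per source and flags any source whose average exceeds 150,000. The real estimate is produced by the API; the function names, the input shape, and the sample counts here are all hypothetical and serve only to illustrate the rule and the acknowledgement flag.

```python
HIGH_VOLUME_THRESHOLD = 150_000  # per day, per source (figure from the text above)


def sources_over_threshold(daily_counts_by_source):
    """Map each source whose average daily count exceeds the threshold to that average.

    daily_counts_by_source: {source name: list of daily counts}, a hypothetical
    input shape chosen for this illustration.
    """
    flagged = {}
    for source, counts in daily_counts_by_source.items():
        if not counts:
            continue  # nothing to average for this source
        average = sum(counts) / len(counts)
        if average > HIGH_VOLUME_THRESHOLD:
            flagged[source] = average
    return flagged


def with_acknowledgement(body):
    """Return a copy of the request body with the acknowledgement flag added."""
    acknowledged = dict(body)
    # Value written as the string "true", copied from the example request above.
    acknowledged["high_volume_acknowledgement"] = "true"
    return acknowledged


# Made-up counts: each source is measured on its own.
counts = {"wsl_twitter": [200_000] * 7, "twingly_blogs": [1_000] * 7}
print(sources_over_threshold(counts))
```

Only wsl_twitter is flagged in this made-up input, which matches the per-source wording of the rule.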

Notes & Tips:

  • The "search_query" object within Monitored Search follows exactly the same syntax as the Search API. Because only live data flows through Monitored Search, it does not make sense to put an old date range in the query.
  • You can combine data sources if the query is the same. Your request body would look like:
{
    "name": "your-3-in-1-search",
    "created_by": "test",
    "search_query":{
        "data_sources": ["wsl_twitter", "opoint_news", "twingly_blogs"],
        "query": "content.body: Queen",
        "size": 100
    }
}

  • If you are concerned about the integrity of incoming calls or data, then besides the API key you can also enable the signature and specify your own secret when you define your webhook (step 1). Each call then carries an extra header named "datastreamer-signature" whose value contains an HMAC-SHA256 hash of the whole body, computed with the secret you provided.

  • You can also choose whether your endpoint expects a POST or a PUT. In theory you can use other verbs, but given the length of the content, be careful about extra restrictions that gateways or routes along the way might impose.
  • You can also have the webhook check the status code your endpoint returns by putting the expected code in the status_code_check field; leave it empty or null if you don't care.
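
The signature check mentioned in the tips above could be implemented on your endpoint roughly as follows. This is a sketch, not Datastreamer's reference code: the guide states only that the datastreamer-signature header holds an HMAC-SHA256 hash of the whole body, so the encoding of that value is an assumption here, and the hypothetical signature_matches helper accepts either a hex or a base64 rendering of the digest.

```python
import base64
import hashlib
import hmac


def signature_matches(secret, raw_body, header_value):
    """Check a datastreamer-signature header value against the raw request body.

    Assumption: the header carries the HMAC-SHA256 digest as hex or base64;
    the guide does not say which, so both renderings are accepted.
    raw_body must be the exact bytes received, before any JSON parsing.
    """
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    received = header_value.strip().encode("utf-8")
    hex_form = digest.hex().encode("ascii")
    b64_form = base64.b64encode(digest)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(hex_form, received.lower()) or hmac.compare_digest(
        b64_form, received
    )
```

Verify against the raw bytes of the request, since re-serializing parsed JSON can change whitespace or key order and so change the hash.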