Monitored Search API
Monitored Search & Webhook
Monitored Search is designed for building monitoring features and products. This API lets you define searches for arbitrary text strings, using complex boolean logic, filters, and other advanced features. As matching results appear, they are delivered as ordinary JSON documents to the endpoint you specify.
Background & Prerequisites
- You must have a valid ApiKey from Datastreamer
- It is helpful to be familiar with our Search API
- You will need a publicly accessible RESTful endpoint
Important Tip
You can use a tool such as ngrok to expose a local endpoint to our webhook during development. The code examples below use an ngrok.io URL.
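For local development, a receiving endpoint can be very small. The following is a minimal sketch using only the Python standard library, not an official Datastreamer client. The names `handle_payload`, `WebhookHandler`, and `serve`, the port 8000, and the choice to answer 400 for non-JSON bodies are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_payload(raw_body: bytes) -> int:
    """Return the HTTP status to send back for one webhook delivery."""
    try:
        payload = json.loads(raw_body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400  # the body was not valid JSON
    if not isinstance(payload, dict):
        return 400  # deliveries are expected to be JSON objects
    print(f"received webhook payload with keys: {sorted(payload)}")
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POST deliveries on any path and acknowledges them."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, serving on localhost; point ngrok at this port."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling `serve(8000)` and then exposing port 8000 with ngrok gives a publicly reachable URL that can be used as the webhook `url` during development.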
Compatible Sources
Monitored Search is available for Stream Integrated sources and enrichments only.
Endpoint Details
Endpoint | Type | Description |
---|---|---|
Get Webhook | GET | Retrieves the available webhooks. |
Set Webhook | PUT | Sets up a new webhook from the parameters provided in the request body. |
Get Webhook Status | GET | Retrieves the status of the webhook configured against the Datastreamer API. |
Set Webhook Status | PUT | Enables or disables the webhook within the Datastreamer API. |
Send Test Payload to Webhook | POST | Sends a test payload to the webhook so the developer can verify it. |
Getting Started
1. Set your Webhook
Replace #ApiKey# with your Datastreamer ApiKey, and make sure the url is correct. If your endpoint requires special headers for your own processing, add them under headers.
curl --location --request PUT 'https://api.platform.datastreamer.io/api/webhook' \
--header 'apiKey: #ApiKey#' \
--header 'Content-Type: application/json' \
--data-raw '{
"status": true,
"url": "https://35c0-184-148-13-151.ngrok.io/api/test",
"method": "POST",
"signature_required": false,
"signature_secret": "",
"status_code_check": "200",
"headers": {
"apiKey": "someApiKey",
"someExtraHeader": "12345",
"someExtraHeader2": "bbb"
}
}'
If the webhook is saved successfully, you should receive a 200 status and a response like the following:
{
"webhook": {
"status": true,
"url": "https://35c0-184-148-13-151.ngrok.io/api/test",
"method": "POST",
"signature_required": false,
"signature_secret": "",
"status_code_check": "200",
"headers": {
"apiKey": "someApiKey",
"someExtraHeader": "12345",
"someExtraHeader2": "bbb"
}
},
"errors": null
}
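The same PUT request can be prepared from Python. This is a sketch under stated assumptions rather than an official client: the helper name `build_set_webhook_request` is hypothetical, the body fields simply mirror the curl example above, and the request is only built here, not sent.

```python
import json
import urllib.request

WEBHOOK_URL = "https://api.platform.datastreamer.io/api/webhook"


def build_set_webhook_request(api_key, endpoint_url, extra_headers=None):
    """Build (but do not send) the PUT request that registers a webhook."""
    body = {
        "status": True,
        "url": endpoint_url,
        "method": "POST",
        "signature_required": False,
        "signature_secret": "",
        "status_code_check": "200",
        # Headers that should be sent along to your own endpoint.
        "headers": dict(extra_headers or {}),
    }
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"apiKey": api_key, "Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the JSON response can then be checked for a null `errors` field, as in the sample response above.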
2. Test your Webhook
Use the following command to check that your webhook is working (replace #ApiKey# with your Datastreamer ApiKey):
curl --location --request POST 'https://api.platform.datastreamer.io/api/webhook/test' \
--header 'apiKey: #ApiKey#'
The response will be similar to the following:
{
"status_code": 200,
"response": null,
"error_message": "Some text"
}
Your endpoint should also receive a test delivery like the following:
headers:
Content-Type=application/json; charset=utf-8
Accept-Encoding=gzip
Host=35c0-184-148-13-151.ngrok.io
traceparent=00-9374828738f68a4995aa3f5f9fcae8f6-4a6adb223e24964e-00
Content-Length=1098
Apikey=someApiKey
Someextraheader=12345
Someextraheader2=bbb
X-Forwarded-For=35.239.102.169
X-Forwarded-Proto=https
body:
{"Source":"Percolation","Event":"Your Test Monitored Search","Data":[{"id":"1634393989898500400-artemis","internal":{"provider_document_id":"1634393989898500400-artemis","last_updated":"2022-08-03T13:32:20.1818086Z"},"data_source":"wsl_twitter","source":{"link":"https://twitter.com/TheTJHelm/status/1449378030141550592"},"content":{"body":"@drvolts To be 100% honest I blame Obama. This should have been done in 2009 with 60 Democratic senators","found":"2021-10-16T14:19:49Z","published":"2021-10-16T14:13:45Z","favorites":34647,"followers":3133,"following":4976,"mentions":["drvolts"]},"author":{"name":"TJ Helmstetter","bio":"<p>Progressive communicator. Views own etc etc.</p>","location":"Washington, DC","profile_image_source":"https://pbs.twimg.com/profile_images/1147277618003230721/d28piXtp_normal.jpg","gender":"UNKNOWN","url":"https://twitter.com/TheTJHelm","handle":"TheTJHelm"},"enrichment":{"sentiment":"NEUTRAL","language":"en"},"twitter":{"tweet_type":"POST","retweet_type":"NONE","post_identifier":"1449378030141550592","user_verified":"False","user_id":"41660626"}}]}
Congratulations! Your webhook is now up and running.
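Once deliveries arrive, the endpoint usually needs only a few fields from each document. The sketch below reads the `Event` and `Data` keys seen in the test delivery above; the function name `summarize_delivery` and the choice of fields are illustrative assumptions, and real deliveries may carry fields not shown in that sample.

```python
import json


def summarize_delivery(raw_body):
    """Return (event_name, list of per-document summaries) for one delivery."""
    payload = json.loads(raw_body)
    summaries = []
    for doc in payload.get("Data", []):
        summaries.append({
            "id": doc.get("id"),
            "data_source": doc.get("data_source"),
            # Nested objects may be absent, so fall back to empty dicts.
            "link": doc.get("source", {}).get("link"),
            "body": doc.get("content", {}).get("body"),
        })
    return payload.get("Event"), summaries
```

Using `.get` with defaults keeps the handler from failing on documents that lack a `source` or `content` object.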
3. Create your first Monitored Search
Now it’s time to create your first Monitored Search.
Creating the query
The “search_query” object within Monitored Search follows the same syntax as the Search API. Because Monitored Search processes only new results, date filters are best avoided.
curl --location --request POST 'https://api.platform.datastreamer.io/api/monitored-search' \
--header 'apiKey: #ApiKey#' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "my-first-monitored-search",
"created_by": "me",
"search_query":{
"data_sources": ["wsl_twitter"],
"query": "content.body: Queen",
"size": 100
}
}'
You should receive a response like the following:
{
"monitor_id": "1140f158-0907-48ef-87c6-0e810e2c5252",
"client": "datastreamer",
"created_on": "2022-09-20T15:50:04.3414056Z",
"name": "my-first-monitored-search",
"search_query": {
"size": 100,
"query": "content.body: Queen",
"data_sources": [
"wsl_twitter"
],
"highlight_pre_text": "<highlight>",
"highlight_post_text": "</highlight>"
}
}
Save the monitor_id property somewhere; we’ll need that later.
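Creating the search and keeping its `monitor_id` can be sketched in Python as follows. The helper names `build_create_request` and `extract_monitor_id` are hypothetical, the body mirrors the curl example above, and the request is only constructed here rather than sent.

```python
import json
import urllib.request

MONITORED_SEARCH_URL = "https://api.platform.datastreamer.io/api/monitored-search"


def build_create_request(api_key, name, query, data_sources,
                         created_by="me", size=100):
    """Build (but do not send) the POST request that creates a monitored search."""
    body = {
        "name": name,
        "created_by": created_by,
        "search_query": {
            "data_sources": list(data_sources),
            "query": query,
            "size": size,
        },
    }
    return urllib.request.Request(
        MONITORED_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"apiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_monitor_id(response_text):
    """Pull the monitor_id out of the JSON response so it can be stored."""
    return json.loads(response_text)["monitor_id"]
```

Storing the value returned by `extract_monitor_id` alongside the search name makes the later delete call straightforward.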
4. Observe the results
Your endpoint should regularly receive all matching live content.
5. Delete the monitored search
Use the following command to delete the monitored search once you are done. (Replace #YourMonitorId# with the ID saved at the end of step 3, and #ApiKey# with your Datastreamer ApiKey.)
curl --location --request DELETE 'https://api.platform.datastreamer.io/api/monitored-search/#YourMonitorId#' \
--header 'apiKey: #ApiKey#'
It might take a minute or two for incoming notifications to stop, because documents may already be in the pipeline.
The deletion must use the monitor ID, not the name of the monitored search.
Other operations
Get a monitored search
Use the following command to view your Monitored Search (replace #ApiKey# with your Datastreamer ApiKey and #MonitoredSearchName# with the name of the search):
curl --location --request GET 'https://api.platform.datastreamer.io/api/monitored-search/#MonitoredSearchName#' \
--header 'apiKey: #ApiKey#'
You should see the definition of that monitored search, similar to the following:
{
"monitor_id": "6fa07101-e7e2-4165-9dd0-239d193e660f",
"client": "datastreamer",
"created_on": "2022-09-20T16:02:59.261918",
"name": "my-first-monitored-search",
"search_query": {
"size": 100,
"query": "content.body: Queen",
"data_sources": [
"wsl_twitter"
],
"highlight_pre_text": "<highlight>",
"highlight_post_text": "</highlight>"
},
"status": "Active"
}
Update monitored search
You can update your monitored search using exactly the same syntax as in step 3. As long as the name is the same, the new definition overwrites the older one. As with deletion, the change takes a minute or two to take effect.
Enable/Disable your Webhook
You can get the status of your Webhook via the following command:
curl --location --request GET 'https://api.platform.datastreamer.io/api/webhook/status' \
--header 'apiKey: #ApiKey#'
Similarly, you can enable or disable your webhook with the following command (set "enabled" to either true or false, not both):
curl --location --request PUT 'https://api.platform.datastreamer.io/api/webhook/status' \
--header 'apiKey: #ApiKey#' \
--header 'Content-Type: application/json' \
--data-raw '{
"enabled": true|false
}'
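The `true|false` in the command above is a placeholder, so the body actually sent must contain a single JSON boolean. A small sketch of producing that body from Python follows; the helper name `webhook_status_body` is an assumption for illustration.

```python
import json


def webhook_status_body(enabled):
    """Serialize the status body with exactly one JSON boolean value."""
    if not isinstance(enabled, bool):
        # Guard against passing the string "true" or the placeholder text.
        raise TypeError("enabled must be True or False")
    return json.dumps({"enabled": enabled})
```

The resulting string can be passed to curl with `--data-raw` or used as the body of a PUT request to the status endpoint.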
High Volume Query Warning:
When a monitored search is created, a historical average over the last 7 days is calculated. If the expected volume exceeds 150,000 per day (measured for each source individually), a warning is returned:
This query is considered a high volume query, and a warning has been provided.
To continue please add the following preceding or after your "search_query" section:
"high_volume_acknowledgement": true.
[source] warning exceeded XXXX > 150000
This warning is to notify developers of potentially large result sets, and is provided for your convenience. The estimate is based on the average of the last 7 days, and while it is not a prediction of the future, it's a good indication that volume will be high moving forward.
To accept and bypass the warning, add "high_volume_acknowledgement": "true" either before or after the "search_query" section. An example is available below:
{
"search_query": {
"query": "content.body: cats",
"data_sources": [
"wsl_twitter"
]
},
"name": "test",
"high_volume_acknowledgement": "true"
}
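Adding the acknowledgement to an existing request body can be sketched as below. The helper name `acknowledge_high_volume` is hypothetical. Note that the warning text shows the value as a bare `true` while the example body uses the string `"true"`; this sketch follows the example body by default, which is an assumption, and lets the caller override the value.

```python
import copy


def acknowledge_high_volume(request_body, value="true"):
    """Return a copy of the request body with the acknowledgement added."""
    acknowledged = copy.deepcopy(request_body)  # leave the caller's dict untouched
    acknowledged["high_volume_acknowledgement"] = value
    return acknowledged
```

Working on a copy means the original body can still be resubmitted unchanged if the acknowledgement turns out not to be needed.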
Notes & Tips:
- The “search_query” object within monitored search follows exactly the same syntax as the Search API. However, putting a very old date range in the query does not make sense, because only live data flows through monitored search.
- If the query is the same, you can combine data sources in a single monitored search. The request body would look like this:
{
"name": "your-3-in-1-search",
"created_by": "test",
"search_query":{
"data_sources": ["wsl_twitter", "opoint_news", "twingly_blogs"],
"query": "content.body: Queen",
"size": 100
}
}
- If you are concerned about the integrity of the incoming call or data, you can, in addition to the API key, enable the signature and specify your own secret when you define your webhook (step 1). Each delivery will then include an extra header named “datastreamer-signature” whose value contains an HMAC-SHA256 hash of the whole body, computed with the secret you provided.
- You can also choose whether your endpoint expects a POST or a PUT. In theory you can use other verbs, but given the length of the content, be careful about extra restrictions that gateways or routes along the way might impose.
- You can also have the webhook check the status code your endpoint returns by putting the expected code in the status_code_check field. Leave it empty or null if you do not need this check.
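The signature check mentioned above can be sketched on the receiving side as follows. The source states only that the header value contains an HMAC-SHA256 hash of the whole body; how the digest is encoded is not specified, so this sketch accepts either hex or base64, which is an assumption. The function name `signature_matches` is hypothetical.

```python
import base64
import hashlib
import hmac


def signature_matches(raw_body, secret, received):
    """Check a received signature against HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    received = received.strip()
    # Compare in constant time; hex is matched case-insensitively.
    hex_ok = hmac.compare_digest(digest.hex().encode("ascii"),
                                 received.lower().encode("utf-8"))
    b64_ok = hmac.compare_digest(base64.b64encode(digest),
                                 received.encode("utf-8"))
    return hex_ok or b64_ok
```

The hash must be computed over the exact bytes received, before any JSON parsing or re-serialization, since even whitespace changes alter the digest.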