Socialgist Jobs (API)

This documentation covers best practices and how to set up and use Socialgist sources.

🚧

Legacy Functionality

This Socialgist Jobs system is being replaced by the Job Management systems and APIs. The updated Job Management system is common to all Job systems and no longer requires a data-source-specific solution.

The information on this page remains available to support existing implementations, but new implementations should use the Job Management system instead. See the Job Management section for Socialgist.

Socialgist Collection Jobs

1. Getting Started

To get started with Socialgist, you will first need to have an access token from Socialgist (please contact us if you don’t already have one).

NOTE: The cURL examples below use blogs as the DataSourceName. Feel free to use any of the available Socialgist sources; see Step 2.2 (List available Socialgist Data Sources) to list them all.

2. Register your Socialgist Token with Datastreamer.

📘

Full documentation:

Full documentation is available here: https://docs.datastreamer.io/docs/connecting-compatible-sources. A snippet is listed below.

2.1. Register your Token.

curl --location --request PUT 'https://api.platform.datastreamer.io/api/data-sources/socialgist' \
--header 'Content-Type: application/json' \
--header 'apikey: {YourDatastreamerApiKey}' \
--data '{ 
    "token": "{YourSocialgistAccessToken}"
}'
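
The same registration call can be scripted. The following is a minimal sketch in Python (standard library only) that builds the PUT request from 2.1 without sending it. The helper name build_register_request is hypothetical, and the endpoint and header names are copied from the cURL example above.

```python
import json
import urllib.request

BASE_URL = "https://api.platform.datastreamer.io/api"


def build_register_request(api_key: str, socialgist_token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that registers a Socialgist token."""
    body = json.dumps({"token": socialgist_token}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/data-sources/socialgist",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "apikey": api_key},
    )


if __name__ == "__main__":
    # Placeholders only: substitute your real credentials before sending.
    req = build_register_request("{YourDatastreamerApiKey}", "{YourSocialgistAccessToken}")
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before any credentials leave your machine.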

2.2. List available Socialgist Data Sources

curl --location 'https://api.platform.datastreamer.io/api/data-providers/socialgist' \
--header 'apikey: {YourDatastreamerApiKey}'

The above command lists all currently supported Socialgist sub-data sources.

3. Create your tasks

To create tasks for Socialgist collection, specify the data source name in the URL and the task definitions in the request body.

You can use the following command to create your first batch of tasks.

curl --location 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs' \
--header 'Content-Type: application/json' \
--header 'apikey: {YourDatastreamerApiKey}' \
--data '{
    "tasks": [
        {
            "value": "cats",
            "from_date": "2024-01-01",
            "to_date": "2024-01-05",
            "update_interval": 86400,
            "parameters": {}
        }
    ]
}'
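
When you need one task per keyword, the request body can be generated rather than written by hand. The sketch below is an illustration only: build_tasks_payload is a hypothetical helper, and the field names and the default update_interval of 86400 are taken from the cURL example above (this page does not state the unit of update_interval).

```python
import json
from datetime import date


def build_tasks_payload(keywords, from_date: date, to_date: date,
                        update_interval: int = 86400) -> dict:
    """Return a request body with one task per keyword, mirroring the cURL example."""
    if from_date > to_date:
        raise ValueError("from_date must not be after to_date")
    return {
        "tasks": [
            {
                "value": keyword,
                "from_date": from_date.isoformat(),  # formatted as YYYY-MM-DD
                "to_date": to_date.isoformat(),
                "update_interval": update_interval,
                "parameters": {},
            }
            for keyword in keywords
        ]
    }


if __name__ == "__main__":
    payload = build_tasks_payload(["cats", "dogs"], date(2024, 1, 1), date(2024, 1, 5))
    print(json.dumps(payload, indent=4))
```

The printed JSON can be passed to cURL via --data, or sent with any HTTP client.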

4. Listing Your Tasks

You can list all the tasks under a given source by putting the data source name in the URL.

curl --location 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs' \
--header 'apikey: {YourDatastreamerApiKey}'

You can also filter tasks by date or status and define the number of records to be returned.

curl --location 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs?start=1&count=200&from=2023-10-01T00:00:00Z&to=2023-12-05T23:59:59Z&status=started' \
--header 'apikey: {YourDatastreamerApiKey}'

5. Start/Stop and Check tasks

Once you have the list of tasks, you need to start them in order to get the data. You can either start all of them at once or start a single task. In both cases, put the correct data source name in the URL.

a. Start All

curl --location --request PUT 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs/start-all' \
--header 'Content-Type: application/json' \
--header 'apikey: {YourDatastreamerApiKey}'

b. Start a Single Task

Start an individual task, using the TaskId returned when you listed or created the task.

curl --location --request PUT 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs/{TaskId}/start' \
--header 'apikey: {YourDatastreamerApiKey}'

c. Cancel All Tasks

Similarly, you can cancel all the tasks you started for a source in one batch:

curl --location --request POST 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs/cancel-all' \
--header 'apikey: {YourDatastreamerApiKey}'

d. Cancel a Single Task

You can also cancel an individual task by its TaskId:

curl --location --request POST 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs/{TaskId}/cancel' \
--header 'apikey: {YourDatastreamerApiKey}'

e. Check Task Status

A task usually takes 10-30 minutes to execute. You can check the status of a task with:

curl --location 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs/{TaskId}' \
--header 'apikey: {YourDatastreamerApiKey}'

Once the status becomes “complete” or “finished”, your data is ready to be consumed.
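
Because a task can take a while, checking its status is usually done in a polling loop. The sketch below shows one way to structure that loop. It assumes you supply a fetch_status function that calls the status endpoint above and returns the status as a string; this page does not document the response shape, so extracting the status from the response is left to that function. The names wait_until_done and DONE_STATUSES are hypothetical.

```python
import time

# Terminal status values named in the text above.
DONE_STATUSES = {"complete", "finished"}


def wait_until_done(fetch_status, poll_seconds=60.0, timeout_seconds=3600.0,
                    sleep=time.sleep):
    """Call fetch_status() repeatedly until it reports a terminal status.

    fetch_status: caller-supplied function returning the task status string.
    sleep: injectable so the loop can be exercised without real waiting.
    Raises TimeoutError if the task is not done within timeout_seconds.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status.lower() in DONE_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"task still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated status sequence standing in for real API calls.
    fake_statuses = iter(["started", "started", "finished"])
    result = wait_until_done(lambda: next(fake_statuses), poll_seconds=0.01)
    print("final status:", result)
```

Given the 10-30 minute execution time mentioned above, a poll interval of a minute or more is probably sufficient.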

f. Remove a Task

To remove a task, use the following command, putting the correct data source name and TaskId in the URL:

curl --location --request DELETE 'https://api.platform.datastreamer.io/api/data-providers/socialgist/blogs/{TaskId}' \
--header 'apikey: {YourDatastreamerApiKey}'

6. Consume the data you have received

Once your tasks are completed, the data is ready to be consumed via the normal Datastreamer search/count APIs. For example, the query below searches the data you have in socialgist_blogs. (You’ll need to add socialgist_ before the name of the data source.)

Note that it is important to include a date range on the doc_date or content.published field in your query. If you omit it, the search defaults to the past 30 days. The example below retrieves documents using a date range filter on the content.published field.

Even though the data source name does not include your organization, the search API implementation ensures the query hits the proper index.

curl --location 'https://api.platform.datastreamer.io/api/search?keep_alive_seconds=60' \
--header 'Content-Type: application/json' \
--header 'apikey: {YourDatastreamerApiKey}' \
--data '{
    "query": {
        "from": 0,
        "size": 100,
        "query": "cats AND content.published: [2024-01-01 TO 2024-01-15]",
        "data_sources": ["socialgist_blogs"]
    }
}'
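
To avoid forgetting the date range or the socialgist_ prefix, the search body can be built by a small helper. This is a sketch only: build_search_body is a hypothetical name, and the body layout and query syntax are copied from the cURL example above.

```python
import json
from datetime import date


def build_search_body(keywords: str, source: str, published_from: date,
                      published_to: date, size: int = 100, offset: int = 0) -> dict:
    """Build a search body that always carries a content.published date range."""
    if published_from > published_to:
        raise ValueError("published_from must not be after published_to")
    date_filter = (
        f"content.published: [{published_from.isoformat()} TO {published_to.isoformat()}]"
    )
    return {
        "query": {
            "from": offset,
            "size": size,
            "query": f"{keywords} AND {date_filter}",
            # Socialgist sources are addressed with a socialgist_ prefix.
            "data_sources": [f"socialgist_{source}"],
        }
    }


if __name__ == "__main__":
    body = build_search_body("cats", "blogs", date(2024, 1, 1), date(2024, 1, 15))
    print(json.dumps(body, indent=4))
```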

👍

You are all set!

If you want to explore with a sample source before diving into production usage, try out these sample sources: https://docs.datastreamer.io/docs/sample-sources.