Connect Streaming Data Sources

Datastreamer lets you quickly add new data sources to your data pipeline. Adding a new source follows the process below.

Go-Live Checklist

To register a streaming source:

  • Have a streaming data source that can deliver to either an AWS S3 bucket or a webhook
  • Upload a schema
  • Connect your source

Schema Mapping

Datastreamer classifiers and other sources are mapped to Datastreamer's published metadata fields. Mapping your source to these fields is not required, but it is recommended so that you can take advantage of the other features and functionality of the Datastreamer platform.

Metadata Field Template

You can use the following Google Doc as a metadata field template to help develop your schema: Metadata Matching Schema Template

To create the schema, specify for each field the source metadata field (source_path), the destination metadata field (destination_path), and the data type (string, date, etc.). The following example shows a schema named Datas with its field mappings. Each source_path from the original data is mapped to a destination_path in the Datastreamer data schema, along with a data type.

{
  "schema": {
    "name": "Datas",
    "mappings": [
      {
        "source_path": "thread.uuid",
        "destination_path": "id",
        "type": "string"
      },
      {
        "source_path": "thread.published",
        "destination_path": "doc_date",
        "type": "date"
      },
      {
        "source_path": "thread.published",
        "destination_path": "content.published",
        "type": "date"
      },
      {
        "source_path": "thread.url",
        "destination_path": "source.link",
        "type": "string"
      },
      {
        "source_path": "thread.title",
        "destination_path": "content.title",
        "type": "string"
      },
      {
        "source_path": "text",
        "destination_path": "content.body",
        "type": "string"
      }
    ],
    "schema": {
      "nbd": {
        "id": "string",
        "doc_date": "date",
        "source": {
          "link": "string"
        },
        "content": {
          "body": "string",
          "title": "string",
          "published": "string"
        }
      }
    }
  }
}
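
To make the mapping idea concrete, here is a minimal Python sketch of how a dotted source_path could be copied to a dotted destination_path. It is an illustration only, not Datastreamer's implementation: the helper names (get_path, set_path, apply_mappings) and the sample record are hypothetical, and the sketch copies values as-is without converting them to the declared type.

```python
from typing import Any


def get_path(record: dict, dotted: str) -> Any:
    """Return the value at a dotted path such as "thread.uuid", or None if absent."""
    current: Any = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(target: dict, dotted: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        target = target.setdefault(key, {})
    target[leaf] = value


def apply_mappings(record: dict, mappings: list) -> dict:
    """Copy each source_path value to its destination_path in a new document."""
    mapped: dict = {}
    for mapping in mappings:
        value = get_path(record, mapping["source_path"])
        if value is not None:  # skip fields the source record does not carry
            set_path(mapped, mapping["destination_path"], value)
    return mapped


# Three of the mappings from the example schema.
mappings = [
    {"source_path": "thread.uuid", "destination_path": "id", "type": "string"},
    {"source_path": "thread.title", "destination_path": "content.title", "type": "string"},
    {"source_path": "text", "destination_path": "content.body", "type": "string"},
]

# A made-up source record, for illustration only.
record = {"thread": {"uuid": "abc-123", "title": "Hello"}, "text": "Body text"}

mapped = apply_mappings(record, mappings)
print(mapped)
# {'id': 'abc-123', 'content': {'title': 'Hello', 'body': 'Body text'}}
```

Note that a single source field can feed several destination fields, as thread.published does in the example schema; the sketch handles that case because each mapping is applied independently.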

Submitting a Schema

See the examples and guides in the API reference to submit, validate, modify, or delete a schema: https://datastreamer.readme.io/reference/post_api-schemas
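
Before submitting, it can help to check locally that every mapping agrees with the fields the schema declares. The following Python sketch is one possible pre-submission check, not the Datastreamer validation endpoint. The function names (flatten, check_mappings) are hypothetical, and it assumes the declared fields are a nested object of field names and type names, like the inner object in the example schema.

```python
def flatten(fields: dict, prefix: str = "") -> dict:
    """Flatten nested field declarations into {"content.title": "string", ...}."""
    flat: dict = {}
    for key, value in fields.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def check_mappings(mappings: list, declared_fields: dict) -> list:
    """Return a list of problems found; an empty list means the two agree."""
    declared = flatten(declared_fields)
    problems = []
    for mapping in mappings:
        dest = mapping["destination_path"]
        if dest not in declared:
            problems.append(f"{dest}: not declared in schema")
        elif declared[dest] != mapping["type"]:
            problems.append(
                f"{dest}: mapping type {mapping['type']!r} != declared {declared[dest]!r}"
            )
    return problems


# Illustrative data only; the third mapping deliberately disagrees on type.
mappings = [
    {"source_path": "thread.uuid", "destination_path": "id", "type": "string"},
    {"source_path": "thread.published", "destination_path": "doc_date", "type": "date"},
    {"source_path": "thread.title", "destination_path": "content.title", "type": "date"},
]
declared_fields = {"id": "string", "doc_date": "date", "content": {"title": "string"}}

problems = check_mappings(mappings, declared_fields)
print(problems)
# ["content.title: mapping type 'date' != declared 'string'"]
```

A check like this only catches inconsistencies inside the schema document itself; the API reference remains the authority on what a valid schema is.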

Connecting Your Streaming Data Source

Please reach out to your Data Consultant to integrate streaming data.

Using the Data in Datastreamer

Once the schema is validated and the streaming data source is successfully connected to Datastreamer, the data is available in the Datastreamer pipeline. Use the Datastreamer APIs and the metadata fields defined in your schema to begin working with the integrated data in your application.