X/Twitter Retweets Collector

Fetches the list of users who retweeted a post and enriches incoming JSON documents with the collected data.

Common uses:

  • Enriching social media content with information about the users who retweeted a post
  • Improving the accuracy of AI operations on the document
🚧

Important

This component must be used on documents produced by the Unify Transformer component, so it can only be added to a pipeline after the Unify Transformer component.

Component Configuration

To use the X/Twitter Retweets Collector, add the component to your pipeline as an operation and configure the settings described below.

Message Id JSON Path

Specifies the JSON path in your source document where the message identifier (such as an X/Twitter post ID) is located. The component uses this ID to retrieve the list of users who retweeted the corresponding post from another source.

For example, if the incoming document contains:

{
  "data": {
    "documents": [
      {
        ...
        "twitter": {
          "post_identifier": "1234567890"
        }
        ...
      },
      {
        ...
        "twitter": {
          "post_identifier": "9876543210"
        }
        ...
      }
    ]
  }
}

And you set the Message Id JSON Path to twitter.post_identifier, the component reads the post ID found at that path in each document and uses it to collect the additional data.

The fields of the incoming document are defined by the Datastreamer Default Schema.
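To make the path lookup concrete, here is a minimal Python sketch of how a dot-separated path such as twitter.post_identifier could be resolved against each document in the example payload. The helper name resolve_path is hypothetical and the traversal logic is an assumption for illustration; the component's actual implementation is not described here.

```python
from typing import Any, Optional


def resolve_path(document: dict, dotted_path: str) -> Optional[Any]:
    """Follow a dot-separated path such as 'twitter.post_identifier'."""
    current: Any = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # The path is missing in this document.
        current = current[key]
    return current


# Payload shaped like the example above, with the elided fields left out.
payload = {
    "data": {
        "documents": [
            {"twitter": {"post_identifier": "1234567890"}},
            {"twitter": {"post_identifier": "9876543210"}},
        ]
    }
}

message_id_path = "twitter.post_identifier"
post_ids = [
    resolve_path(doc, message_id_path) for doc in payload["data"]["documents"]
]
print(post_ids)
```

In this sketch a document without the configured path yields None, which a caller could use to skip the lookup for that document.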

Max Returned Items

The maximum number of retweet users to return. The value can be set from 100 to 500, in increments of 100.

🚧

Important

The usage cost increases proportionally with the number of users returned.
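As an illustration of the allowed range, the following Python sketch checks a candidate value before it is placed in a pipeline configuration. The function name validate_max_returned_items is hypothetical; only the 100 to 500 range and the step of 100 come from the description above.

```python
# Allowed values per the description above: 100, 200, 300, 400, 500.
ALLOWED_MAX_ITEMS = range(100, 501, 100)


def validate_max_returned_items(value: int) -> int:
    """Return the value unchanged if it is allowed, otherwise raise."""
    if value not in ALLOWED_MAX_ITEMS:
        raise ValueError(
            f"Max Returned Items must be one of {list(ALLOWED_MAX_ITEMS)}, got {value}"
        )
    return value


print(validate_max_returned_items(300))
```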

Message Mappings

This setting defines how message data is added to or merged into your documents by copying values from a source JSON document into a destination JSON document.

  • The source JSON document schema depends on the service used by the collector to get the additional data.
  • The destination JSON document schema is the Datastreamer Default Schema.

Each mapping rule contains three fields:

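  • type: The data type of the value (string, integer, boolean, date)
  • source_path: The path of the field in the message response data
  • destination_path: Where to place the data in your document

The date rule listed under Current Mapping also includes a format field, which gives the date format of the source value.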

Example Mapping

Given the mapping:

{
  "mappings": [
    {
      "source_path": "repostUsers[*].screen_name",
      "destination_path": "twitter.retweet_authors[*].handle",
      "type": "string"
    },
    {
      "source_path": "repostUsers[*].followers_count",
      "destination_path": "twitter.retweet_authors[*].followers",
      "type": "integer"
    }
  ]
}

If the component returns message data like:

{
  "repostUsers": [
    {
      "screen_name": "example_user_1",
      ...
      "followers_count": 1000,
      ...
    },
    {
      "screen_name": "example_user_2",
      ...
      "followers_count": 2000,
      ...
    },
    ...
  ]
}

And your original document is:

{
  "source": {
    "link": "https://x.com/elonmusk/status/2041754402239975479"
  },
  "content": {
    "body": "some document body"
  },
  "twitter": {
    "post_identifier": "2041774854588842252"
  }
}

After processing with the mapping above, your document becomes:

{
  "source": {
    "link": "https://x.com/elonmusk/status/2041754402239975479"
  },
  "content": {
    "body": "some document body"
  },
  "twitter": {
    "post_identifier": "2041774854588842252",
    "retweet_authors": [
      {
        "handle": "example_user_1",
        "followers": 1000
      },
      {
        "handle": "example_user_2",
        "followers": 2000
      }
    ]
  }
}
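The example above can be reproduced with a short Python sketch. It assumes that each path contains exactly one [*] wildcard directly before the final field name, as in the example mapping, and that list items are matched by position. The function names apply_mappings and split_wildcard are hypothetical, and the date type is left out; this is an illustration of the mapping idea, not the component's implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# Casts for three of the documented types; "date" is left out of this sketch.
CASTS: Dict[str, Callable[[Any], Any]] = {
    "string": str,
    "integer": int,
    "boolean": bool,  # Assumes the source value is already a JSON boolean.
}


def split_wildcard(path: str) -> Tuple[List[str], str]:
    """Split 'a.b[*].c' into (['a', 'b'], 'c')."""
    list_path, _, field = path.partition("[*].")
    return list_path.split("."), field


def apply_mappings(document: dict, message_data: dict, mappings: list) -> dict:
    """Copy mapped fields from message_data into document, item by item."""
    for rule in mappings:
        src_keys, src_field = split_wildcard(rule["source_path"])
        dst_keys, dst_field = split_wildcard(rule["destination_path"])
        cast = CASTS[rule["type"]]

        # Walk down to the source list, e.g. message_data["repostUsers"].
        source_items: Any = message_data
        for key in src_keys:
            source_items = source_items[key]

        # Walk down to the destination list, creating missing levels.
        parent = document
        for key in dst_keys[:-1]:
            parent = parent.setdefault(key, {})
        dest_items = parent.setdefault(dst_keys[-1], [])

        for index, item in enumerate(source_items):
            while len(dest_items) <= index:
                dest_items.append({})
            if src_field in item:
                dest_items[index][dst_field] = cast(item[src_field])
    return document


mappings = [
    {"source_path": "repostUsers[*].screen_name",
     "destination_path": "twitter.retweet_authors[*].handle",
     "type": "string"},
    {"source_path": "repostUsers[*].followers_count",
     "destination_path": "twitter.retweet_authors[*].followers",
     "type": "integer"},
]
message_data = {
    "repostUsers": [
        {"screen_name": "example_user_1", "followers_count": 1000},
        {"screen_name": "example_user_2", "followers_count": 2000},
    ]
}
document = {"twitter": {"post_identifier": "2041774854588842252"}}

result = apply_mappings(document, message_data, mappings)
print(result["twitter"]["retweet_authors"])
```

Because the destination list is filled by index, every rule that targets twitter.retweet_authors[*] contributes to the same object for a given user, which is how handle and followers end up side by side in the output shown above.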

Current Mapping

{
  "source_path": "repostUsers[*].description",
  "destination_path": "twitter.retweet_authors[*].bio",
  "type": "string"
},
{
  "source_path": "repostUsers[*].followers_count",
  "destination_path": "twitter.retweet_authors[*].followers",
  "type": "integer"
},
{
  "source_path": "repostUsers[*].friends_count",
  "destination_path": "twitter.retweet_authors[*].following",
  "type": "integer"
},
{
  "source_path": "repostUsers[*].screen_name",
  "destination_path": "twitter.retweet_authors[*].handle",
  "type": "string"
},
{
  "source_path": "repostUsers[*].favourites_count",
  "destination_path": "twitter.retweet_authors[*].likes_count",
  "type": "integer"
},
{
  "source_path": "repostUsers[*].location",
  "destination_path": "twitter.retweet_authors[*].location",
  "type": "string"
},
{
  "source_path": "repostUsers[*].name",
  "destination_path": "twitter.retweet_authors[*].name",
  "type": "string"
},
{
  "source_path": "repostUsers[*].statuses_count",
  "destination_path": "twitter.retweet_authors[*].posts_count",
  "type": "integer"
},
{
  "source_path": "repostUsers[*].created_at",
  "format": "ddd MMM dd HH:mm:ss zzz yyyy",
  "destination_path": "twitter.retweet_authors[*].profile_create_date",
  "type": "date"
},
{
  "source_path": "repostUsers[*].profile_image_url_https",
  "destination_path": "twitter.retweet_authors[*].profile_image_source",
  "type": "string"
},
{
  "source_path": "repostUsers[*].protected",
  "destination_path": "twitter.retweet_authors[*].protected",
  "type": "boolean"
},
{
  "source_path": "repostUsers[*].verified",
  "destination_path": "twitter.retweet_authors[*].verified",
  "type": "boolean"
}
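The created_at rule is the only one with a format value. As a rough illustration, the Python sketch below parses a timestamp of the shape that pattern describes (abbreviated weekday, abbreviated month, day, time, UTC offset, year). The sample timestamp is hypothetical, the strptime pattern is an assumed Python equivalent of the format string in the rule, and the output format the component writes to profile_create_date is not stated here.

```python
from datetime import datetime

# Assumed Python equivalent of "ddd MMM dd HH:mm:ss zzz yyyy".
PYTHON_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, PYTHON_FORMAT)


# Hypothetical sample value, not taken from a real response.
parsed = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(parsed.isoformat())
```

Note that %a and %b match English day and month names only under an English or default C locale.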