Amazon S3 Storage Egress Connector

Pipeline Egress to Amazon S3 Storage buckets

Component Configuration

Add the Amazon S3 Storage Egress component at the end of a Pipeline to write the Pipeline's outputs to an Amazon S3 bucket. It is recommended to place a transform operation (JSON Transform or Unify Transform) before this component to standardize the fields.

Collation Type

File Collation (default) is recommended. It groups the documents for a job into files. The alternatives are to collate by message (a message is the internal unit into which requests are split for pipeline processing) or to write an individual file for each document received.
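The difference between the three collation types can be pictured with a minimal sketch. The mode names ("file", "message", "document"), the `message_id` field, and the `collate` function are hypothetical and chosen only for illustration. The sketch also ignores Collation Size, so File Collation is shown as a single group.

```python
from collections import defaultdict
from typing import Dict, List


def collate(documents: List[Dict], mode: str) -> List[List[Dict]]:
    """Group documents into output files according to a collation mode.

    Hypothetical illustration: mode names and the "message_id" field are
    assumptions, not the component's real configuration values.
    """
    if mode == "file":
        # File Collation: documents for the job are grouped together.
        return [list(documents)] if documents else []
    if mode == "message":
        # Message collation: one group per internal processing message.
        groups = defaultdict(list)
        for doc in documents:
            groups[doc["message_id"]].append(doc)
        return list(groups.values())
    if mode == "document":
        # Individual files: one output file per document received.
        return [[doc] for doc in documents]
    raise ValueError(f"unknown collation mode: {mode}")


docs = [
    {"id": 1, "message_id": "m1"},
    {"id": 2, "message_id": "m1"},
    {"id": 3, "message_id": "m2"},
]
for mode in ("file", "message", "document"):
    print(mode, [len(group) for group in collate(docs, mode)])
```

File Collation produces the fewest objects in the bucket, which is one likely reason it is the recommended default.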

Bucket/Container (Required)

Specify the Amazon S3 storage bucket name for egress.

Use Metadata Tag (Optional)

Specify the Metadata Tag "name" to be used for the output folder in the bucket. The Tag "value" is configured as part of job creation. See Creating Jobs (Portal, API). If the Tag is not present on the document/file received by the Amazon S3 Storage Egress component, the Metadata Tag value will be used by default as the folder name.

Collation Size

Integer (bytes) specifying the target size of each output JSON file created in the Amazon S3 bucket. While processing a job, the Amazon S3 Storage Egress component collates results until this size is reached, then uploads the file to the Amazon S3 bucket. If the job generates more results, additional files are created with an incrementing number appended to the file name, e.g. "-1", "-2".

The Amazon S3 Storage Egress component waits up to 60 seconds for new documents to collate. If none are received in that time, the collated file is uploaded to the Amazon S3 bucket even if the size limit has not been reached.
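The size and timeout behaviour described above can be sketched as follows. This is a minimal illustration, not the component's implementation: the `SizeCollator` class, its method names, the newline-delimited JSON layout, and the choice to leave the first file unnumbered are all assumptions. Time is passed in explicitly so the 60-second idle window can be followed without waiting.

```python
import json
from typing import Dict, List, Optional, Tuple


class SizeCollator:
    """Buffers JSON documents and flushes them as numbered files.

    Hypothetical illustration only: names, output layout, and the file
    naming scheme are assumptions, not the component's real code.
    """

    def __init__(self, base_name: str, max_bytes: int,
                 idle_seconds: float = 60.0) -> None:
        self.base_name = base_name
        self.max_bytes = max_bytes
        self.idle_seconds = idle_seconds
        self.buffer: List[str] = []
        self.buffered_bytes = 0
        self.last_received: Optional[float] = None
        self.uploads: List[Tuple[str, str]] = []  # (file name, file body)

    def add(self, document: Dict, now: float) -> None:
        """Collate one document; upload once the size limit is reached."""
        line = json.dumps(document, sort_keys=True)
        self.buffer.append(line)
        self.buffered_bytes += len(line.encode("utf-8")) + 1  # +1 for newline
        self.last_received = now
        if self.buffered_bytes >= self.max_bytes:
            self.flush()

    def tick(self, now: float) -> None:
        """Upload a partial file if nothing arrived within the idle window."""
        if self.buffer and now - self.last_received >= self.idle_seconds:
            self.flush()

    def flush(self) -> None:
        """Record the buffered documents as one uploaded file."""
        if not self.buffer:
            return
        count = len(self.uploads)
        # Assumed naming: first file has no suffix, later files get -1, -2, ...
        suffix = "" if count == 0 else f"-{count}"
        name = f"{self.base_name}{suffix}.json"
        self.uploads.append((name, "\n".join(self.buffer) + "\n"))
        self.buffer = []
        self.buffered_bytes = 0


collator = SizeCollator("job-results", max_bytes=50)
collator.add({"id": 1, "text": "first document"}, now=0.0)
collator.add({"id": 2, "text": "second document"}, now=1.0)  # limit reached
collator.add({"id": 3, "text": "third document"}, now=10.0)
collator.tick(now=30.0)  # only 20 seconds idle, nothing uploaded
collator.tick(now=70.0)  # 60 seconds idle, partial file uploaded
print([name for name, _ in collator.uploads])
```

A larger Collation Size yields fewer, bigger objects in the bucket; the idle timeout ensures the last partial file of a job is still delivered.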

Egress Data

Documents (default) is recommended. For Ingress and Operation Components (e.g. WebSightLine File Fetcher) that process file objects (e.g. images, PDFs), these objects can be retained in cache for additional processing and egressed at the end of the pipeline using the alternative options: Files & Documents or Files only.

Output Format

Selects the JSON format used for the collated output.

S3 Access Key & S3 Secret Key (Required)

Add your Amazon S3 Access Key and Secret Key on the "Keys & Secrets" page, available from the Portal menu.

Region (Required)

Text field for the AWS Region of the target Amazon S3 bucket.
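As a summary of the fields above, the following sketch checks a set of component settings before a Pipeline is saved. Everything in it is hypothetical: the `S3EgressSettings` class, its field names, and the option strings are illustrative stand-ins, not the Portal's or the API's actual schema. The key fields hold the names of entries on the "Keys & Secrets" page rather than the secrets themselves.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed option identifiers; the real configuration values may differ.
COLLATION_TYPES = {"file", "message", "document"}
EGRESS_DATA = {"documents", "files_and_documents", "files_only"}


@dataclass
class S3EgressSettings:
    """Hypothetical container for the component's configuration fields."""
    bucket: str                      # Bucket/Container (Required)
    region: str                      # Region (Required)
    access_key_name: str             # name of the stored S3 Access Key
    secret_key_name: str             # name of the stored S3 Secret Key
    collation_type: str = "file"     # File Collation is the default
    egress_data: str = "documents"   # Documents is the default
    collation_size: Optional[int] = None  # bytes
    metadata_tag: Optional[str] = None    # Use Metadata Tag (Optional)


def validate(settings: S3EgressSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    for name in ("bucket", "region", "access_key_name", "secret_key_name"):
        if not getattr(settings, name).strip():
            problems.append(f"{name} is required")
    if settings.collation_type not in COLLATION_TYPES:
        problems.append("collation_type is not a recognised option")
    if settings.egress_data not in EGRESS_DATA:
        problems.append("egress_data is not a recognised option")
    if settings.collation_size is not None and settings.collation_size <= 0:
        problems.append("collation_size must be a positive number of bytes")
    return problems


settings = S3EgressSettings(
    bucket="example-egress-bucket",
    region="us-east-1",
    access_key_name="S3_ACCESS_KEY",
    secret_key_name="S3_SECRET_KEY",
    collation_size=5_000_000,
)
print(validate(settings))
```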