Amazon S3
This is a direct integration. An AWS account and S3 credentials are required.
Amazon S3 Ingress
The Amazon S3 Ingress component reads files from an S3 bucket and feeds them into your pipeline.
Supported file formats include JSON, JSON Lines, XML, CSV, PDF, and others.
Configuration
Ingestion Type
Controls how the component monitors the bucket. Options include reading all available files, monitoring for new files since the last run, or reading from a specific path.
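The three modes can be pictured as filters over a bucket listing. The sketch below is illustrative only and is not the component's implementation: the `S3Object` type, the mode names (`all`, `new_since_last_run`, `path`), and the `select_objects` function are hypothetical, and the listing is passed in as plain data rather than fetched from S3.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of a bucket listing."""
    key: str
    last_modified: datetime


def select_objects(
    listing: Iterable[S3Object],
    mode: str,
    last_run: Optional[datetime] = None,
    path: str = "",
) -> List[S3Object]:
    """Pick the objects one run would read under an assumed ingestion mode."""
    objects = list(listing)
    if mode == "all":
        # Read every available file.
        return objects
    if mode == "new_since_last_run":
        # With no previous run recorded, everything counts as new.
        if last_run is None:
            return objects
        return [o for o in objects if o.last_modified > last_run]
    if mode == "path":
        # Read only keys under the given folder path.
        return [o for o in objects if o.key.startswith(path)]
    raise ValueError(f"unknown ingestion mode: {mode!r}")


listing = [
    S3Object("raw/a.json", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    S3Object("raw/b.json", datetime(2024, 1, 3, tzinfo=timezone.utc)),
    S3Object("archive/c.json", datetime(2024, 1, 5, tzinfo=timezone.utc)),
]
last_run = datetime(2024, 1, 2, tzinfo=timezone.utc)

new_keys = [o.key for o in select_objects(listing, "new_since_last_run", last_run)]
path_keys = [o.key for o in select_objects(listing, "path", path="raw/")]
```

Here `new_keys` holds the two objects modified after `last_run`, and `path_keys` holds the two objects under `raw/`.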
Bucket and Path
Specify the S3 bucket name and optional folder path to read from.
File Format
Select the format of files in the bucket. This determines how documents are parsed.
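To illustrate why the format choice matters, the sketch below parses the same kind of content under three formats using only the Python standard library. It is an assumption-laden illustration, not the component's parser: the `parse_documents` function and the format identifiers are hypothetical, and treating a top-level JSON array as one document per element is an assumption.

```python
import csv
import io
import json
from typing import Any, Dict, List


def parse_documents(text: str, file_format: str) -> List[Dict[str, Any]]:
    """Split raw file content into documents according to the chosen format."""
    if file_format == "json":
        data = json.loads(text)
        # Assumption: a top-level array yields one document per element.
        return data if isinstance(data, list) else [data]
    if file_format == "jsonl":
        # JSON Lines: one JSON document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if file_format == "csv":
        # CSV: the header row supplies field names; all values are strings.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    raise ValueError(f"unsupported format in this sketch: {file_format!r}")


json_docs = parse_documents('{"id": 1}', "json")
jsonl_docs = parse_documents('{"id": 1}\n{"id": 2}\n', "jsonl")
csv_docs = parse_documents("id,title\n1,hello\n", "csv")
```

Selecting the wrong format changes the result materially: a JSON Lines file read as plain JSON would fail to parse, and CSV values arrive as strings rather than typed values.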
Connector Automation (Jobs)
The Amazon S3 Ingress component works with the Job system to handle scheduling, retries, and change detection. See Connector Automation for details.
Credentials
Add your AWS Access Key and Secret Key in the Keys and Secrets page in Portal.
Region
Specify the AWS region of the S3 bucket.
Setup Guide
For a step-by-step walkthrough, see the Amazon S3 Ingress Setup Guide.
Amazon S3 Destination
The Amazon S3 Destination component writes processed documents from a pipeline to an S3 bucket.
To standardize the output format, apply a Unify or JSON Transform component before this component (recommended).
Configuration
Collation Type
Controls how documents are grouped into output files. File Collation (default) groups documents into files per Job run. Alternatives are message-based collation or one file per document.
Bucket
The name of the target S3 bucket.
Metadata Tag (optional)
Specify a tag name to use as the output folder name within the bucket. The tag value is set at Job creation. If the tag is not present on a document, the tag value is used as the default folder name.
Collation Size
The target size, in bytes, that an output file must reach before it is uploaded. Once this size is reached, the file is written to S3 and a new file begins. If no new documents arrive within 60 seconds of the last document, the current file is uploaded regardless of size.
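The size threshold and the 60-second idle window can be sketched as a small buffer with two flush triggers. This is a minimal illustration under stated assumptions, not the component's implementation: the `CollationBuffer` class, its `tick` method, and the injectable `clock` are hypothetical, and `upload` stands in for the write to S3.

```python
import time
from typing import Callable, List, Optional

IDLE_TIMEOUT_SECONDS = 60.0  # idle window described in the text


class CollationBuffer:
    """Hypothetical buffer that uploads on size or on idle timeout."""

    def __init__(
        self,
        target_size: int,
        upload: Callable[[bytes], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.target_size = target_size
        self.upload = upload
        self.clock = clock
        self._chunks: List[bytes] = []
        self._size = 0
        self._last_added: Optional[float] = None

    def add(self, document: bytes) -> None:
        """Append a document; upload once the target size is reached."""
        self._chunks.append(document)
        self._size += len(document)
        self._last_added = self.clock()
        if self._size >= self.target_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; uploads a non-empty buffer that has gone idle."""
        if self._chunks and self._last_added is not None:
            if self.clock() - self._last_added >= IDLE_TIMEOUT_SECONDS:
                self.flush()

    def flush(self) -> None:
        """Upload the current file, if any, and start a new one."""
        if not self._chunks:
            return
        self.upload(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
        self._last_added = None


uploaded: List[bytes] = []
now = [0.0]  # fake clock so the example does not wait 60 seconds
buf = CollationBuffer(10, uploaded.append, clock=lambda: now[0])
buf.add(b"12345")  # 5 bytes, below the 10-byte target
buf.add(b"67890")  # reaches 10 bytes, so the file is uploaded
buf.add(b"abc")    # starts a new file
now[0] = 61.0
buf.tick()         # more than 60 s idle, so the partial file is uploaded
```

After this run, `uploaded` contains one file flushed by size and one smaller file flushed by the idle timeout. A small Collation Size therefore produces many small objects, while a large one delays uploads until the size is met or the stream goes quiet.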
Egress Data
Select what to write: documents only (default), files and documents, or files only. The two files options apply when an upstream component (for example, WebSightLine File Fetcher) produces file objects such as images or PDFs.
Output Format
The JSON collation format for the output file.
Credentials
Add your AWS Access Key and Secret Key in the Keys and Secrets page in Portal.
Region
The AWS region of the target S3 bucket.
