Load Data from Amazon S3 Using IMPORT

You can use the IMPORT command to load data from Amazon S3 buckets on AWS. Exasol automatically recognizes Amazon S3 import based on the URL.

Only Amazon S3 on AWS is supported. Other services that use an S3-compatible protocol are not supported.
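Because the import type is derived from the URL, you can also pass the S3 endpoint directly in the AT clause instead of using a connection object. A minimal sketch, where the bucket name, region code, table, and file path are placeholders:

IMPORT INTO testtable FROM CSV AT 'https://testbucket.s3.<region-code>.amazonaws.com'
FILE 'testpath/test.csv';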

Exasol supports loading the following types of CSV files from S3:

  • Public objects

  • Private objects using access keys

  • Private objects using presigned URLs

Access to public objects without an access key and private objects using presigned URLs was introduced in version 7.1.16. In versions prior to 7.1.16, leaving the USER and IDENTIFIED BY fields blank triggered the database to connect with temporary IAM credentials. For more information, see changelog entries 15504 and 15155.

Examples

Exasol recommends creating a connection object to store connection strings and credentials. For more information, see CREATE CONNECTION.

An Amazon S3 bucket URL that uses the legacy global endpoint format (<bucket-name>.s3.amazonaws.com) can take up to 24 hours after bucket creation to become available.

A fully qualified Amazon S3 bucket URL that includes the AWS region (<bucket-name>.s3.<region-code>.amazonaws.com) is available immediately.
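As a minimal sketch, the two endpoint formats look as follows in a connection definition (the connection names, bucket name, and region code below are placeholders):

-- Legacy global endpoint; may take up to 24 hours to become available
CREATE CONNECTION S3_LEGACY_ENDPOINT
TO 'https://testbucket.s3.amazonaws.com';

-- Region-qualified endpoint; available immediately
CREATE CONNECTION S3_REGIONAL_ENDPOINT
TO 'https://testbucket.s3.<region-code>.amazonaws.com';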

Public Objects

To load public files, leave the USER and IDENTIFIED BY fields empty. For example:

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://public-testbucket.s3.<region-code>.amazonaws.com';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Private Objects using Access Keys

To authenticate with an AWS access key, insert the Amazon S3 access key ID in the USER field and the secret access key in the IDENTIFIED BY field. For example:

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://testbucket.s3.<region-code>.amazonaws.com'
USER '<key-ID>'
IDENTIFIED BY '<secret-key>';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Private Objects using Presigned URLs

To load private files using a presigned URL, leave the USER and IDENTIFIED BY fields empty. Append the query string of the presigned URL (everything after the question mark that follows the file name) to the file name specified in the FILE field of the IMPORT statement. For example:

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://private-testbucket.s3.<region-code>.amazonaws.com';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv?<query_string>';
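For illustration only, a presigned URL generated by AWS typically carries Signature Version 4 parameters in its query string, so a real statement might look like the following sketch (all parameter values are placeholders, not working credentials):

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=<credential>&X-Amz-Date=<date>&X-Amz-Expires=<seconds>&X-Amz-SignedHeaders=host&X-Amz-Signature=<signature>';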

Multipart Uploads

Uploads to Amazon S3 on AWS are done in parts. AWS allows a maximum of 10000 parts per upload, so you may need a large part size to upload a large file. The part size for the upload is configurable with the -awsS3PartSize parameter. The default value is 10 MiB, expressed in bytes (-awsS3PartSize=10485760). To see how to change this parameter, see Edit a Database.

Here are some examples of the maximum file size you can upload with a given part size:

  • With 10 MiB parts: (10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) ≈ 97.7 GiB
  • With 5 MiB parts: (5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) ≈ 48.8 GiB
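As a further illustration, assuming the same formula, raising the part size to 100 MiB (-awsS3PartSize=104857600) would allow files up to (100*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) ≈ 976.6 GiB.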

The value of the -awsS3PartSize parameter applies globally to all queries. You cannot configure it separately for individual queries.

If you need to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDFs.

To learn more about the part size and limitation on it, see Amazon S3 Multipart Upload Limits.