Load Data from Amazon S3 Using IMPORT

You can use the IMPORT command to load data from Amazon S3 buckets on AWS. Exasol automatically recognizes Amazon S3 import based on the URL.

Only Amazon S3 on AWS is supported. Other services that use an S3 compatible protocol are not supported.

For authentication, you can connect to public or private objects using an AWS access key.

In versions prior to 7.0.21, leaving the USER and IDENTIFIED BY fields blank triggered the database to connect with temporary IAM credentials. For security reasons, this was removed. For more details, see Changelog Entry 15155.

To authenticate with an AWS access key, insert the Amazon S3 access key ID in the USER field and the secret access key in the IDENTIFIED BY field. For example:

Examples

Exasol recommends creating a connection object to store connection strings and credentials. For more information, see CREATE CONNECTION.

An Amazon S3 bucket URL using the legacy global endpoint format (<bucket-name>.s3.amazonaws.com) may need up to 24 hours after bucket creation before it becomes available. A fully qualified S3 URL that includes the AWS region (<bucket-name>.s3.<region-code>.amazonaws.com) will work immediately.

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://testbucket.s3-<region>.amazonaws.com'
USER '<key-ID>' 
IDENTIFIED BY '<secret-key>'
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://testbucket.s3.amazonaws.com'
USER '<key-ID>' 
IDENTIFIED BY '<secret-key>'

To import or export your data using a connection string, run either of the following statements:

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Multipart Uploads

Upload to Amazon S3 on AWS is done in parts. AWS allows maximum 10000 parts for each upload. Therefore, you may need a large part size if you want to upload a large file. The part size for the upload is configurable with the -awsS3PartSize parameter. The default size of this parameter is 10 MiB represented in bytes (-awsS3PartSize=1048576). To see how to change this parameter, see Edit a Database.

Here are some examples of what could be the maximum size of the file that you can upload:

  • If the part size is 10 MB, the maximum size of the file you can upload: 10 MB parts: (10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GB) = 97.7 GB
  • If the part size is 5 MB, the maximum size of the file you can upload: 5 MB parts: (5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GB) = 48.8 GB

The value of the -awsS3PartSize parameter is consistent across all your queries. You cannot configure it separately for different queries.

If you need to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDFs.

To learn more about the part size and limitation on it, see Amazon S3 Multipart Upload Limits.