Load Data from Amazon S3 Using IMPORT

You can use the IMPORT command to load data from Amazon S3 buckets on AWS. Exasol automatically recognizes Amazon S3 import based on the URL.

Only Amazon S3 on AWS is supported. Other services that use an S3 compatible protocol are not supported.

For authentication, you can use one of the following mechanisms:

  • A connection string
  • Temporary credentials for EC2 IAM roles

Exasol recommends creating a connection string. To authenticate with a user/password, use the Amazon S3 access key ID in the USER field and the secret access key in the IDENTIFIED BY field.

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://testbucket.s3.amazonaws.com'
USER '<key-ID>' 
IDENTIFIED BY '<secret-key>'

To import or export your data using a connection string, run either of the following statements:

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

To use temporary credentials for EC2 IAM roles, leave the USER and IDENTIFIED BY fields empty in either of the following statements:

IMPORT INTO testtable FROM CSV AT 'https://testbucket.s3.amazonaws.com'
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT 'https://testbucket.s3.amazonaws.com'
FILE 'testpath/test.csv';

An AWS bucket URL using the legacy global endpoint format (<bucketname>.s3.amazonaws.com) may need up to 24 hours after bucket creation before it becomes available. A fully qualified Amazon S3 URL that includes the AWS region (<bucketname>.s3-<region>.amazonaws.com) will work immediately.

Upload to Amazon S3 on AWS is done in parts. AWS allows maximum 10000 parts for each upload. Therefore, you may need a large part size if you want to upload a large file. The part size for the upload is configurable with the -awsS3PartSize parameter. The default size of this parameter is 10 MiB represented in bytes (-awsS3PartSize=1048576). To see how to change this parameter, see Edit a Database.

Here are some examples of what could be the maximum size of the file that you can upload:

  • If the part size is 10 MB, the maximum size of the file you can upload: 10 MB parts: (10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GB) = 97.7 GB
  • If the part size is 5 MB, the maximum size of the file you can upload: 5 MB parts: (5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GB) = 48.8 GB

The value of the -awsS3PartSize parameter is consistent across all your queries. You cannot configure it separately for different queries.

If you need to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDFs.

To know more about the part size and limitation on it, see Amazon S3 Multipart Upload Limits.