Load Data from Amazon S3 Using IMPORT

You can use the IMPORT command to load data from Amazon S3 buckets. Exasol automatically recognizes AWS S3 import based on the URL. For authentication, you can use one of the following mechanisms:

  • A connection string
  • Temporary credentials for EC2 IAM roles

Exasol recommends creating a connection string. To authenticate with a user/password, use the AWS S3 access key ID in the USER field and the secret access key in the IDENTIFIED BY field.

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://testbucket.s3.amazonaws.com'
USER '<key-ID>' 
IDENTIFIED BY '<secret-key>'

To import or export your data using a connection string, run either of the following statements:

IMPORT INTO testtable FROM CSV AT 'S3_MY_BUCKETNAME'
FILE 'testpath/test.csv';

EXPORT testtable FROM CSV AT 'S3_MY_BUCKETNAME'
FILE 'testpath/test.csv';

To use temporary credentials for EC2 IAM roles, leave the USER and IDENTIFIED BY fields empty in either of the following statements:

IMPORT INTO testtable INTO CSV AT 'https://testbucket.s3.amazonaws.com'
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT 'https://testbucket.s3.amazonaws.com'
FILE 'testpath/test.csv';

Upload to Amazon S3 is done in parts. AWS allows maximum 10000 parts for each upload. Therefore, you may need large part size if you want to upload a large file. The part size for the upload is configured through awsS3PartSize parameter. The default size of this parameter is 10 MB.

Here are some examples of what could be the maximum size of the file that you can upload:

  • If the part size is 10 MB, the maximum size of the file you can upload: 10 MB parts: (10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GB) = 97.7 GB
  • If the part size is 5 MB, the maximum size of the file you can upload: 5 MB parts: (5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GB) = 48.8 GB

The value of the awsS3PartSize parameter is consistent across all your queries. You cannot configure it separately for different queries.

If you need load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDFs.

To know more about the part size and limitation on it, see Amazon S3 Multipart Upload Limits.