Load Data from Amazon S3 Using IMPORT

You can use the IMPORT command to load data from Amazon S3 buckets. Exasol automatically recognizes AWS S3 import based on the URL. For authentication, you can use one of the following mechanisms:

  • A connection string
  • Temporary credentials for EC2 IAM roles

Exasol recommends creating a connection string. To authenticate with a user/password, use the AWS S3 access key ID in the USER field and the secret access key in the IDENTIFIED BY field.

CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://testbucket.s3.amazonaws.com'
USER '<key-ID>'
IDENTIFIED BY '<secret-key>';

To import or export your data using a connection string, run either of the following statements:

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

To use temporary credentials for EC2 IAM roles, omit the USER and IDENTIFIED BY clauses and specify the bucket URL directly, as in the following statements:

IMPORT INTO testtable FROM CSV AT 'https://testbucket.s3.amazonaws.com'
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT 'https://testbucket.s3.amazonaws.com'
FILE 'testpath/test.csv';

Uploads to Amazon S3 are done in parts. AWS allows a maximum of 10,000 parts per upload, so you may need a larger part size to upload a large file. The part size for the upload is configurable with the -awsS3PartSize parameter. The default value is 10 MiB, specified in bytes (-awsS3PartSize=10485760). To see how to change this parameter, see Edit a Database.

Here are some examples of the maximum file size you can upload with a given part size:

  • 10 MiB parts: (10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 97.7 GiB
  • 5 MiB parts: (5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 48.8 GiB
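The arithmetic above can be sketched as a small helper. This is an illustrative calculation only, not part of Exasol; the function name and the MAX_PARTS constant are assumptions based on the 10,000-part limit stated above.

```python
# AWS allows at most 10,000 parts per multipart upload.
MAX_PARTS = 10_000

def max_upload_size_gib(part_size_bytes: int) -> float:
    """Largest file size (in GiB) that fits in 10,000 parts of the given size."""
    return part_size_bytes * MAX_PARTS / (1024 ** 3)

# Default Exasol part size of 10 MiB:
print(round(max_upload_size_gib(10 * 1024 * 1024), 1))  # 97.7
# A smaller 5 MiB part size halves the limit:
print(round(max_upload_size_gib(5 * 1024 * 1024), 1))   # 48.8
```

To raise the upload ceiling, increase -awsS3PartSize rather than the part count, which is fixed by AWS.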

The -awsS3PartSize parameter applies to all queries. You cannot configure it separately for individual queries.

If you need to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDFs.

For more information about part size limits, see Amazon S3 Multipart Upload Limits.