Load Data from Amazon S3 Using IMPORT

This article explains how to load data from Amazon S3 into Exasol using the IMPORT command.

You can use the IMPORT command to load data from Amazon S3 buckets on AWS. Exasol automatically recognizes Amazon S3 import based on the URL used with the IMPORT statement.

This operation is only supported for files stored on Amazon S3 on AWS. Other cloud services that use an S3-compatible protocol are not supported.

Exasol supports loading CSV files from the following types of Amazon S3 objects:

  • Public objects

  • Private objects using AWS access keys

  • Private objects using presigned URLs

Access to public objects without an access key and to private objects using presigned URLs was introduced in version 7.1.16. In versions prior to 7.1.16, leaving the USER and IDENTIFIED BY fields blank caused the database to connect with temporary IAM credentials. For more information, see changelog entries 15504 and 15155.

Create connection

We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials.

CREATE CONNECTION S3_MY_BUCKETNAME
    TO 'http://<my_bucketname>.s3.<my_region>.amazonaws.com'
    USER '<my_access_key>'
    IDENTIFIED BY '<my_secret_key>';
Placeholder        Description
<my_bucketname>    The name of the Amazon S3 bucket
<my_region>        The AWS region, for example: eu-west-1
<my_access_key>    The AWS access key ID
<my_secret_key>    The AWS secret access key

If the Amazon S3 bucket does not require authentication, you can omit the USER and IDENTIFIED BY clauses.

Import data

Public objects

To load public files, leave the USER and IDENTIFIED BY fields in the connection object empty.

Example:
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://public-testbucket.s3-eu-west-1.amazonaws.com';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Private objects using access keys

To authenticate using AWS access keys, insert the access key ID in the USER field and the secret key in the IDENTIFIED BY field in the connection object.

Example:
CREATE CONNECTION S3_MY_BUCKETNAME
    TO 'http://my_private_bucket.s3.eu-west-1.amazonaws.com'
    USER 'AKIAIOSFODNN7EXAMPLE'
    IDENTIFIED BY 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Private objects using presigned URLs

To load private files using a presigned URL, leave the USER and IDENTIFIED BY fields empty in the connection object. Append the query string of the presigned URL (everything after the question mark) to the file name specified in the FILE field of the IMPORT statement, separated by a question mark.

Example:
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://private-testbucket.s3-eu-west-1.amazonaws.com';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv?<query_string>';
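The split between the connection's TO clause and the FILE clause can be derived from a presigned URL with a few lines of Python. This is a sketch using only the standard library; the URL and its query parameters below are made-up placeholders, not output from AWS:

```python
from urllib.parse import urlsplit

# A presigned URL (the host, key, and query parameters here are illustrative).
presigned_url = (
    "https://private-testbucket.s3-eu-west-1.amazonaws.com/testpath/test.csv"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=abc123"
)

parts = urlsplit(presigned_url)

# The connection object's TO clause uses only the scheme and host.
connection_to = f"{parts.scheme}://{parts.netloc}"

# The FILE field takes the object key plus '?' and the query string.
file_clause = f"{parts.path.lstrip('/')}?{parts.query}"

print(connection_to)
print(file_clause)
```

The first value goes into the TO clause of CREATE CONNECTION; the second goes into the FILE field of the IMPORT statement.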

Multipart upload

Upload to Amazon S3 on AWS is done in multiple parts. AWS allows a maximum of 10,000 parts per upload, so uploading a very large file may require a larger part size. The part size for the upload is configurable with the -awsS3PartSize parameter. The default value is 10 MiB, expressed in bytes (-awsS3PartSize=10485760).

For information about how to edit database parameters, see Database Management in the Administration section for your selected deployment platform in this documentation.

Invalid parameters or invalid configuration syntax will prevent the database from starting. To avoid unnecessary downtime, contact Support before you add or change database parameters.

Examples

If the part size is 10 MiB, the maximum size of the file you can upload is:

(10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 97.7 GiB

If the part size is 5 MiB, the maximum size of the file you can upload is:

(5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 48.8 GiB
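The arithmetic above can be sketched as a short Python helper. The 10,000-part limit is the AWS maximum stated above; the function name is illustrative:

```python
MAX_PARTS = 10_000  # AWS limit on the number of parts per multipart upload
GIB = 1024 ** 3     # bytes per GiB

def max_upload_size_gib(part_size_bytes: int) -> float:
    """Largest file that fits in one multipart upload, in GiB."""
    return part_size_bytes * MAX_PARTS / GIB

print(round(max_upload_size_gib(10 * 1024 * 1024), 1))  # 97.7
print(round(max_upload_size_gib(5 * 1024 * 1024), 1))   # 48.8
```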

The value of -awsS3PartSize applies to all queries. It cannot be configured separately for individual queries.

For more information about part size and limitations, see Amazon S3 Multipart Upload Limits.

For information about how to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDF.