Load Data from Amazon S3 Using IMPORT

This article explains how to load data from Amazon S3 into Exasol using the IMPORT command.

You can use the IMPORT command to load data from Amazon S3 buckets on AWS. Exasol automatically recognizes Amazon S3 import based on the URL used with the IMPORT statement.

This operation is only supported for files stored on Amazon S3 on AWS. Other cloud services that use an S3-compatible protocol are not supported.

Exasol supports loading CSV files from the following types of Amazon S3 objects:

  • Public objects

  • Private objects using AWS access keys

  • Private objects using presigned URLs

Access to public objects without an access key and to private objects using presigned URLs was introduced in version 7.1.16. In versions prior to 7.1.16, leaving the USER and IDENTIFIED BY fields blank caused the database to connect with temporary IAM credentials. For more information, see changelog entries 15504 and 15155.

Create connection

We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials.

CREATE CONNECTION S3_MY_BUCKETNAME
    TO 'http://<my_bucketname>.s3.<my_region>.amazonaws.com'
    USER '<my_access_key>'
    IDENTIFIED BY '<my_secret_key>';
Placeholder        Description
<my_bucketname>    The name of the Amazon S3 bucket
<my_region>        The AWS region, for example: eu-west-1
<my_access_key>    The AWS access key ID
<my_secret_key>    The AWS secret access key

If the Amazon S3 bucket does not require authentication, you can omit the USER and IDENTIFIED BY clauses.

Import data

Public objects

To load public files, leave the USER and IDENTIFIED BY fields in the connection object empty.

Example:
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://public-testbucket.s3-eu-west-1.amazonaws.com';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Private objects using access keys

To authenticate using AWS access keys, insert the access key ID in the USER field and the secret key in the IDENTIFIED BY field in the connection object.

Example:
CREATE CONNECTION S3_MY_BUCKETNAME
    TO 'http://my_private_bucket.s3.eu-west-1.amazonaws.com'
    USER 'AKIAIOSFODNN7EXAMPLE'
    IDENTIFIED BY 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';

Private objects using presigned URLs

To load private files using a presigned URL, leave the USER and IDENTIFIED BY fields empty in the connection object. Append the query string of the presigned URL (everything after the question mark) to the file name specified in the FILE field of the IMPORT statement, separated by a question mark.

Example:
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'https://private-testbucket.s3-eu-west-1.amazonaws.com';

IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv?<query_string>';
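The split between the connection's TO clause and the FILE clause can be derived from a presigned URL with a few lines of Python. This is a sketch using only the standard library; the URL and its query parameters below are made-up placeholders, not output from AWS:

```python
from urllib.parse import urlsplit

# A presigned URL (the host, key, and query parameters here are illustrative).
presigned_url = (
    "https://private-testbucket.s3-eu-west-1.amazonaws.com/testpath/test.csv"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=abc123"
)

parts = urlsplit(presigned_url)

# The connection object's TO clause uses only the scheme and host.
connection_to = f"{parts.scheme}://{parts.netloc}"

# The FILE field takes the object key plus '?' and the query string.
file_clause = f"{parts.path.lstrip('/')}?{parts.query}"

print(connection_to)
print(file_clause)
```

The first value goes into the TO clause of CREATE CONNECTION; the second goes into the FILE field of the IMPORT statement.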

Multipart upload

Upload to Amazon S3 on AWS is done in multiple parts. AWS allows a maximum of 10,000 parts per upload, so uploading a very large file may require a larger part size. The part size for the upload is configurable with the -awsS3PartSize parameter. The default value is 10 MiB, expressed in bytes (-awsS3PartSize=10485760).

For information about how to edit database parameters, see Database Management in the Administration section for your selected deployment platform in this documentation.

Invalid parameters or invalid configuration syntax will prevent the database from starting. To avoid unnecessary downtime, contact Support before you add or change database parameters.

Examples

If the part size is 10 MiB, the maximum size of the file you can upload is:

(10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 97.7 GiB

If the part size is 5 MiB, the maximum size of the file you can upload is:

(5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 48.8 GiB
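The arithmetic above can be sketched as a short Python helper. The 10,000-part limit is the AWS maximum stated above; the function name is illustrative:

```python
MAX_PARTS = 10_000  # AWS limit on the number of parts per multipart upload
GIB = 1024 ** 3     # bytes per GiB

def max_upload_size_gib(part_size_bytes: int) -> float:
    """Largest file that fits in one multipart upload, in GiB."""
    return part_size_bytes * MAX_PARTS / GIB

print(round(max_upload_size_gib(10 * 1024 * 1024), 1))  # 97.7
print(round(max_upload_size_gib(5 * 1024 * 1024), 1))   # 48.8
```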

The value of -awsS3PartSize applies to all queries. It cannot be configured separately for individual queries.

For more information about part size and limitations, see Amazon S3 Multipart Upload Limits.

For information about how to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDF.