Load data from Amazon S3 using IMPORT

This article explains how to load data from Amazon S3 buckets into Exasol using the IMPORT command.

You can use the IMPORT command to load data from CSV and Parquet files in Amazon S3 buckets on AWS. Exasol automatically recognizes an Amazon S3 import based on the URL that is used with the IMPORT statement.

Exasol supports loading the following object types from Amazon S3 buckets:

Public objects
Private objects using AWS access keys
Private objects using presigned URLs

This operation is only supported for files stored on Amazon S3 on AWS. Other cloud services that use an S3 compatible protocol are not supported. Exasol automatically recognizes an Amazon S3 import based on the URL that is used with the IMPORT statement.

To learn how to load Parquet files from an Amazon S3 bucket on AWS, see Load data from Apache Parquet files in Amazon S3 on AWS.

Create a connection

We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials.

Copy

CREATE OR REPLACE CONNECTION my_s3_bucket
    TO 'https://<my_bucketname>.s3.<my_region>.amazonaws.com'
    USER '<my_access_key>'
    IDENTIFIED BY '<my_secret_key>';

Placeholder	Description
`<my_bucketname>`	The name of the Amazon S3 bucket
`<my_region>`	The AWS region, for example: `eu-west-1`
`<my_access_key>`	AWS access key ID
`<my_secret_key>`	AWS secret key

If the AWS bucket does not require authentication, you can omit the clauses USER and IDENTIFIED BY.

Import data

Public objects

To load public files, leave the USER and IDENTIFIED BY fields in the connection object empty.

Example:

Copy

CREATE CONNECTION my_s3_bucket
TO 'https://public-testbucket.s3.eu-west-1.amazonaws.com';

Copy

IMPORT INTO testtable FROM CSV AT my_s3_bucket
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT my_s3_bucket
FILE 'testpath/test.csv';

Private objects using access keys

To authenticate using AWS access keys, insert the access key ID in the USER field and the secret key in the IDENTIFIED BY field in the connection object.

Example:

Copy

CREATE CONNECTION my_s3_bucket
    TO 'https://my_private_bucket.s3.eu-west-1.amazonaws.com'
    USER 'AKIAIOSFODNN7EXAMPLE'
    IDENTIFIED BY 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';

Copy

IMPORT INTO testtable FROM CSV AT my_s3_bucket
FILE 'testpath/test.csv';

EXPORT testtable INTO CSV AT my_s3_bucket
FILE 'testpath/test.csv';

Private objects using presigned URLs

To load private files using a presigned URL, leave the USER and IDENTIFIED BY fields empty in the connection object. Add the query string (everything after the file name and question mark in the presigned URL) to the file name specified in the FILE field in the IMPORT statement.

Example:

Copy

CREATE CONNECTION my_s3_bucket
TO 'https://private-testbucket.s3.eu-west-1.amazonaws.com';

Copy

IMPORT INTO testtable FROM CSV AT my_s3_bucket
FILE 'testpath/test.csv?<query_string>';

Multipart upload

A file upload to an S3 bucket on AWS is done in parts. AWS allows a maximum of 10,000 parts for each upload. This means that if you want to upload a large file, you may need to configure a larger part size to keep the total number of parts below 10,000. The part size is configurable with the parameter -awsS3PartSize. The default size is 10 MiB represented in bytes (-awsS3PartSize=1048576).

To learn how to edit database parameters, see Database Management in the Administration section for your selected deployment platform.

Invalid parameters will prevent the database from starting. To avoid unnecessary downtime, create a support case to get guidance from Support before you add or change database parameters.

Examples

If the part size is 10 MiB, the maximum size of the file you can upload is:

(10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 97.7 GiB

If the part size is 5 MiB, the maximum size of the file you can upload is:

(5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 48.8 GiB

The value of -awsS3PartSize will be applied to all queries. You cannot configure it separately for different queries.

For more information about part size and limitations, see Amazon S3 Multipart Upload Limits.

To learn how to load Amazon S3 files in parallel, see Load data from Amazon S3 in parallel using UDF.

Load data from Amazon S3 using IMPORT

Create a connection

Import data

Public objects

Example:

Private objects using access keys

Example:

Private objects using presigned URLs

Example:

Multipart upload

Examples

PRODUCT

RESOURCES