Load Data from Amazon S3 Using IMPORT
This article explains how to load data from Amazon S3 into Exasol using the IMPORT command.
You can use the IMPORT command to load data from Amazon S3 buckets on AWS. Exasol automatically recognizes Amazon S3 import based on the URL used with the IMPORT statement.
This operation is only supported for files stored on Amazon S3 on AWS. Other cloud services that use an S3-compatible protocol are not supported.
Exasol supports loading the following types of CSV files from Amazon S3:
- Public objects
- Private objects using AWS access keys
- Private objects using presigned URLs
Access to public objects without an access key and to private objects using presigned URLs was introduced in version 7.1.16. In versions prior to 7.1.16, leaving the USER and IDENTIFIED BY fields blank caused the database to connect using temporary IAM credentials. For more information, see changelog entries 15504 and 15155.
Create connection
We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials.
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'http://<my_bucketname>.s3.<my_region>.amazonaws.com'
USER '<my_access_key>'
IDENTIFIED BY '<my_secret_key>';
Placeholder | Description
---|---
<my_bucketname> | The name of the Amazon S3 bucket
<my_region> | The AWS region, for example: eu-west-1
<my_access_key> | AWS access key ID
<my_secret_key> | AWS secret key
If the AWS bucket does not require authentication, you can omit the USER and IDENTIFIED BY clauses.
Import data
Public objects
To load public files, leave the USER and IDENTIFIED BY fields in the connection object empty.
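For example, a connection object for a public bucket can be created without credentials. The bucket name and region below are placeholders:

```sql
CREATE CONNECTION S3_MY_PUBLIC_BUCKET
TO 'http://my_public_bucket.s3.eu-west-1.amazonaws.com';
```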
Example:
IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';
EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';
Private objects using access keys
To authenticate using AWS access keys, insert the access key ID in the USER field and the secret key in the IDENTIFIED BY field of the connection object.
Example:
CREATE CONNECTION S3_MY_BUCKETNAME
TO 'http://my_private_bucket.s3.eu-west-1.amazonaws.com'
USER 'AKIAIOSFODNN7EXAMPLE'
IDENTIFIED BY 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';
IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';
EXPORT testtable INTO CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv';
Private objects using presigned URLs
To load private files using a presigned URL, leave the USER and IDENTIFIED BY fields in the connection object empty. Append the query string of the presigned URL (everything after the file name and the question mark) to the file name specified in the FILE clause of the IMPORT statement.
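A sketch of such an IMPORT statement, assuming a presigned URL for testpath/test.csv; the query string is abbreviated here, and the connection object contains no credentials:

```sql
IMPORT INTO testtable FROM CSV AT S3_MY_BUCKETNAME
FILE 'testpath/test.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Signature=...';
```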
Multipart upload
Upload to Amazon S3 on AWS is done in parts. AWS allows a maximum of 10,000 parts per upload, so you may need a larger part size to upload a very large file. The part size for the upload is configurable with the -awsS3PartSize parameter. The default value is 10 MiB, specified in bytes (-awsS3PartSize=10485760).
For information about how to edit database parameters, see Database Management in the Administration section for your selected deployment platform in this documentation.
Invalid parameters or invalid configuration syntax will prevent the database from starting. To avoid unnecessary downtime, contact Support before you add or change database parameters.
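For example, assuming your platform accepts database parameters in this key-value form, setting the part size to 5 MiB (5242880 bytes) would look like:

```
-awsS3PartSize=5242880
```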
Examples
If the part size is 10 MiB, the maximum size of the file you can upload is:
(10*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 97.7 GiB
If the part size is 5 MiB, the maximum size of the file you can upload is:
(5*1024*1024 bytes/part) * (10000 parts) / (1024*1024*1024 bytes/GiB) = 48.8 GiB
The value of -awsS3PartSize applies to all queries. You cannot configure it separately for different queries.
For more information about part size and limitations, see Amazon S3 Multipart Upload Limits.
For information about how to load Amazon S3 files in parallel, see Load Data from Amazon S3 in Parallel Using UDF.