Load Data from Other Files
In addition to CSV and FBV files, you can load data into Exasol from cloud storage systems. Exasol’s Cloud Storage Extension enables you to transfer formatted data between Exasol and cloud storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
The following table lists the cloud storage systems and file formats that are supported by the Cloud Storage Extension.

Storage system | Function | Parquet | Delta Lake | ORC | Avro
---|---|---|---|---|---
Amazon S3 | Import | Supported | Supported | Supported | Supported
Amazon S3 | Export | Supported | Not supported | Not supported | Not supported
Azure Blob Storage | Import | Supported | Supported | Supported | Supported
Azure Blob Storage | Export | Supported | Not supported | Not supported | Not supported
Azure Data Lake (Gen 1) | Import | Supported | Supported | Supported | Supported
Azure Data Lake (Gen 1) | Export | Supported | Not supported | Not supported | Not supported
Azure Data Lake (Gen 2) | Import | Supported | Supported | Supported | Supported
Azure Data Lake (Gen 2) | Export | Supported | Not supported | Not supported | Not supported
Google Cloud Storage | Import | Supported | Supported | Supported | Supported
Google Cloud Storage | Export | Supported | Not supported | Not supported | Not supported
Cloud storage systems not listed in the table are not supported.
Setting up the UDFs
To use the UDFs, do the following:
- Download the latest JAR file from Download.
- Upload the JAR file to a bucket in BucketFS. For more information about BucketFS, see BucketFS.
- Set up the ETL scripts using the procedures described in Deployment. An illustrative sketch of these script definitions follows this list.
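For illustration only, the following is a minimal sketch of the IMPORT_PATH and EXPORT_PATH script definitions in an ETL schema. It assumes the JAR was uploaded to a bucket named <BUCKET> in the default BucketFS service; the %scriptclass names and the full set of helper scripts vary between releases of the extension, so copy the exact definitions from Deployment.
-- Illustrative excerpt; see Deployment for the complete, version-specific definitions.
CREATE SCHEMA IF NOT EXISTS ETL;
OPEN SCHEMA ETL;

-- Path-generator UDF referenced by IMPORT ... FROM SCRIPT ETL.IMPORT_PATH
CREATE OR REPLACE JAVA SET SCRIPT IMPORT_PATH (...) EMITS (...) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.FilesImportQueryGenerator;
  %jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-<VERSION>.jar;
/

-- Path-generator UDF referenced by EXPORT ... INTO SCRIPT ETL.EXPORT_PATH
CREATE OR REPLACE JAVA SET SCRIPT EXPORT_PATH (...) EMITS (...) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.TableExportQueryGenerator;
  %jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-<VERSION>.jar;
/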
Usage examples
The following examples show excerpts of the Exasol UDFs and connection objects used for importing and exporting formatted data (such as Parquet, ORC, Avro, and Delta Lake) with different types of cloud storage. For more information about the examples, see User Guide.
The user that runs the IMPORT or EXPORT statement needs the ACCESS privilege on the connection, granted either directly or through a role. For more details, see Privileges and Details on Rights Management.
We recommend using connection objects when providing credentials to UDFs, so that secrets are not visible in the audit logs.
Amazon S3
To create a connection object using AWS access and secret keys, use the following statement:
CREATE OR REPLACE CONNECTION S3_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'S3_ACCESS_KEY=<AWS_ACCESS_KEY>;S3_SECRET_KEY=<AWS_SECRET_KEY>';
To learn more about how to use connection objects with Amazon S3, see Create Exasol Connection Object.
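The user or role that runs the IMPORT or EXPORT statements must then be granted the ACCESS privilege on this connection. A minimal example, assuming a hypothetical role named ETL_ROLE:
-- ETL_ROLE is a placeholder for the user or role that runs the statements
GRANT ACCESS ON CONNECTION S3_CONNECTION TO ETL_ROLE;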
Examples of IMPORT and EXPORT using connection objects are given below:
Import
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 's3a://<BUCKET>/import/orc/sales_positions/*'
DATA_FORMAT = 'ORC'
S3_ENDPOINT = 's3.<REGION>.amazonaws.com'
CONNECTION_NAME = 'S3_CONNECTION';
Export
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
BUCKET_PATH = 's3a://<BUCKET>/export/parquet/sales_positions/'
DATA_FORMAT = 'PARQUET'
S3_ENDPOINT = 's3.<REGION>.amazonaws.com'
CONNECTION_NAME = 'S3_CONNECTION';
For additional information, see Amazon S3.
Google Cloud Storage
Import
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 'gs://<GCS-STORAGE>/import/avro/sales_positions/*'
DATA_FORMAT = 'AVRO'
GCS_PROJECT_ID = '<GCP_PROJECT_ID>'
GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json';
Export
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
BUCKET_PATH = 'gs://<GCS-STORAGE>/export/parquet/sales_positions/'
DATA_FORMAT = 'PARQUET'
GCS_PROJECT_ID = '<GCP_PROJECT_ID>'
GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json';
For additional information, see Google Cloud Storage.
Azure Blob Storage
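The examples in this section reference two connection objects, one holding the storage account access key and one holding a shared access signature (SAS) token. As a sketch, assuming the property names AZURE_SECRET_KEY and AZURE_SAS_TOKEN used by the extension's user guide, they can be created as follows; see Azure Blob Storage for the exact syntax:
-- Connection object with the storage account access key (property name assumed)
CREATE OR REPLACE CONNECTION AZURE_BLOB_SECRET_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_SECRET_KEY=<AZURE_SECRET_KEY>';

-- Connection object with a SAS token (property name assumed)
CREATE OR REPLACE CONNECTION AZURE_BLOB_SAS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_SAS_TOKEN=<AZURE_SAS_TOKEN>';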
Import using secret key connection object
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
DATA_FORMAT = 'ORC'
CONNECTION_NAME = 'AZURE_BLOB_SECRET_CONNECTION';
Import using SAS token connection object
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
DATA_FORMAT = 'ORC'
CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION';
Export using secret key connection object
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
BUCKET_PATH = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
DATA_FORMAT = 'PARQUET'
CONNECTION_NAME = 'AZURE_BLOB_SECRET_CONNECTION';
Export using SAS token connection object
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
BUCKET_PATH = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
DATA_FORMAT = 'PARQUET'
CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION';
For additional information, see Azure Blob Storage.
Azure Data Lake Storage (Generation 1)
For more information about how to use connection objects with Azure Data Lake Storage (Gen 1), see Azure Data Lake Gen1 Storage.
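As a sketch, the AZURE_ADLS_CONNECTION object referenced below holds the Azure service principal credentials. The property names shown here (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_DIRECTORY_ID) are assumptions based on the extension's user guide; verify them against Azure Data Lake Gen1 Storage:
-- Service principal credentials for ADLS Gen 1 (property names assumed)
CREATE OR REPLACE CONNECTION AZURE_ADLS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_CLIENT_ID=<CLIENT_ID>;AZURE_CLIENT_SECRET=<CLIENT_SECRET>;AZURE_DIRECTORY_ID=<DIRECTORY_ID>';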
Import
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/import/avro/*'
DATA_FORMAT = 'AVRO'
CONNECTION_NAME = 'AZURE_ADLS_CONNECTION';
Export
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
BUCKET_PATH = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/export/parquet/'
DATA_FORMAT = 'PARQUET'
CONNECTION_NAME = 'AZURE_ADLS_CONNECTION';
Azure Data Lake Storage (Generation 2)
For more information about how to use connection objects with Azure Data Lake Storage (Gen 2), see Azure Data Lake Gen 2 Storage.
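Similarly, as a sketch, the AZURE_ABFS_CONNECTION object referenced below holds the storage account access key. The property name AZURE_SECRET_KEY is an assumption; verify it against Azure Data Lake Gen 2 Storage:
-- Storage account access key for ADLS Gen 2 (property name assumed)
CREATE OR REPLACE CONNECTION AZURE_ABFS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_SECRET_KEY=<AZURE_SECRET_KEY>';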
Import
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 'abfs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/import/orc/*'
DATA_FORMAT = 'ORC'
CONNECTION_NAME = 'AZURE_ABFS_CONNECTION';
Export
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
BUCKET_PATH = 'abfss://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/export/parquet/'
DATA_FORMAT = 'PARQUET'
CONNECTION_NAME = 'AZURE_ABFS_CONNECTION';
Delta Lake
Import
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
BUCKET_PATH = 's3a://<BUCKET>/import/delta/sales_positions/*'
DATA_FORMAT = 'DELTA'
S3_ENDPOINT = 's3.<REGION>.amazonaws.com'
CONNECTION_NAME = 'S3_CONNECTION';
For detailed instructions on importing data in the Delta Lake format, see Delta Format in the GitHub repository.
Hadoop Distributed Filesystem (HDFS)
The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. When the Hadoop datanodes and the Exasol cluster are installed in the same (virtual) network, you can access HDFS using CLOUD_STORAGE_EXTENSION.
Since the Hadoop nodes and the Exasol cluster must be in the same private network, a connection object is not necessary.
Import
IMPORT INTO <schema>.<table>
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
BUCKET_PATH = 'hdfs://<HDFS_PATH>/import/orc/data/*.orc'
DATA_FORMAT = 'ORC';
Export
EXPORT <schema>.<table>
INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
BUCKET_PATH = 'hdfs://<HDFS_PATH>/export/parquet/data/'
DATA_FORMAT = 'PARQUET';
Contribute to the project
If you want to contribute to the Cloud Storage Extension open source project, see Information for Contributors.