Loading from Other Files

Apart from CSV/FBV file types, you can also load data into Exasol from cloud storage systems. Exasol's Cloud Storage Extension allow you to import and export of formatted data between Exasol and cloud storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. You can use connection objects, or you can use the UDF parameters to import or export your data.

Exasol recommends using connection objects to provide credentials to the UDFs and hide the secrets information from the audit logs.

The following table lists the support file types along with the supported import/export functions based on the cloud storage system: 

Storage System Function File Format
Parquet Delta Lake ORC Avro
Amazon S3 Import
Export
Azure Blob Storage Import
Export
Azure Data Lake (Gen 1) Import
Export
Azure Data Lake (Gen 2) Import
Export
Google Cloud Storage Import
Export
- Indicates supported function

- Indicates unsupported function

Any other cloud storage system not listed in the above table is not supported.

Setting up the UDFs

Do the following to use the UDFs:

  1. Download the latest jar files.
  2. Upload the jar to a bucket in the BucketFS. To know about BucketFS, see BucketFS.
  3. Set up the ETL scripts using the procedures mentioned in Deployment.

Usage Examples

In the following example, you can see an excerpt of Exasol UDFs and connection objects for importing and exporting Parquet formatted data with different types of cloud storage. For more information about the examples, see User Guide.

Amazon S3

You can use the following statements to create connection objects using AWS access and secret keys.

CREATE OR REPLACE CONNECTION S3_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'S3_ACCESS_KEY=<AWS_ACCESS_KEY>;S3_SECRET_KEY=<AWS_SECRET_KEY>';

To know more about how to use connection objects with Amazon S3, see Create Exasol Connection Object.

The examples for import and export using connection objects are given below:

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/import/orc/sales_positions/*'
  DATA_FORMAT     = 'ORC'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION'
  PARALLELISM     = 'nproc()*2';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/export/parquet/sales_positions/'
  DATA_FORMAT     = 'PARQUET'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION'
  PARALLELISM     = 'iproc(), floor(random()*2)';

For additional information, refer to Amazon S3 on the GitHub repository.

Google Cloud Storage

The examples for import and export using UDF parameters objects are given below:

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH      = 'gs://<GCS-STORAGE>/import/avro/sales_positions/*'
  DATA_FORMAT      = 'AVRO'
  GCS_PROJECT_ID   = '<GCP_PORJECT_ID>'
  GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json'
  PARALLELISM      = 'nproc()*4';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH      = 'gs://<GCS-STORAGE>/export/parquet/sales_positions/'
  DATA_FORMAT      = 'PARQUET'
  GCS_PROJECT_ID   = '<GCP_PORJECT_ID>'
  GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json'
  PARALLELISM      = 'iproc(), floor(random()*4)';

For additional information, refer to Google Cloud Storage on the GitHub repository.

Azure Blob Storage

The examples for import and export using connection objects are given below:

Import using secret key connection object

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH      = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
  DATA_FORMAT      = 'ORC'
  CONNECTION_NAME  = 'AZURE_BLOB_SECRET_CONNECTION'
  PARALLELISM      = 'nproc()';

Import using SAS token connection object

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
  DATA_FORMAT     = 'ORC'
  CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION'
  PARALLELISM     = 'nproc()';

Export using secret key connection object

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH      = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
  DATA_FORMAT      = 'PARQUET'
  CONNECTION_NAME  = 'AZURE_BLOB_SECRET_CONNECTION'
  PARALLELISM      = 'iproc()';

Export using SAS token connection object

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION'
  PARALLELISM     = 'iproc()';

For additional information, refer to Azure Blob Storage on the GitHub repository.

Azure Data Lake Storage (Generation 1)

To know about how to use connection objects with Azure Data Lake Storage (Gen 1), see Azure Data Lake Gen1 Storage. The examples for import and export using connection objects are given below:

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/import/avro/*'
  DATA_FORMAT     = 'AVRO'
  CONNECTION_NAME = 'AZURE_ADLS_CONNECTION'
  PARALLELISM     = 'nproc()';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_ADLS_CONNECTION'
  PARALLELISM     = 'iproc()';

Azure Data Lake Storage (Generation 2)

To know about how to use connection objects with Azure Data Lake Storage (Gen 2), see Azure Data Lake Gen 2 Storage. The examples for import and export using connection objects are given below:

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'abfs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/import/orc/*'
  DATA_FORMAT     = 'ORC'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION'
  PARALLELISM     = 'nproc()';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'abfss://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION'
  PARALLELISM     = 'iproc(), floor(random()*2)';

Delta Lake

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/import/delta/sales_positions/*'
  DATA_FORMAT     = 'DELTA'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION'
  PARALLELISM     = 'nproc()*2';

For detailed instructions on importing data from Delta Lake Format, refer to Delta Format on the GitHub repository.

Contribute to the Project

Exasol encourages your contribution to the open source project. To know about how to contribute to the project, see Information for Contributors.