Load data using Exasol Cloud Storage Extension

Exasol Cloud Storage Extension enables you to easily transfer formatted data between Exasol and cloud storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. To learn more, see Cloud Storage Extension on GitHub.

The following table lists the cloud storage systems and file types currently supported by the Cloud Storage Extension.

Cloud storage systems not listed in this table are not supported by the Cloud Storage Extension.

Storage system	Function	File format
Storage system	Function	Parquet	Delta Lake	ORC	Avro
Amazon S3	Import
Amazon S3	Export
Azure Blob Storage	Import
Azure Blob Storage	Export
Azure Data Lake (Gen 1)	Import
Azure Data Lake (Gen 1)	Export
Azure Data Lake (Gen 2)	Import
Azure Data Lake (Gen 2)	Export
Google Cloud Storage	Import
Google Cloud Storage	Export

	Supported
	Not supported

In Exasol 2025.1 and later you can use the IMPORT command to load Parquet files from AWS S3 buckets with Exasol’s native bulk loader (EXALoader). This method provides much better performance and flexibility when loading Parquet files compared to using the Cloud Storage Extension.

To learn more, see Load data from Apache Parquet files in Amazon S3 on AWS.

Setting up the UDFs

To set up the Cloud Storage Extension UDFs in an as-application deployment of Exasol, do the following:

1. Download the latest JAR file from Download.
2. Upload the JAR file to a bucket in the BucketFS file system. For more information, see BucketFS.
3. Set up the ETL scripts using the procedures described in Deployment.

In Exasol SaaS, the Cloud Storage Extension JAR file is already uploaded to the BucketFS bucket /buckets/bfssaas/default/.

To identify which files are uploaded to this bucket, you can use the following script:

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "LS" ("my_path" VARCHAR(100)) EMITS ("FILES" VARCHAR(100)) AS
import os

# Emit one row for each entry found in the given BucketFS directory.
def run(ctx):
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)
/

SELECT ls('/buckets/bfssaas/default/');

In the Cloud Storage Extension User Guide, skip straight to Create UDF Scripts. When you create the UDFs, combine the bucket path and the JAR file name to form the full path. For example, the full path to the JAR file might be /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar.
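
The lookup and path construction described above can also be sketched in plain Python. This is a minimal sketch, not part of the extension: the helper name `find_extension_jars` is hypothetical, and the file name prefix is an assumption based on the example JAR name above. The logic could be adapted into the `run` function of a UDF such as `LS`.

```python
import os

# Assumed file name prefix, based on the example JAR name in this guide.
JAR_PREFIX = "exasol-cloud-storage-extension-"


def find_extension_jars(bucket_path):
    """Return the sorted full paths of extension JAR files in bucket_path."""
    return sorted(
        os.path.join(bucket_path, name)
        for name in os.listdir(bucket_path)
        if name.startswith(JAR_PREFIX) and name.endswith(".jar")
    )
```

Each returned value is a full path of the kind that the UDF script definitions need.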

The version of the Cloud Storage Extension stored in BucketFS may differ from the version shown in the User Guide on GitHub. If you need to update your version of the Cloud Storage Extension, create a support case.

Usage examples

The following examples show Exasol UDF statements and connection objects for importing and exporting formatted data with different types of cloud storage. For more information about the examples, see User Guide.

The user that runs the IMPORT or EXPORT statement needs the ACCESS privilege on the specified connection, granted either directly or through a role. For more details, see Privileges and Details on Rights Management.

We recommend using connection objects when providing credentials to UDFs to prevent secrets from being visible in the audit logs.
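
The IMPORT statements in the sections below all share one shape: a target table, an import script, and a list of `KEY = 'value'` parameters. As a minimal sketch of that pattern, the hypothetical helper `build_import_statement` (not part of the extension) renders such a statement from a Python dictionary. Table and script names are inserted verbatim, so the sketch assumes they come from a trusted source.

```python
def build_import_statement(table, script, params):
    """Render an IMPORT ... FROM SCRIPT statement from a dict of parameters."""
    # Align the "=" signs on the longest parameter name for readability.
    width = max((len(key) for key in params), default=0)
    lines = [f"IMPORT INTO {table}", f"FROM SCRIPT {script} WITH"]
    for key, value in params.items():
        # Double any single quotes so the value stays a valid SQL string literal.
        escaped = str(value).replace("'", "''")
        lines.append(f"  {key.ljust(width)} = '{escaped}'")
    return "\n".join(lines) + ";"


statement = build_import_statement(
    "RETAIL.SALES_POSITIONS",
    "ETL.IMPORT_PATH",
    {
        "BUCKET_PATH": "s3a://<BUCKET>/import/orc/sales_positions/*",
        "DATA_FORMAT": "ORC",
        "S3_ENDPOINT": "s3.<REGION>.amazonaws.com",
        "CONNECTION_NAME": "S3_CONNECTION",
    },
)
print(statement)
```

Only the parameter dictionary changes between storage systems. The statement layout stays the same.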

Amazon S3

To create a connection object using AWS access and secret keys, use the following statement:

CREATE OR REPLACE CONNECTION S3_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'S3_ACCESS_KEY=<AWS_ACCESS_KEY>;S3_SECRET_KEY=<AWS_SECRET_KEY>';

To learn more about how to use connection objects with Amazon S3, see Create Exasol Connection Object.
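
The `IDENTIFIED BY` value in the statement above is a semicolon-separated list of `KEY=value` pairs. The following is a minimal sketch of how such a statement could be assembled in Python. The helper name `build_connection_statement` is hypothetical, and the sketch assumes that the values themselves contain no semicolons.

```python
def build_connection_statement(name, secrets):
    """Render CREATE OR REPLACE CONNECTION with secrets as KEY=value pairs."""
    for key, value in secrets.items():
        # Assumption: ";" separates the pairs, so it cannot appear in a value.
        if ";" in value:
            raise ValueError(f"Value for {key} must not contain ';'")
    pairs = ";".join(f"{key}={value}" for key, value in secrets.items())
    # Double any single quotes so the result stays a valid SQL string literal.
    escaped = pairs.replace("'", "''")
    return (
        f"CREATE OR REPLACE CONNECTION {name}\n"
        "TO ''\n"
        "USER ''\n"
        f"IDENTIFIED BY '{escaped}';"
    )
```

The returned string contains the secrets in plain text, so avoid printing or logging it.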

The following examples show IMPORT and EXPORT statements that use this connection object:

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/import/orc/sales_positions/*'
  DATA_FORMAT     = 'ORC'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/export/parquet/sales_positions/'
  DATA_FORMAT     = 'PARQUET'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION';

For additional information, see Amazon S3.

Google Cloud Storage

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH      = 'gs://<GCS-STORAGE>/import/avro/sales_positions/*'
  DATA_FORMAT      = 'AVRO'
  GCS_PROJECT_ID   = '<GCP_PROJECT_ID>'
  GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH      = 'gs://<GCS-STORAGE>/export/parquet/sales_positions/'
  DATA_FORMAT      = 'PARQUET'
  GCS_PROJECT_ID   = '<GCP_PROJECT_ID>'
  GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json';

For additional information, see Google Cloud Storage.

Azure Blob Storage

Import using secret key connection object

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH      = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
  DATA_FORMAT      = 'ORC'
  CONNECTION_NAME  = 'AZURE_BLOB_SECRET_CONNECTION';

Import using SAS token connection object

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
  DATA_FORMAT     = 'ORC'
  CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION';

Export using secret key connection object

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH      = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
  DATA_FORMAT      = 'PARQUET'
  CONNECTION_NAME  = 'AZURE_BLOB_SECRET_CONNECTION';

Export using SAS token connection object

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION';

For additional information, see Azure Blob Storage.

Azure Data Lake Storage (Generation 1)

For more information about how to use connection objects with Azure Data Lake Storage (Gen 1), see Azure Data Lake Gen1 Storage.

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/import/avro/*'
  DATA_FORMAT     = 'AVRO'
  CONNECTION_NAME = 'AZURE_ADLS_CONNECTION';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_ADLS_CONNECTION';

Azure Data Lake Storage (Generation 2)

For more information about how to use connection objects with Azure Data Lake Storage (Gen 2), see Azure Data Lake Gen 2 Storage.

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'abfs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/import/orc/*'
  DATA_FORMAT     = 'ORC'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'abfss://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION';

Delta Lake

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/import/delta/sales_positions/*'
  DATA_FORMAT     = 'DELTA'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION';

For detailed instructions on importing data in Delta Lake format, see Delta Format in the GitHub repository.

Hadoop Distributed Filesystem (HDFS)

The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. When the Hadoop data nodes and the Exasol cluster are installed in the same (virtual) network, you can access HDFS using the Cloud Storage Extension.

Since the Hadoop nodes and the Exasol cluster must be in the same private network, a connection object is not necessary.

Import

IMPORT INTO <schema>.<table>
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
  BUCKET_PATH     = 'hdfs://<HDFS_PATH>/import/orc/data/*.orc'
  DATA_FORMAT     = 'ORC';

Export

EXPORT <schema>.<table>
INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
  BUCKET_PATH     = 'hdfs://<HDFS_PATH>/export/parquet/data/'
  DATA_FORMAT     = 'PARQUET';
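
The `BUCKET_PATH` values in the examples above use a different URI scheme for each storage system. The following sketch collects those schemes in one lookup table, which can be useful for checking a path before running a statement. The mapping is taken only from the examples in this guide, and the function name `storage_system_for` is hypothetical.

```python
# URI schemes as they appear in the BUCKET_PATH examples of this guide.
SCHEMES = {
    "s3a": "Amazon S3",
    "gs": "Google Cloud Storage",
    "wasbs": "Azure Blob Storage",
    "adl": "Azure Data Lake Storage (Generation 1)",
    "abfs": "Azure Data Lake Storage (Generation 2)",
    "abfss": "Azure Data Lake Storage (Generation 2)",
    "hdfs": "Hadoop Distributed Filesystem (HDFS)",
}


def storage_system_for(bucket_path):
    """Return the storage system name for a BUCKET_PATH, based on its scheme."""
    scheme, separator, _ = bucket_path.partition("://")
    if not separator or scheme.lower() not in SCHEMES:
        raise ValueError(f"Unrecognized BUCKET_PATH: {bucket_path!r}")
    return SCHEMES[scheme.lower()]
```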

Contribute to the project

If you want to contribute to the Cloud Storage Extension open source project, see Information for Contributors.
