Load Data from Other Files

In addition to CSV and FBV files, you can also load data into Exasol from cloud storage systems. Exasol’s Cloud Storage Extension allows you to import and export formatted data between Exasol and cloud storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. You can provide the required credentials either through connection objects or directly as UDF parameters.

We recommend using connection objects to provide credentials to the UDFs, as this keeps the secrets out of the audit logs.

The following table lists the supported file formats for the import and export functions of each cloud storage system. Cloud storage services that are not listed in this table are not supported.

Storage System            Function   Parquet   Delta Lake   ORC   Avro
Amazon S3                 Import     Yes       Yes          Yes   Yes
Amazon S3                 Export     Yes       No           No    No
Azure Blob Storage        Import     Yes       Yes          Yes   Yes
Azure Blob Storage        Export     Yes       No           No    No
Azure Data Lake (Gen 1)   Import     Yes       Yes          Yes   Yes
Azure Data Lake (Gen 1)   Export     Yes       No           No    No
Azure Data Lake (Gen 2)   Import     Yes       Yes          Yes   Yes
Azure Data Lake (Gen 2)   Export     Yes       No           No    No
Google Cloud Storage      Import     Yes       Yes          Yes   Yes
Google Cloud Storage      Export     Yes       No           No    No

Setting up the UDFs

The Cloud Storage Extension JAR is already uploaded into the following bucket in BucketFS:

/buckets/bfssaas/default/

You can use the following script to list the files that have been uploaded to this bucket:

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "LS" ("my_path" VARCHAR(100)) EMITS ("FILES" VARCHAR(100)) AS
import os
def run(ctx):
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)
/
 
SELECT ls('/buckets/bfssaas/default/');

When going through the User Guide, you can skip straight to Create UDF Scripts. Use the bucket path above combined with the file names it contains when creating the UDFs. For example, the full path to the JAR file might be /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar.
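
To illustrate, the following is a minimal sketch of those CREATE SCRIPT statements, adapted to the bucket path above and created in the ETL schema used by the examples below. The script class names correspond to the 2.x releases of the extension; verify them, and the EMITS column list of IMPORT_METADATA, against the User Guide entry that matches the JAR version in your bucket.

CREATE SCHEMA IF NOT EXISTS ETL;
OPEN SCHEMA ETL;

--/
-- Generates the metadata and data-import queries behind an IMPORT statement
CREATE OR REPLACE JAVA SET SCRIPT IMPORT_PATH (...) EMITS (...) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.FilesImportQueryGenerator;
  %jar /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar;
/

--/
-- Lists the files behind BUCKET_PATH and assigns them to parallel importers;
-- the EMITS columns must match the User Guide for your extension version
CREATE OR REPLACE JAVA SCALAR SCRIPT IMPORT_METADATA (...)
EMITS (
  filename VARCHAR(2000),
  partition_index VARCHAR(100),
  start_index DECIMAL(36, 0),
  end_index DECIMAL(36, 0)
) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.FilesMetadataReader;
  %jar /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar;
/

--/
-- Reads the assigned files and emits their rows
CREATE OR REPLACE JAVA SET SCRIPT IMPORT_FILES (...) EMITS (...) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.FilesDataImporter;
  %jar /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar;
/

--/
-- Generates the data-export query behind an EXPORT statement
CREATE OR REPLACE JAVA SET SCRIPT EXPORT_PATH (...) EMITS (...) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.TableExportQueryGenerator;
  %jar /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar;
/

--/
-- Writes the exported rows as files to the target storage
CREATE OR REPLACE JAVA SET SCRIPT EXPORT_TABLE (...) EMITS (...) AS
  %scriptclass com.exasol.cloudetl.scriptclasses.TableDataExporter;
  %jar /buckets/bfssaas/default/exasol-cloud-storage-extension-2.3.0.jar;
/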

The version of the Cloud Storage Extension stored in BucketFS may differ from the version shown in the User Guide on GitHub. If you need to update your version of the Cloud Storage Extension, contact Exasol Support.

Usage examples

The following examples show excerpts of the Exasol UDF calls and connection objects used to import and export formatted data with different types of cloud storage. For more information about the examples, see User Guide.

The user that runs the IMPORT or EXPORT statement needs the ACCESS privilege on the connection, granted either directly or via a role. For more details, see Privileges and Details on Rights Management.
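
For example, assuming a hypothetical role RETAIL_ETL that runs the statements in this section, the privilege on the S3 connection object created below could be granted like this:

GRANT ACCESS ON CONNECTION S3_CONNECTION TO RETAIL_ETL;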

Amazon S3

You can use the following statement to create a connection object using AWS access and secret keys.

CREATE OR REPLACE CONNECTION S3_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'S3_ACCESS_KEY=<AWS_ACCESS_KEY>;S3_SECRET_KEY=<AWS_SECRET_KEY>';

To learn more about how to use connection objects with Amazon S3, see Create Exasol Connection Object.

The examples for IMPORT and EXPORT using connection objects are given below:

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/import/orc/sales_positions/*'
  DATA_FORMAT     = 'ORC'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/export/parquet/sales_positions/'
  DATA_FORMAT     = 'PARQUET'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION';

For additional information, see Amazon S3.

Google Cloud Storage

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH      = 'gs://<GCS-STORAGE>/import/avro/sales_positions/*'
  DATA_FORMAT      = 'AVRO'
  GCS_PROJECT_ID   = '<GCP_PROJECT_ID>'
  GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH      = 'gs://<GCS-STORAGE>/export/parquet/sales_positions/'
  DATA_FORMAT      = 'PARQUET'
  GCS_PROJECT_ID   = '<GCP_PROJECT_ID>'
  GCS_KEYFILE_PATH = '/buckets/bfsdefault/<BUCKET_NAME>/gcp-<PROJECT_ID>-service-keyfile.json';

For additional information, see Google Cloud Storage.

Azure Blob Storage
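
The IMPORT and EXPORT statements below reference the connection objects AZURE_BLOB_SECRET_CONNECTION and AZURE_BLOB_SAS_CONNECTION. As a sketch based on the User Guide, such connection objects are created from either the storage account secret key or a SAS token:

CREATE OR REPLACE CONNECTION AZURE_BLOB_SECRET_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_SECRET_KEY=<AZURE_SECRET_KEY>';

CREATE OR REPLACE CONNECTION AZURE_BLOB_SAS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_SAS_TOKEN=<AZURE_SAS_TOKEN>';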

Import using secret key connection object

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH      = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
  DATA_FORMAT      = 'ORC'
  CONNECTION_NAME  = 'AZURE_BLOB_SECRET_CONNECTION';

Import using SAS token connection object

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/import/orc/*'
  DATA_FORMAT     = 'ORC'
  CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION';

Export using secret key connection object

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH      = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
  DATA_FORMAT      = 'PARQUET'
  CONNECTION_NAME  = 'AZURE_BLOB_SECRET_CONNECTION';

Export using SAS token connection object

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'wasbs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.blob.core.windows.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_BLOB_SAS_CONNECTION';

For additional information, see Azure Blob Storage.

Azure Data Lake Storage (Generation 1)

For more information about how to use connection objects with Azure Data Lake Storage (Gen 1), see Azure Data Lake Gen1 Storage.
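
As a sketch based on the User Guide, the AZURE_ADLS_CONNECTION object used below is created from the service principal credentials of your Data Lake store:

CREATE OR REPLACE CONNECTION AZURE_ADLS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_CLIENT_ID=<AZURE_CLIENT_ID>;AZURE_CLIENT_SECRET=<AZURE_CLIENT_SECRET>;AZURE_DIRECTORY_ID=<AZURE_DIRECTORY_ID>';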

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/import/avro/*'
  DATA_FORMAT     = 'AVRO'
  CONNECTION_NAME = 'AZURE_ADLS_CONNECTION';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'adl://<AZURE_CONTAINER_NAME>.azuredatalakestore.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_ADLS_CONNECTION';

Azure Data Lake Storage (Generation 2)

For more information about how to use connection objects with Azure Data Lake Storage (Gen 2), see Azure Data Lake Gen 2 Storage.
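
As a sketch based on the User Guide, the AZURE_ABFS_CONNECTION object used below is created from the storage account secret key:

CREATE OR REPLACE CONNECTION AZURE_ABFS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'AZURE_SECRET_KEY=<AZURE_SECRET_KEY>';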

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 'abfs://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/import/orc/*'
  DATA_FORMAT     = 'ORC'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION';

Export

EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH     = 'abfss://<AZURE_CONTAINER_NAME>@<AZURE_ACCOUNT_NAME>.dfs.core.windows.net/export/parquet/'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION';

Delta Lake

Import

IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<BUCKET>/import/delta/sales_positions/*'
  DATA_FORMAT     = 'DELTA'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION';

For detailed instructions on importing data from Delta Lake Format, see Delta Format on the GitHub repository.

Hadoop Distributed Filesystem (HDFS)

The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. When the Hadoop data nodes and the Exasol cluster are installed in the same (virtual) network, you can access HDFS using the CLOUD_STORAGE_EXTENSION UDFs.

Since the Hadoop nodes and the Exasol cluster must be in the same private network, a connection object is not necessary.

Import

IMPORT INTO <schema>.<table>
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
  BUCKET_PATH     = 'hdfs://<HDFS_PATH>/import/orc/data/*.orc'
  DATA_FORMAT     = 'ORC';

Export

EXPORT <schema>.<table>
INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
  BUCKET_PATH     = 'hdfs://<HDFS_PATH>/export/parquet/data/'
  DATA_FORMAT     = 'PARQUET';

Contribute to the project

Exasol encourages contributions to the open source project. For information about how to contribute, see Information for Contributors.