Load Data using Keboola

This article explains how to use Keboola to load data into Exasol SaaS.

For general information about Keboola concepts and procedures, see the Keboola documentation.

What is Keboola?

Keboola Connection is a cloud based data operations platform that allows you to quickly build ETL/ELT pipelines using a graphical user interface. Exasol SaaS has built-in functionality for connecting to Keboola directly from the Databases page. You can use Keboola to load data from most data sources into an Exasol SaaS database.

Keboola is a third party vendor with separate user accounts and billing.

For more information about Keboola, refer to the official Keboola documentation.

Keboola Integration with Exasol SaaS

Keboola connects data sources with data consumers, manages transformations, and stores data in buckets that correspond to a schema in the database. The source and target databases are not part of Keboola. Keboola will however have administrative access to the database serving as storage.

When you set up a connection between Keboola and Exasol SaaS, a new technical user “KEBOOLA” with DBA privileges is created in the Exasol SaaS database. The credentials are sent securely to Keboola. This is necessary to allow Keboola to create and manage users and objects in the database. Users and roles that are created by Keboola are always prefixed “KEBOOLA_”.

The user management in Keboola is separate from the user management in Exasol SaaS. You need to invite users explicitly to your Keboola project, they are not added automatically.

Keboola Connection Storage is the central subsystem that manages everything related to storing and accessing data in Keboola. Exasol serves as a database backend for Storage as well as a data destination. Keboola creates schemas as required for each bucket in Storage.

Do not modify users, roles or schemas created by Keboola, as this will make the connection invalid.

Security Considerations

Keep the following in mind when you connect Keboola to your cluster for the first time. If you stop using Keboola, we recommend that you reverse these changes.

  • Keboola will be granted admin privileges on the Exasol SaaS database by creating a user “KEBOOLA” with the role “DBA”.

  • To allow Keboola to connect to your database, the following IP Addresses are automatically added to the Allow List:

    • 35.157.170.229

    • 35.157.93.175

    • 149.72.196.5

    • 3.66.248.180

    • 3.64.150.30

    • 35.157.62.225

    • 3.71.156.204

Limitations

  • A Keboola organization account can only connect to one Exasol SaaS database.

  • A Keboola connection can only use one Exasol SaaS cluster. Multiple clusters are not supported.

  • Deleting the cluster will permanently disable the Keboola connection.

Connecting to Keboola

  1. In Exasol SaaS, create a new database and cluster. See Add Database for more information.

  2. When the database is running, click Moreon the cluster and select Connect to Keboola. This will open the connection wizard on the Keboola site in a new browser tab.

    Connect to Keboola

  3. Follow the instructions in the Keboola connection wizard to either create a new Keboola account or to log in to an existing account.

    You need to allow Keboola administrative privileges on the database to continue. See Security Considerations for more information.

    Connect Exasol

    Create account

  4. When the setup process has completed successfully, you will be taken to the Keboola Dashboard.

  5. In Exasol SaaS, the dropdown menu on the cluster will indicate that the database is connected to Keboola.

    Connected to Keboola

    You cannot disable or remove an existing Keboola connection. The only way to remove the connection to Keboola is to delete the cluster.

Using Keboola with Exasol SaaS

The Keboola Connection platform has three basic components: extracting data from a source, manipulating data using transformations, and writing data to a destination. Configuring a Keboola connection with Exasol SaaS therefore consists of three steps:

  1. Configuring the data source

    Keboola comes with a large collection of pre-defined data source integrations. You select a data source such as Amazon S3 and configure the necessary credentials and source data to extract. The extracted data will be loaded into the Storage component of Keboola. In our case, Keboola will automatically create a schema with the corresponding tables in Exasol SaaS for this purpose. The schema that is used as the Storage component is managed by Keboola.

  2. Configuring Transformations

    A Transformation in Keboola consists of a table mapping part, where you map the tables that Keboola created in the Storage schema (input tables) to tables that you define in the Destination schema (output tables), and a query part, where you add the desired SQL statements. When the transformation is run, it will automatically create the output tables that you defined.

  3. Configuring the data destination

    You can select Exasol SaaS as a data destination from a list of pre-defined integrations in Keboola, just like when selecting a data source. You will be asked to provide credentials for the Exasol SaaS database and to specify a schema as the destination for your data. This schema is not created by Keboola, you need to define it in Exasol SaaS before you specify it in Keboola. You then select the desired tables from Storage and define datatypes for the corresponding columns in the destination tables.

In each step, you will first configure the component and then run it to produce output for the next step. You can combine the steps in a Keboola Orchestration that can be run as a single operation, either manually or automatically as a scheduled job.

The below video shows you how to perform all these steps and load data into a database using Keboola.

Keboola Workflow Overview

Workflow

Exasol Cluster Starter

The Exasol Cluster Starter application component in Keboola can be used to start or stop an Exasol SaaS cluster in a Keboola configuration. This can for example be used to start and stop the cluster programmatically on a time schedule to save costs and to make sure your cluster is up and running before you start your ETL jobs.

When you add the component, you will be asked to provide credentials for the Exasol SaaS database and to choose whether the component should start or stop the cluster.

Exasol Cluster Starter component

Exasol Cluster Starter