Lakehouse Turbo with Apache Iceberg

This article explains how to connect Lakehouse Turbo to an Apache Iceberg catalog.

By default, Lakehouse Turbo connects to Databricks. To use Iceberg catalogs, you must change the default source system using the REST API. This article describes how to switch the default source system between Iceberg and Databricks.

For more information about the REST API, see Lakehouse Turbo REST API.

Authentication

To use the REST API, you must first generate a personal access token (PAT). Personal access tokens function like regular OAuth access tokens. When you interact with the REST API on the command line or through an API client, you must pass the token as a bearer token in the Authorization header of each request.

  1. To create a token, run the following command in a terminal.

    Replace <admin_ui_ip> and <admin_ui_password> with the IP address and password that you configured for Exasol Admin.

    curl -k -X POST "https://<admin_ui_ip>:8443/api/v1/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=password&username=admin&password=<admin_ui_password>"
  2. Get the deployment ID.

    Replace <token> with the token you generated in the previous step.

    curl -k https://<admin_ui_ip>:8443/api/v1/deployments -H "Authorization: Bearer <token>"
  3. Get the database ID.

    Replace <deployment_id> with the deployment ID you retrieved in the previous step.

    curl -k https://<admin_ui_ip>:8443/api/v1/deployments/<deployment_id>/databases -H "Authorization: Bearer <token>"
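The three steps above can be combined in a small script that keeps the token in a shell variable instead of copying it by hand. This is a minimal sketch, not part of the product: the function and variable names are hypothetical, and it assumes the token endpoint returns a single-line JSON object with an access_token field, which is the usual OAuth 2.0 response format. Check the actual response before relying on it. The sketch uses curl's --data-urlencode option so that special characters in the password are encoded correctly.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical.
# Assumption: the token response is single-line JSON containing "access_token".

ADMIN_UI_IP="${ADMIN_UI_IP:-<admin_ui_ip>}"
ADMIN_UI_PASSWORD="${ADMIN_UI_PASSWORD:-<admin_ui_password>}"
API="https://${ADMIN_UI_IP}:8443/api/v1"

# Print the value of the "access_token" field read from stdin (no jq needed).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Step 1: request a token and print only the access token.
get_token() {
  curl -k -s -X POST "${API}/token" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=password" \
    -d "username=admin" \
    --data-urlencode "password=${ADMIN_UI_PASSWORD}" |
    extract_access_token
}

# Step 2: list deployments. $1 = token.
list_deployments() {
  curl -k -s "${API}/deployments" -H "Authorization: Bearer $1"
}

# Step 3: list the databases of one deployment. $1 = token, $2 = deployment ID.
list_databases() {
  curl -k -s "${API}/deployments/$2/databases" -H "Authorization: Bearer $1"
}
```

For example, run TOKEN="$(get_token)" to store the token, then list_databases "$TOKEN" <deployment_id> to print the databases of that deployment.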

Select Iceberg as source system

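To select Iceberg as the default source system, use the following REST call. Replace the <placeholders> in the example with your own connection details, save the JSON object to a file, and send that file as the request body. The // comments only explain the fields; remove them before you send the request, because standard JSON does not allow comments.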

curl -k -X PUT "https://<admin_ui_ip>:8443/api/v1/dlhc/databases/<database_id>/config" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  --data-binary @<config_file>
{
    // Switch the default source system from Databricks to Iceberg.
    "sourceSystemType": "ICEBERG",
    
    "iceberg": {
        // Iceberg REST catalog endpoint.
        // Example: https://lakekeeper:8181/catalog
        "restCatalogUri": "<rest-catalog-uri>",
        
        // Logical warehouse name/location expected by the REST catalog.
        // Examples: "s3://my-warehouse", "warehouse", "polaris".
        "warehouseLocation": "<warehouse-location>",
        
        // Supported values: "OAUTH2", "TOKEN", "NONE".
        "authType": "<auth-type>",
        
        // Credentials required for authType = "OAUTH2".
        "oauthClientId": "<client-id>",
        "oauthClientSecret": "<client-secret>",
        
        // Optional OAuth parameters, depending on your catalog/auth server.
        // See https://iceberg.apache.org/docs/latest/catalog-properties/#oauth2-auth-properties
        "oauthServerUri": "<oauth-server-uri>",
        "oauthScope": "<oauth-scope>",
        "oauthAudience": "<oauth-audience>",
        "oauthResource": "<oauth-resource>",
        "tokenExchangeEnabled": false, // true or false
        
        // Only used when authType = "TOKEN"; omit or set to null for "OAUTH2".
        "token": "<catalog-token>"
    },
    
    "dataLake": {
        // How DLHC gets credentials for table data files.
        // Supported values: "VENDED", "AWS".
        "credentialType": "AWS",
        
        // Optional: set when DLHC must assume a customer role to access S3.
        "customerRoleArn": "arn:aws:iam::<aws_account_id>:role/dlhc-iceberg-read-role"
    }
}
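Because standard JSON does not allow comments, the annotated example above is safest to send only after the // comments are removed. The following sketch strips them from an edited copy of the example saved to a file and sends the result with the same PUT request. The function and variable names are hypothetical. It assumes each comment is either a whole line or a trailing comment preceded by whitespace, as in the examples in this article, so slashes inside values such as https:// URLs are kept.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical.

ADMIN_UI_IP="${ADMIN_UI_IP:-<admin_ui_ip>}"
DATABASE_ID="${DATABASE_ID:-<database_id>}"
TOKEN="${TOKEN:-<token>}"

# Remove // comments from the file in $1 and print plain JSON.
# 1st expression: delete lines that contain only a comment.
# 2nd expression: drop trailing comments that follow whitespace, so
# "https://..." inside a value is not affected.
strip_json_comments() {
  sed -e '/^[[:space:]]*\/\//d' \
      -e 's/[[:space:]][[:space:]]*\/\/.*$//' "$1"
}

# PUT the cleaned file in $1 to the configuration endpoint.
put_source_config() {
  strip_json_comments "$1" |
    curl -k -s -X PUT "https://${ADMIN_UI_IP}:8443/api/v1/dlhc/databases/${DATABASE_ID}/config" \
      -H "accept: application/json" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${TOKEN}" \
      --data-binary @-
}
```

For example, after setting TOKEN, ADMIN_UI_IP, and DATABASE_ID, run put_source_config iceberg-config.json, where iceberg-config.json is a hypothetical file holding your edited copy of the example.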

Select Databricks as source system

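To change back to Databricks as the default source system, use the following REST call. Replace the <placeholders> in the example with your own connection details. As with the Iceberg example, the // comments only explain the fields; remove them before you send the JSON object as the request body.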

curl -k -X PUT "https://<admin_ui_ip>:8443/api/v1/dlhc/databases/<database_id>/config" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  --data-binary @<config_file>
{
    // Switch the default source system to Databricks.
    "sourceSystemType": "DATABRICKS",
    
    "databricks": {
        // Databricks workspace URL, including https://.
        "workspaceHostUrl": "https://<databricks_host>.cloud.databricks.com",
        
        // true = machine-to-machine OAuth; false = personal access token.
        "useOAuth": true,
        
        // OAuth M2M credentials, required when useOAuth = true.
        "m2mOauthClientId": "<client-id>",
        "m2mOauthClientSecret": "<client-secret>",
        
        // Only use this when useOAuth = false; omit/null for OAuth.
        "personalAccessToken": null
    },
    
    "dataLake": {
        // Supported values: "VENDED", "AWS", "AZURE".
        "credentialType": "AWS",
        
        // Optional: set when DLHC must assume a customer role to access S3 data.
        "customerRoleArn": "arn:aws:iam::<aws_account_id>:role/dlhc-databricks-read-role",
        
        // Only for credentialType = "AZURE".
        "azureAccountKey": "<azure-account-key>"
    }
}
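If you prefer not to keep a configuration file, the request body can also be written inline with a here-document. The following sketch assumes OAuth machine-to-machine authentication (useOAuth = true) and AWS credentials; the function and variable names are hypothetical, and which optional fields you need depends on your setup. Values are inserted into the JSON verbatim, so they must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical.
# Values are pasted into the JSON as-is; they must not contain " or \.

ADMIN_UI_IP="${ADMIN_UI_IP:-<admin_ui_ip>}"
DATABASE_ID="${DATABASE_ID:-<database_id>}"
TOKEN="${TOKEN:-<token>}"

# Switch the default source system to Databricks (OAuth M2M, AWS credentials).
set_databricks_source() {
  curl -k -s -X PUT "https://${ADMIN_UI_IP}:8443/api/v1/dlhc/databases/${DATABASE_ID}/config" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${TOKEN}" \
    --data-binary @- <<EOF
{
  "sourceSystemType": "DATABRICKS",
  "databricks": {
    "workspaceHostUrl": "${DATABRICKS_HOST_URL}",
    "useOAuth": true,
    "m2mOauthClientId": "${M2M_CLIENT_ID}",
    "m2mOauthClientSecret": "${M2M_CLIENT_SECRET}",
    "personalAccessToken": null
  },
  "dataLake": {
    "credentialType": "AWS",
    "customerRoleArn": "${CUSTOMER_ROLE_ARN}"
  }
}
EOF
}
```

For example, set DATABRICKS_HOST_URL, M2M_CLIENT_ID, M2M_CLIENT_SECRET, and CUSTOMER_ROLE_ARN together with the connection variables, then run set_databricks_source.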