Install Exasol - step by step

This article explains how to install Exasol as an application on Linux hosts.

Overview

You can install Exasol as an application on Linux hosts running either on hardware or on VM instances on a cloud service such as AWS, Azure, or Google Cloud Platform (GCP). The system requirements and procedures for installation and administration are essentially the same regardless of which platform the Linux host is running on.

If your system will be hosted on a cloud service, refer to the documentation from the cloud service provider for information about how to configure network and security settings for your Exasol deployment on that platform.

The installation procedure consists of three major stages:

  • Preparation: prepare the installation environment.

  • Installation: download the deployment tool and run the installation on the hosts.

  • Post-installation: connect to the cluster, upload a license, and carry out any other actions needed to prepare the database for use.

The following diagram is a schematic overview of the installation procedure.

Figure: Installation workflow

You can run the installation process from a separate Linux system (jump host) or from one of the database host systems. In both cases the host running the installation will require access to the other hosts over SSH.

If you install Exasol on a single host system without using a jump host, SSH is not required for the installation process. We recommend setting up SSH anyway, as this enables passwordless login and makes it easier to extend the installation at a later stage by adding more hosts.

Prerequisites

The database hosts must be pre-installed with one of the recommended Linux distributions and meet the minimum system requirements for an Exasol installation. The following requirements must be checked:

  • The nodes must run on a supported CPU type

  • Storage devices must be of a supported type with RAID configuration as required

  • Each node must have enough storage space for installation and updates

  • The nodes must have a supported operating system version installed and configured as required

  • All software dependencies must be fulfilled for the database nodes (and optional jump host)

For details about the requirements for installing Exasol, see System Requirements.

The following procedure describes how to install Exasol with a user having root privileges on the database hosts. To install Exasol for a non-root user, additional configuration steps are required. For more information, see Rootless Installation.

Preparation

Step 1: Configure the network

IP addresses

The database hosts should be assigned private static IPv4 addresses in the same subnet. For example: 10.0.0.11, 10.0.0.12, 10.0.0.13, 10.0.0.14. DHCP can be used in the network if each host always receives the same IP address.

You can additionally configure a public network to allow direct access to the deployment from outside of the private network. A public network is optional and not required for installation.

For information about how to add private and public IP addresses to the configuration, see Step 6: Create a configuration file.

Internet connection

An internet connection from the jump host is recommended but not required. If the host running the installation is not connected to the internet, the software must be downloaded using another system and copied to the installation host.

Internet connectivity from the database hosts is not required.

Firewall configuration

To operate the Exasol database after installation, traffic must be allowed through the firewall on the following ports:

Service                                  Default port  Protocol
SQL client connections to the database   8563          TCP
SSH access to all cluster nodes          20002         TCP
HTTPS access to the Administration API   4444          TCP
NTP                                      123           TCP/UDP
DNS                                      53            TCP/UDP
LDAP (optional)                          389           TCP/UDP

For more information about the default ports used by Exasol, see Firewall and Port Settings.
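After the installation, you can check from a client machine whether the TCP ports in the table above are reachable through the firewall. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available. The function names port_open and check_exasol_ports are hypothetical and not part of Exasol or c4. UDP services (NTP, DNS, LDAP) cannot be checked this way.

```shell
#!/usr/bin/env bash
# Minimal sketch: check whether the TCP ports used by Exasol are reachable.
# port_open and check_exasol_ports are hypothetical helpers, not part of c4.

# Succeeds if a TCP connection to HOST:PORT can be opened within 3 seconds.
# bash opens the connection when redirecting to /dev/tcp/HOST/PORT; timeout
# aborts the attempt if the firewall silently drops the packets.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Reports SQL (8563), COS SSH (20002), and Administration API (4444) for each
# host given as an argument. Returns 1 if any of these ports is unreachable.
check_exasol_ports() {
  local host port status=0
  for host in "$@"; do
    for port in 8563 20002 4444; do
      if port_open "$host" "$port"; then
        echo "${host}:${port} reachable"
      else
        echo "${host}:${port} NOT reachable"
        status=1
      fi
    done
  done
  return "$status"
}
```

For example, check_exasol_ports 203.0.113.11 203.0.113.12 prints one line per host and port and returns a non-zero status if any port could not be reached. An unreachable port can also mean that the service is not running yet, so run the check after the installation has finished.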

Step 2: Create the installation user

On each database host, create a dedicated user for the Exasol software installation. The user must have sudo privileges and a system shell that allows access over SSH. The name of the installation user can be set freely within the restrictions of the operating system, but must be identical on all hosts. In the following examples, the installation user is called exasol.

To install Exasol for a non-root user, additional configuration steps are required. For more information, see Rootless Installation.

In the following examples the installation user is created by a user that has sudo privileges. If you carry out this operation as root, the sudo command should be omitted.

  1. Create the user exasol with the home directory /home/exasol:

    sudo useradd -m -s /bin/bash exasol
  2. Add the user to the sudo group (on RHEL-based distributions, the corresponding group is usually wheel):

    sudo usermod -aG sudo exasol
  3. Assign a password for the user:

    sudo passwd exasol
  4. Log out (or create a new session) and log in as the user exasol.

  5. Verify that the user has sudo privileges using sudo whoami. The command should return root.

    sudo whoami
    root
  6. Repeat the above steps on all the database hosts.

Step 3: Set up SSH authentication

If you install Exasol 8 on a single host system without using a jump host, SSH is not required for the installation process.

Generate an SSH key pair

  1. On the host that will run the installation, open a terminal or command prompt.

  2. Run the command ssh-keygen -t rsa to generate a new RSA key pair.

  3. Choose a location to save the key pair (the default location is normally ~/.ssh/id_rsa).

For added security, you can protect the key pair with a passphrase.

Copy the public key to the remote hosts

  1. Run the command ssh-copy-id <username>@<remote-node-ip>. Replace <username> and <remote-node-ip> with the actual username of the installation user and the IP address of the remote host.

    Enter the password for the user on the remote host when prompted.

  2. Repeat this step for each host by substituting <remote-node-ip> with the IP address of the respective host. The user should be the same on all hosts.

Test the key-based authentication

  1. Run the command ssh <username>@<remote-node-ip> to initiate an SSH connection to a remote host.

    If everything is correctly configured, you should be able to log in without entering a password.

  2. Repeat this step for each remote host to verify that all nodes can be accessed during the installation.
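To check all hosts in one pass, you can loop over their addresses with ssh in batch mode. This is a minimal sketch; check_ssh_access is a hypothetical helper, not part of c4. With -o BatchMode=yes, ssh fails instead of prompting for a password, so any host where key-based login does not work is reported as a failure. The host keys must already be known, which is normally the case after you have confirmed them when running ssh-copy-id.

```shell
#!/usr/bin/env bash
# Minimal sketch: verify key-based SSH login for the installation user on
# every host. check_ssh_access is a hypothetical helper, not part of c4.
# BatchMode=yes makes ssh fail instead of asking for a password, so hosts
# where key-based login does not work are reported as failures.
check_ssh_access() {
  local user="$1" host status=0
  shift
  for host in "$@"; do
    # -n keeps ssh from reading the caller's standard input.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true; then
      echo "ok: ${user}@${host}"
    else
      echo "failed: ${user}@${host}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_ssh_access exasol 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14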

Step 4: Prepare storage devices

Block storage

Data in an Exasol database is stored on volumes, which are assigned to storage devices (disks). The storage devices must be prepared before you continue with the installation.

In the following configuration step (Step 6: Create a configuration file), you specify the disks in the parameter CCC_HOST_DATADISK. The installation process will then automatically create the necessary volumes on those disks. You can create additional volumes manually after the installation if needed. For more information, see Storage Management.

Do not create a file system on the storage devices. The database stores persistent data using block storage with a specific structure for Exasol databases.

Do not use the device where the operating system resides for database storage, as this could lead to a system crash if the device runs out of disk space.

The naming and order of the block storage devices must be identical on all nodes.
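Because the device names must be the same on every node, it can help to check the list you plan to use for CCC_HOST_DATADISK on each node before you continue. The following minimal sketch (check_block_devices is a hypothetical helper, not part of c4) only verifies that each comma-separated path exists and is a block device. It does not check sizes or ordering, and it reports sparse files as failures because they are regular files.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that each device in a comma-separated list (the same
# format as CCC_HOST_DATADISK) exists as a block device on this node.
# check_block_devices is a hypothetical helper, not part of c4. It does not
# check device sizes or order, and it reports sparse files as failures.
check_block_devices() {
  local dev status=0
  local -a devices
  # Split the list on commas into an array.
  IFS=',' read -r -a devices <<< "$1"
  for dev in "${devices[@]}"; do
    if [ -b "$dev" ]; then
      echo "ok: $dev"
    else
      echo "not a block device: $dev" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run check_block_devices /dev/mapper/exasol_disk_1,/dev/mapper/exasol_disk_2 on every node.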

Supported storage types/technologies

  • Sparse file devices hosted on a filesystem like ext4 or XFS (NFS is not supported)
  • Block devices (local storage SAS, SSD, NVMe, virtual disks, or remote storage iSCSI/SAN)
  • LVM2
  • LUKS

Storage device requirements

  • Use at least 4 storage drives with minimum 250 MBps read/write capacity per drive.

    Actual performance depends on the number of disks used as well as the speed of the individual disks.

  • OS and storage disks should have RAID 1 (or similar fault tolerance).

  • OS disks must have at least 150 GiB free disk space after installation.

  • Swap partition: use the size recommended by the OS vendor.

For information about how to calculate the required size for the storage devices, see Sizing Considerations.

Installation directory

Exasol is installed in the home directory of the installation user on each database node. For example, if the username is exasol, Exasol is installed under /home/exasol/.

The partition where the home directory of the installation user is mounted must have at least 20 GiB free space available for the installation.
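You can check the free space with df before you start. This minimal sketch uses the POSIX df output format (-P) with 1024-byte blocks (-k); has_free_gib is a hypothetical helper, not part of c4. It checks the file system that contains the given path, so pass the home directory of the installation user to check the 20 GiB requirement, or / to check the OS disk.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed if the file system that contains PATH has at least
# MIN_GIB GiB available. has_free_gib is a hypothetical helper, not part of c4.
has_free_gib() {
  local path="$1" min_gib="$2" avail_kib
  # In `df -Pk` output, line 2, column 4 is the available space in KiB.
  avail_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kib" ] || return 1
  # 1 GiB = 1024 * 1024 KiB
  (( avail_kib >= min_gib * 1024 * 1024 ))
}
```

For example: has_free_gib /home/exasol 20 && echo "enough space for the installation"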

Logical volume manager

We recommend using a logical volume manager to manage the storage devices used for block storage. To check if an existing system is using a logical volume manager, use the command lsblk to list all block devices. If the output shows devices that are named sd* (sda, sdb, sdc, and so on), the system is not using a logical volume manager.

For example:

# this system is not using a volume manager:
lsblk -p
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
/dev/sda   8:0    0 388.5M  1 disk
/dev/sdb   8:16   0     4G  0 disk [SWAP]
/dev/sdc   8:32   0   256G  0 disk /mnt/wslg/distro

# this system is using a logical volume manager:
lsblk -p
NAME              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
/dev/loop1          7:1    0 91.9M  1 loop
/dev/nvme0n1      259:0    0  100G  0 disk
|-/dev/nvme0n1p1  259:4    0 99.9G  0 part /etc/ssh
|-/dev/nvme0n1p14 259:5    0    4M  0 part
`-/dev/nvme0n1p15 259:6    0  106M  0 part
/dev/nvme3n1      259:1    0   50G  0 disk

If you install Exasol for a non-root user, the data disks must be writeable for that user and additional configuration of the storage devices may be required. For more information, see Rootless Installation.

Installation

Step 5: Download and install c4

Exasol Deployment Tool (c4) is a command-line application that is used to install Exasol on a single host or multiple hosts in a network. The c4 application can run either on a separate system (jump host) or on one of the database hosts.

Database and c4 version dependency

The version of c4 used for installation must be compatible with the Exasol database version that you install. The latest version of the database is always compatible with the latest version of c4.

If you want to install an earlier version of Exasol, refer to the c4 version compatibility table in the Exasol documentation to find out which version of c4 to use.

Alternative 1: Installation host has internet access

  • On the installation host, download c4 from the Exasol Download Portal. You can also use the following command in a Linux terminal:

    wget https://x-up.s3.amazonaws.com/releases/c4/linux/x86_64/<version>/c4 -O c4 && chmod +x c4

    Replace <version> with the desired c4 version, for example, 4.20.0.

    For information about the latest release of Exasol Deployment Tool (c4), see c4 Release Notes.

    For more information about how to install and use c4, see Exasol Deployment Tool (c4).

  • The Exasol 8 installation package will be downloaded by c4 during the installation process.

Alternative 2: Installation host does not have internet access

If the host used to run the installation is not connected to the internet, download the resources using another machine and then copy them to the installation host.

  1. On a machine with internet access, download c4 from the Exasol Download Portal or using the command line. For more information, see Install c4.

  2. Download the latest version of Exasol from the Exasol Download Portal.

  3. Copy c4 and the Exasol installation package to the same directory on the host that will run the installation.

You must make the c4 binary executable for all users on the host by using chmod +x c4. Otherwise the application will not be able to run the installation. For more information, see Install c4.

Step 6: Create a configuration file

On the host that will be used to run the installation, create a file with the filename config in the same directory as the c4 binary. For example, if you are using the nano text editor:

nano ./config

The configuration file should define the following parameters:

  • CCC_HOST_ADDRS (string, default: [empty])

    The IP addresses of the database hosts on the private network, separated by spaces. The number of addresses in this parameter determines the number of nodes that will be installed.

  • CCC_HOST_EXTERNAL_ADDRS (string, default: [empty], optional)

    Public IP addresses of the database hosts, separated by spaces. This parameter is only needed if you want to allow direct access to the deployment from outside of the private network.

  • CCC_HOST_DATADISK (string, default: [empty])

    Comma-separated list of block devices to be used for the data volume. If no block devices are specified in this parameter, limited file-based storage is used. To find the names of the available block devices on the host, use the command lsblk (see Logical volume manager). The devices should have persistent block device names. Exasol recommends using volume management with LVM2. See also System Requirements.

  • CCC_HOST_IMAGE_USER (string, default: [empty])

    Username that is used to log in to the hosts over SSH. The user must have sudo privileges on the hosts.

  • CCC_HOST_IMAGE_PASSWORD (string, default: [empty])

    Password for the user, if required for sudo. The password is passed in plaintext to the hosts.

  • CCC_HOST_KEY_PAIR_FILE (string, default: [empty])

    Name of the file that contains the private SSH key required to access the hosts.

  • CCC_PLAY_WORKING_COPY (string, default: [empty])

    Specifies the Exasol package to install, using the format @exasol-<version>. For example: @exasol-8.32.0

  • CCC_PLAY_DB_PASSWORD (string, default: aX1234567)

    Password for the database sys user.

  • CCC_PLAY_ROOTLESS (boolean, default: false, optional)

    Use rootless deployment mode. Rootless installation requires additional system configuration. For more information, see Rootless Installation.

  • CCC_PLAY_ADMIN_PASSWORD (string, default: aX1234567)

    Password for the system administration user admin in COS.

  • CCC_PLAY_RESERVE_NODES (integer, default: [empty], optional)

    The number of hosts to use as reserve nodes. Reserve nodes are inactive nodes that can automatically take over from an active node in case of failure. For more information about the failover mechanism, see Fail Safety (On-Prem).

    The reserve nodes are part of the total number of nodes. For example, deploying with 4 nodes and CCC_PLAY_RESERVE_NODES=1 results in a database with 3 active nodes and one reserve node.

The username for the installation user can be any name allowed by the operating system. In the following examples, the user has the username exasol.

Example configuration

The following configuration file will result in a deployment with 3 database nodes and one reserve node.

CCC_HOST_ADDRS="10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14"
CCC_HOST_EXTERNAL_ADDRS="203.0.113.11 203.0.113.12 203.0.113.13 203.0.113.14"
CCC_HOST_DATADISK=/dev/mapper/exasol_disk_1,/dev/mapper/exasol_disk_2
CCC_HOST_IMAGE_USER=exasol
CCC_HOST_IMAGE_PASSWORD=exasol123
CCC_HOST_KEY_PAIR_FILE=id_rsa
CCC_PLAY_WORKING_COPY=@exasol-8.32.0
CCC_PLAY_DB_PASSWORD=exasol456
CCC_PLAY_RESERVE_NODES=1

Always replace the default passwords by setting unique, secure passwords in your configuration file. Never use the passwords that are used in the examples in this documentation.
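The number of active nodes follows from the configuration: it is the number of addresses in CCC_HOST_ADDRS minus CCC_PLAY_RESERVE_NODES. The same number is needed later for num_master_nodes when you create an archive volume. A minimal bash sketch, where active_node_count is a hypothetical helper and not part of c4:

```shell
#!/usr/bin/env bash
# Minimal sketch: print the number of active nodes for a deployment.
# Arguments: the space-separated address list (as in CCC_HOST_ADDRS) and the
# number of reserve nodes (as in CCC_PLAY_RESERVE_NODES, default 0).
# active_node_count is a hypothetical helper, not part of c4.
active_node_count() {
  local addrs="$1" reserve="${2:-0}" active
  local -a hosts
  # Split the address list on whitespace into an array.
  read -r -a hosts <<< "$addrs"
  active=$(( ${#hosts[@]} - reserve ))
  if (( active < 1 )); then
    echo "error: $reserve reserve node(s) leave no active nodes among ${#hosts[@]} hosts" >&2
    return 1
  fi
  echo "$active"
}
```

With the values from the example configuration, active_node_count "10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14" 1 prints 3.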

Run diagnostic tool (optional)

Before you start the installation, you can run a diagnostic tool on your configuration to detect issues in advance. The diagnostic tool checks, for example, SSH accessibility of the hosts, the correctness of the sudo password (if the CCC_HOST_IMAGE_PASSWORD parameter is set), and missing required parameters.

To run the diagnostic tool, use c4 host diag -i <path to configuration>. For example:

./c4 host diag -i /path_to_config_file/myconfig
OK check_disks
OK check_external_dependencies
OK check_internal_dependencies
OK check_required_params
OK check_sudo

For more information about the diagnostic tool, use c4 host diag --help.
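If you script the installation, you can start the deployment only when every diagnostic check passes. The following minimal sketch assumes that c4 host diag prints one check per line on standard output and that passing checks start with OK, as in the example output above. diag_passed is a hypothetical helper, not part of c4.

```shell
#!/usr/bin/env bash
# Minimal sketch: read `c4 host diag` output on stdin and succeed only if every
# non-empty line starts with "OK" (assumed format, as in the documented
# example). diag_passed is a hypothetical helper, not part of c4.
diag_passed() {
  local line failed=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    case "$line" in
      "OK "*) ;;
      *) echo "check did not pass: $line" >&2; failed=1 ;;
    esac
  done
  return "$failed"
}
```

For example: ./c4 host diag -i config | diag_passed && ./c4 host play -i config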

Step 7: Deploy to hosts

  1. On the installation host, run the following command:

    ./c4 host play -i config

    The -i option tells c4 to use a specific configuration file. By default, c4 reads the configuration from the configuration file ./config (in the current directory). If the configuration is stored in another file, specify the path to this file as an argument on the command line:

    ./c4 host play -i /path_to_config_file/myconfig

    Rootless install

    If you are installing Exasol for a non-root user, you must add --ccc-play-rootless true to the command:

    ./c4 --ccc-play-rootless true host play -i config

    If you install Exasol for a non-root user, some additional configuration steps are required. For more information, see Rootless Installation.

  2. If the configuration is valid, c4 will show the parameter values that will be used and ask you to either proceed with this configuration or cancel the installation.

    |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

      Exasol installation procedure is about to be started.
      During this procedure, Exasol software will be installed to remote hosts.
      It will take some time (several minutes).
      The installation is finished when every node reaches stage 'd' (see 'c4 ps').
      After the installation is finished, you can connect to the Database or COS.

      During the installation, you can login to the hosts via SSH,
      and watch the process using:

        sudo journalctl -f

      After the installation finished, you can connect to COS using:

        ssh -p 20002 root@$IP

      IP addresses of the systems:

        * 203.0.113.11
        * 203.0.113.12
        * 203.0.113.13
        * 203.0.113.14

      Exasol version: 8.32.0
      Exasol package: @exasol-8.32.0
      SSH username  : exasol
      SSH keyfile   : id_rsa
      User password : exasol123
      Data disk(s)  : /dev/mapper/exasol_disk_1,/dev/mapper/exasol_disk_2

      Press ENTER to proceed or Ctrl-C to cancel the installation procedure.

    |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
  3. Press Enter to start the installation.

    • If the necessary installation packages are present in the current directory they will be used for the installation. In this case no internet connection is required.

    • If the installation packages are not found in the current directory and the host used to run the installation is connected to the internet, c4 will automatically download the necessary packages from the Exasol download portal.

    • If no installation packages are found and c4 is not able to connect to the Exasol download portal, the installation process will be aborted and no changes are made to the system.

      To troubleshoot a failed installation, first make sure that all the steps above have been carried out correctly and that all system requirements are met. To get help from our Support team, create a support case.

    If the packages are available, the installation process will start. The installation requires no user intervention and can be run unattended. It comprises the following steps:

    • Copying Exasol packages to the hosts
    • Initial OS preparation
    • Verifying OS configuration
    • Configuring OS on the hosts
    • Extracting packages
    • Installing c4 on the hosts
    • Syncing time between hosts
    • Triggering remote installation finalization

    The installation typically takes 20 to 90 minutes to complete. The duration depends on many factors, such as the number of hosts and the location of the installation files, and the installation may take longer.

  4. When the installation has finished, a confirmation message is shown:

    |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

      The final steps of the Exasol installation procedure were successfully
      started on remote hosts now.
      It will take yet some time to complete (several minutes).
      After the installation is finished, you can connect to the Database or COS.

      During the installation, you can login to the hosts via SSH,
      and watch the process using:

        sudo journalctl -f

      After the installation finished, you can connect to COS using:

        ssh -p 20002 root@$IP

      IP addresses of the systems:

        * 203.0.113.11
        * 203.0.113.12
        * 203.0.113.13
        * 203.0.113.14

      Exasol version: 8.32.0
      Exasol package: @exasol-8.32.0

      Happy Exasolling!

    |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Monitoring the installation process

To monitor the installation process, connect to one of the nodes over SSH and use the c4 ps command. The output shows details about each node, including the current deployment stage. The installation is finished when all database nodes have reached stage d.

For more details about the c4 ps command, see How to use c4.

Example
ssh -i KEY_FILE user@203.0.113.11
...
user@ip-10-0-0-11:~$ ./c4 ps
      N  PLAY_ID   NODE  MEDIUM  INSTANCE     DB_VERSION  EXTERNAL_IP     INTERNAL_IP  STAGE  STATE    UPTIME    TTL
  ┌─  1  c3275f84  11    host    -            8.32.0      203.0.113.11    10.0.0.11    d      -        03:50:12  +∞
  │   1  c3275f84  12    host    -            8.32.0      203.0.113.12    10.0.0.12    d      -        03:50:13  +∞
  │   1  c3275f84  13    host    -            8.32.0      203.0.113.13    10.0.0.13    d      -        03:50:13  +∞
  └─  1  c3275f84  14    host    -            8.32.0      203.0.113.14    10.0.0.14    d      -        03:50:13  +∞
      2  c3275f84  14    local   -            8.32.0      -               10.0.0.14    d      -        03:50:13  +∞
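To wait for the installation without reading the table yourself, you can check the STAGE column programmatically. This minimal sketch assumes the column layout shown in the example above, where STAGE is followed by STATE, UPTIME, and TTL. It counts fields from the end of each row because the tree characters at the start of some rows shift the columns. all_nodes_at_stage_d is a hypothetical helper, not part of c4.

```shell
#!/usr/bin/env bash
# Minimal sketch: read `c4 ps` output on stdin and succeed only if every node
# row reports stage d. all_nodes_at_stage_d is a hypothetical helper, not part
# of c4, and it assumes the column layout shown in the example output.
all_nodes_at_stage_d() {
  awk '
    NF < 4 { next }                    # skip prompts, blank lines, and "..."
    $(NF-3) == "STAGE" { next }        # skip the header row
    {
      rows++                           # STAGE is the 4th field from the end
      if ($(NF-3) != "d") pending++
    }
    END { exit (rows == 0 || pending > 0) }
  '
}
```

For example, on a node: until ./c4 ps | all_nodes_at_stage_d; do sleep 60; done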

To get more detailed information about the installation process, connect to one of the nodes over SSH and use the command sudo journalctl -f to monitor the installation in real time.

Errors during installation

If an error occurs during the installation process, the process is automatically interrupted and the installation is rolled back. The last INFO message on the screen provides information about the error. For example:

INFO[2023-09-02 09:22:44]: Extracting packages...
cp: cannot create regular file '/var/lib/ccc/bin/': No such file or directory

In this example, an error happened during the package extraction step. When you have located and resolved the error, run the installation again.

If you need help with the installation process, create a support case.

Post-installation

When the installation has completed, you can connect to the database using a SQL client, access the nodes over SSH, and use the different administrative interfaces to carry out additional configuration tasks.

Upload a license

The new Exasol system is installed with a license that allows you to load 10 GiB of raw data for testing purposes. For larger data sizes, you must upload a license to the database.

For information about how to upload a license, see Upload a License.

Connect to Exasol

Once the database is up and running you can connect to it using a database client and start to load data. To create the connection, use the following details:

  • Hostname: Comma-separated list of public IP addresses of the active database nodes. The public IP addresses of the nodes are shown in the EXTERNAL_IP column in the output of c4 ps. For example:

    203.0.113.11,203.0.113.12,203.0.113.13

  • Port: Value of the CCC_PLAY_DB_PORT parameter in the configuration that was used to create the deployment. Default: 8563.

  • Username: sys

  • Password: Value of the CCC_PLAY_DB_PASSWORD parameter in the configuration that was used to create the deployment. Default: aX1234567.

If the connection uses a TLS certificate and a valid certification path is not found, you may have to provide the certificate fingerprint. For more information, refer to the documentation for the database client.
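The Hostname value is the list of public addresses of the active nodes joined with commas. A minimal bash sketch that builds it from the addresses passed as arguments (join_hosts is a hypothetical helper, not part of Exasol or c4):

```shell
#!/usr/bin/env bash
# Minimal sketch: join the public IP addresses of the active nodes with commas
# to form the Hostname value for a SQL client.
# join_hosts is a hypothetical helper, not part of Exasol or c4.
join_hosts() {
  # "$*" joins all arguments using the first character of IFS.
  local IFS=','
  echo "$*"
}
```

For example, join_hosts 203.0.113.11 203.0.113.12 203.0.113.13 prints 203.0.113.11,203.0.113.12,203.0.113.13.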

For more information about how to connect to Exasol and load data, see the corresponding sections in the Exasol documentation.

Configure backups

Backups can be scheduled or created manually. Backups are run in the background while the database is running, and the process normally has a minimal impact on database performance. How long the backup process takes depends on several factors, such as the size of the database and the location of the archive volume.

You can only create a backup of the entire database, not of individual schemas or tables. The backup contains the consistent state of the database at the time when the backup was started, and it includes only completed transactions that were committed at that time.

Backups are stored on archive volumes in a compressed format. The archive volumes can be configured locally within the cluster (local archive), or on a location outside of the cluster (remote archive).

The following example explains how to create a local archive volume and a backup schedule.

The following examples use ConfD through the command-line tool confd_client, which is available on all database nodes. For more information, see ConfD.

Create a local archive volume

  1. On one of the nodes in the deployment, use c4 connect -t <DEPLOYMENT>[.<NODE>]/cos to connect to EXAClusterOS (COS). For example:

    ./c4 connect -t 1/cos

    In most cases it does not matter which node you connect to. If you do not specify a node, c4 will connect to the first active node in the deployment. The command prompt in COS indicates which node you are connected to. For example, if you are connected as root to node 11:

    [root@n11 ~]#

    For more information about how to use c4 connect, see How to use c4.

  2. To create a local archive volume, use the ConfD job st_volume_create with the following parameters.

    Some parameter values for the new archive volume must match the corresponding values for the data volume. To find out the values used by the data volume, use the ConfD jobs db_info and st_node_list.

    Required parameters

    • disk (string): The disk name in Exasol for the storage disk where the data volume resides.
    • owner (tuple, list): Owner tuple (or list of tuples) for the data volume.
    • nodes (list): List of node IDs in the data volume.
    • num_master_nodes (integer): The number of master nodes (active nodes) in the data volume.
    • name (string): A name for the new archive volume.
    • redundancy (integer): The redundancy level of the archive volume.
    • size (string): Volume size for the archive volume as a string, with unit (MiB, GiB, or TiB). The size value depends on the database size and the backup schedule.
    • partition_size (string, integer): 4294967296 for volumes < 250 GiB, 34359738368 for volumes ≥ 250 GiB and < 1 TiB, 274877906944 for volumes ≥ 1 TiB.
    • shared (boolean): true
    • type (string): archive
    • block_size (string): 512 KiB
    • stripe_size (string): 512 KiB

    Master nodes

    The parameter num_master_nodes defines the number of master nodes that the volume will use. The number of master nodes must match the number of active nodes in the cluster. For example: in a cluster with 3 active nodes and 1 reserve node (3+1), the number of master nodes is 3.

    In the following example, we create a 1 TiB archive volume with the name LocalArchiveVolume1 on the storage disk disk1 with 3 master nodes and redundancy 2. The command returns the volume ID for the new volume (vid: 3).

    confd_client st_volume_create name: LocalArchiveVolume1 disk: disk1 type: archive size: "1 TiB" num_master_nodes: 3 nodes: [11, 12, 13] redundancy: 2 partition_size: 274877906944 shared: true owner: [500,500]
    # ConfD returns the volume ID of the new archive volume:
    vid: 3

    The ConfD job st_volume_create rounds the specified size internally, so the new volume may not have exactly the size you specified. To check the actual size of the archive volume after creation and verify that it is acceptable, use the ConfD job st_volume_info. If the rounding takes up too much space, contact Support.

  3. To create the backup schedule, use the ConfD job db_backup_add_schedule.

    If a local archive volume runs out of free space, expired backups on that volume are automatically deleted. Expired backups on remote archive volumes are not deleted by this function.

    A common backup schedule is a weekly backup with an expiration of 10 days, and incremental backups on the first 6 days of the week with an expiration time of 3 days. To set up this configuration, create two backup schedules. For example:

    confd_client db_backup_add_schedule db_name: MY_DATABASE backup_name: weekly_full_backup backup_volume_name: VOLUME_NAME enabled: true level: 0 expire: '1w 3d' minute: 0 hour: 0 day: '*' month: '*' weekday: 0
    confd_client db_backup_add_schedule db_name: MY_DATABASE backup_name: daily_incremental backup_volume_name: VOLUME_NAME enabled: true level: 1 expire: '3d'  minute: 0 hour: 0 day: '*' month: '*' weekday: '1,2,3,4,5,6'
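The partition_size value for st_volume_create depends on the planned size of the archive volume, as listed in the parameters for step 2. The thresholds can be expressed as a small bash function. This is a minimal sketch; partition_size_for_gib is a hypothetical helper that takes the volume size in whole GiB.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the partition_size value (in bytes) for an archive
# volume of SIZE_GIB GiB, following the documented thresholds.
# partition_size_for_gib is a hypothetical helper, not part of ConfD.
partition_size_for_gib() {
  local size_gib="$1"
  if (( size_gib < 250 )); then
    echo 4294967296        # 4 GiB partitions for volumes < 250 GiB
  elif (( size_gib < 1024 )); then
    echo 34359738368       # 32 GiB partitions for volumes >= 250 GiB and < 1 TiB
  else
    echo 274877906944      # 256 GiB partitions for volumes >= 1 TiB
  fi
}
```

For example, partition_size_for_gib 1024 prints 274877906944, the value used in the 1 TiB archive volume example.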

For more information about how to create archive volumes, see Create Local Archive Volume and Create Remote Archive Volume.

For more information about backups, see Backup and Restore.

Additional administrative tasks

You can use the Administration API and ConfD to carry out additional administrative tasks after the installation, such as setting up a backup schedule and managing access.

For more information, see the respective topics in the Administration (On-Prem) section.