Create Remote Archive Volume

This article explains how to create a remote archive volume for database backups.

Backups are stored on archive volumes in a compressed format. With on-premises installations of Exasol, you can create archive volumes either locally within the cluster or in a location outside the cluster. Exasol refers to a volume outside the cluster as a remote volume. For cloud deployments, only remote archive volumes are supported.

You can create a remote archive volume on cloud platforms such as Amazon S3, Azure Blob Storage, or Google Cloud Storage, or using other remote storage services such as an FTP server or a Samba file server. The basic procedure when creating a remote volume is essentially the same regardless of which platform is used.

This procedure can be carried out using the Administration API or ConfD.

Prerequisites

General prerequisites

  • All nodes must be able to reach the remote target.
  • The user must have read-write access on the remote host.

Amazon S3 prerequisites

  • The URL to an existing Amazon S3 bucket in the following format:

    http[s]://<bucketname>.s3[-<region>].amazonaws.com/[<optional-directory>/]

    If you do not have a bucket, see How Do I Create an S3 Bucket?.

    A fully qualified S3 URL in the format <bucket-name>.s3.<region-code>.amazonaws.com becomes available as soon as the bucket is created. A URL in the legacy global endpoint format <bucket-name>.s3.amazonaws.com may take up to 24 hours to become available.

  • Read-write access to the S3 bucket. If the nodes are on a private network, make sure that an S3 endpoint is configured for your VPC and that the route table for your subnet is updated accordingly, so that the nodes can store backups in the bucket. For more information, see Endpoints for Amazon S3.
  • A secret access key for the S3 bucket. If you do not have a key, see Managing Access Keys for IAM Users.
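
The URL formats above can be checked before a volume is created. The following is a minimal sketch, not part of Exasol: the helper name parse_s3_url and the regular expression are assumptions derived only from the two URL shapes shown in this section.

```python
import re

# Accepts both shapes described above:
#   <bucket>.s3.amazonaws.com            (legacy global endpoint)
#   <bucket>.s3.<region>.amazonaws.com   (regional; "s3-<region>" is also accepted)
_S3_URL = re.compile(
    r"^https?://"
    r"(?P<bucket>[A-Za-z0-9._-]+?)"
    r"\.s3(?:[.-](?P<region>[a-z0-9-]+))?"
    r"\.amazonaws\.com"
    r"(?P<directory>/.*)?$"
)


def parse_s3_url(url):
    """Split an S3 bucket URL into bucket, region, and directory (hypothetical helper)."""
    match = _S3_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognized S3 bucket URL: {url!r}")
    return {
        "bucket": match.group("bucket"),
        "region": match.group("region"),  # None for the legacy global endpoint
        "directory": match.group("directory") or "/",
    }
```

A region of None indicates the legacy global endpoint format, which is the one that may not be available immediately after the bucket is created.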

Azure Blob Storage prerequisites

  • The URL to an existing Azure Blob Storage container in the following format:

    http[s]://<storage_account_name>.blob.core.windows.net/<container_name>

    If you do not have a container, see Create a storage account and Create a container.

  • Read-write access to the Blob storage. If the nodes are on a private network, make sure that a service endpoint for Microsoft Storage is configured in your VNet, so that the nodes can store backups in the container. For more information, see Virtual Network service endpoints.

  • An access key for the Blob storage account. For more information, see Authorize with shared key.

Google Cloud Storage prerequisites

  • The URL to a Google Cloud Storage bucket in the following format:

    http[s]://<bucketname>.storage.googleapis.com

    If you do not have a bucket, see Creating storage buckets.

  • An access key for your Cloud Storage account. For more information, see Cloud Storage authentication.

  • If the nodes are on a private network, make sure that Private Google Access is enabled in your subnet, so that the nodes can store backups in the bucket. For more information, see Configuring Private Google Access.
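
Because the three cloud platforms use distinct hostnames, the volume type can be inferred from the URL. The sketch below is illustrative only: the function name is hypothetical, and pairing each hostname suffix with the volume types s3, azure, and gs (the values this article lists for the ConfD vol_type parameter) is an assumption.

```python
from urllib.parse import urlsplit

# Hostname suffix -> volume type. The pairing is an assumption based on the
# URL formats in the prerequisites above; it is not taken from an Exasol API.
_SUFFIX_TO_TYPE = (
    (".amazonaws.com", "s3"),
    (".blob.core.windows.net", "azure"),
    (".storage.googleapis.com", "gs"),
)


def guess_cloud_volume_type(url):
    """Return 's3', 'azure', or 'gs' for a cloud storage URL, else None."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix, volume_type in _SUFFIX_TO_TYPE:
        if host.endswith(suffix):
            return volume_type
    return None
```

A result of None means the URL does not match one of the three cloud formats and probably belongs to one of the other storage services.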

Other remote archive volume options

For other storage platforms you need to set up authentication and configuration options as required by the respective host. The following protocols and volume types are supported:

Service         Volume type   URL example
FTP/FTPS        ftp           ftp[s]://<ftpserver>:2021/optional-directory/
Samba           smb           smb:////<smbserver>:2139/optional-directory/
Apache Hadoop   webhdfs       http://<hadoop-server>:2080/optional-directory/
                              https://<hadoop-server>:20443/optional-directory/
WebDAV          webdav        http[s]://<webdav-server>/optional-directory/
File            file          ./directory/

The ports in the URL examples are the default ports. Most protocols allow you to set the port.
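
If your server listens on a different port, only the port part of the URL changes. As a rough sketch, the hypothetical helper with_port below swaps the port using the Python standard library. It assumes a URL with a regular scheme://host form, so it handles neither the four-slash Samba form shown above nor IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(url, port):
    """Return url with its port replaced by port (hypothetical helper)."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host part: {url!r}")
    # Keep any "user:password@" prefix and replace only the port.
    userinfo, separator, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{separator}{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```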

Procedure - Administration API

The following examples use curl in a Linux terminal to send REST calls to endpoints in the Administration API. You can also use other interfaces and languages to interact with the API. For more information, see Administration API.

Placeholder values are styled as Bash variables, for example: $NODE_IP. Replace the placeholders with your own values.

The option --insecure or -k tells curl to bypass the TLS certificate check. This option allows you to connect to an HTTPS server that does not have a valid certificate. Only use this option if certificate verification is not possible and you trust the server.

To create a remote archive volume, send a POST request to the /api/v1/databases/$DB_NAME/volumes endpoint.

Specify the URL of the bucket (url), the volume name (volumeName), and the volume type (volumeType) in the request body, together with the credentials for the remote storage (username and password).

For example, to create the remote volume RemoteArchiveVolume1 in the S3 bucket my_bucket:

curl --insecure -X 'POST' \
  'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
  -H 'accept: application/json' \
  -H 'Authorization: Basic $TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
  "url": "https://my_bucket.s3.eu-west-1.amazonaws.com",
  "volumeName": "RemoteArchiveVolume1",
  "volumeType": "s3",
  "username": "$BUCKET_USER",
  "password": "$BUCKET_PASSWORD"
}'
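
The same request can be assembled in other languages. The following sketch mirrors the curl call using only the Python standard library. The function name and its parameters are hypothetical. The endpoint, port, headers, and JSON field names are copied from the curl example. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_create_volume_request(host, db_name, token, url, volume_name,
                                volume_type, username, password):
    """Build (but do not send) the request that creates a remote archive volume."""
    payload = {
        "url": url,
        "volumeName": volume_name,
        "volumeType": volume_type,
        "username": username,
        "password": password,
    }
    return urllib.request.Request(
        f"https://{host}:4444/api/v1/databases/{db_name}/volumes",
        data=json.dumps(payload).encode("utf-8"),  # json.dumps guarantees valid JSON
        method="POST",
        headers={
            "accept": "application/json",
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen. Disabling certificate verification through an ssl context corresponds to curl's --insecure option and carries the same caveat.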

For other storage options, use the appropriate url and volumeType. The supported volume types and example URLs are listed in Other remote archive volume options.

Verification

To verify that the new remote archive volume was created, send a GET request to the /api/v1/databases/$DB_NAME/volumes endpoint. For example:

curl --insecure -X 'GET' \
  'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
  -H 'accept: application/json' \
  -H 'Authorization: Basic $TOKEN' 

The output should be similar to the following (in this example, using an S3 bucket):

[
  {
    "id": "10003",
    "name": "RemoteArchiveVolume1",
    "type": "s3",
    "owner": [
      500,
      500
    ],
    "url": "https://my_bucket.s3.eu-west-1.amazonaws.com/$FOLDER"
  },
  {
    "id": "10002",
    "name": "default_backup_volume",
    "type": "s3",
    "owner": [
      500,
      500
    ],
    "url": "https://my_bucket.s3.eu-west-1.amazonaws.com/backup"
  },
  {
    "id": "10001",
    "name": "default_logrotation_volume",
    "type": "s3",
    "owner": [
      0,
      0
    ],
    "url": "https://my_bucket.s3.eu-west-1.amazonaws.com/logs"
  }
]
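
When the check is scripted, the response can be parsed instead of read by eye. This is a minimal sketch that assumes the response body is a JSON array shaped like the output above. The helper name find_volume is hypothetical.

```python
import json


def find_volume(response_text, name):
    """Return the volume entry with the given name, or None if it is missing."""
    for volume in json.loads(response_text):
        if volume.get("name") == name:
            return volume
    return None
```

For the output above, find_volume(text, "RemoteArchiveVolume1") returns the first entry, whose type is s3.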

Procedure - ConfD

The following examples use ConfD through the command-line tool confd_client, which is available on all database nodes. You can also access ConfD through XML-RPC in your own Python programs. For more information, see ConfD.

Placeholder values are indicated with UPPERCASE characters. Replace the placeholders with your own values.

  1. Connect to EXAClusterOS (COS) on the cluster using c4 connect -t <DEPLOYMENT>[.<NODE>]/cos. For example:

    ./c4 connect -t 1.11/cos

    If you do not specify a node, c4 will connect to the first active node in the deployment. If the cluster is configured with an access node, the first node is the access node (usually n10).

    For more information about how to use c4 connect, see How to use c4.

  2. To find the name of the database, use the ConfD job db_list. For example:

    confd_client db_list
    - MY_DATABASE

  3. The remote archive volume must be created with the same owner as the database that will write backups to the volume. To find the owner, use the ConfD job db_info. For example:

    confd_client db_info db_name: MY_DATABASE | grep owner -A 2
    owner:
    - 500
    - 500

  4. To create a remote archive volume, use the ConfD job remote_volume_add with the following parameters:

    Parameter name       Data type         Description
    vol_type             string            The type of remote volume. Valid volume types: smb, ftp, sftp, webhdfs, webdav, file, s3, gs, azure.
    url                  string            The URL of the remote volume.
    remote_volume_id     string, integer   The ID of the remote volume. The ID must be over 10000 (5 digits). If you do not enter an ID, one will be created.
    remote_volume_name   string            The name of the remote volume. If you do not enter a name, one will be created. Generated remote volume names start at r0001.
    username             string            The username on the remote service.
    password             string            The password for the user on the remote service.
    owner                tuple, list       The database owner as an integer tuple (user id, group id) or list (user name, user group name).

    For example, to create the volume RemoteArchiveVolume1 in the S3 bucket my_bucket:

    confd_client remote_volume_add url: https://my_bucket.s3.eu-west-1.amazonaws.com vol_type: s3 remote_volume_name: RemoteArchiveVolume1 username: backup_user password: 123456789 owner: [500,500]

    For other storage platforms, use the same command but with the appropriate url and vol_type. The supported volume types and example URLs are listed in Other remote archive volume options.

  5. To verify that the volume was created, use the ConfD job remote_volume_list. For example:

    confd_client remote_volume_list
    - default_logrotation_volume
    - default_backup_volume
    - RemoteArchiveVolume1

  6. To check the properties of the volume, use the ConfD job remote_volume_info and specify the name of the volume.

    For example, if the volume was created in an S3 bucket:

    confd_client remote_volume_info remote_volume_name: RemoteArchiveVolume1
    name: RemoteArchiveVolume1
    owner:
    - 500
    - 500
    password: 123456789
    type: s3
    url: https://my_bucket.s3.eu-west-1.amazonaws.com
    username: backup_user
    vid: '10001'
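
If you script the ConfD procedure, the remote_volume_add command line can be assembled from its parameters. The sketch below is an illustration under assumptions: the function name is hypothetical, and it only builds the argument list in the "key: value" form used in step 4 and enforces the ID rule from the parameter table. It does not call confd_client itself.

```python
def remote_volume_add_args(url, vol_type, owner, name=None, volume_id=None,
                           username=None, password=None):
    """Build the confd_client argument list for the remote_volume_add job."""
    if volume_id is not None and int(volume_id) <= 10000:
        raise ValueError("remote_volume_id must be over 10000")
    user_id, group_id = owner  # same owner as the database, see step 3
    args = ["confd_client", "remote_volume_add", "url:", url, "vol_type:", vol_type]
    optional = (
        ("remote_volume_name", name),
        ("remote_volume_id", volume_id),
        ("username", username),
        ("password", password),
    )
    for key, value in optional:
        if value is not None:  # omitted name and ID are generated by ConfD
            args += [f"{key}:", str(value)]
    args += ["owner:", f"[{user_id},{group_id}]"]
    return args
```

Joining the list with spaces reproduces the command shown in step 4. On a database node, the list could be passed to subprocess.run.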