Create Remote Archive Volume
This article explains how to create a remote archive volume for database backups.
Backups are stored on archive volumes in a compressed format. With on-premises installations of Exasol you can create archive volumes either locally within the cluster or in a location outside of the cluster, which in Exasol is referred to as a remote volume. For cloud deployments, only remote archive volumes are supported.
You can create a remote archive volume on cloud platforms such as Amazon S3, Azure Blob Storage, or Google Cloud Storage, or using other remote storage services such as an FTP server or a Samba file server. The basic procedure when creating a remote volume is essentially the same regardless of which platform is used.
This procedure can be carried out using the Administration API or ConfD.
Prerequisites
General prerequisites
- All nodes must be able to reach the remote target.
- The user must have read-write access on the remote host.
Amazon S3 prerequisites
- The URL to an existing Amazon S3 bucket in the following format:
  http[s]://<bucketname>.s3[-<region>].amazonaws.com/[<optional-directory>/]
  If you do not have a bucket, see How Do I Create an S3 Bucket?.
  A fully qualified S3 URL in the format <bucket-name>.s3.<region-code>.amazonaws.com will become available immediately when you have created the bucket. A URL in the legacy global endpoint format <bucket-name>.s3.amazonaws.com may need up to 24 hours to become available.
- Read-write access to the S3 bucket. If the nodes are on a private network, make sure that an S3 endpoint is configured for your VPC and that the route table for your subnet is updated accordingly to store backups in the bucket. For more information, see Endpoints for Amazon S3.
- A secret access key for the S3 bucket. If you do not have a key, see Managing Access Keys for IAM Users.
Azure Blob Storage prerequisites
- The URL to an existing Azure Blob Storage container in the following format:
  http[s]://storage_container_name.blob.core.windows.net/container_name
  If you do not have a container, see Create a storage account and Create a container.
- Read-write access to the Blob storage. If the nodes are on a private network, make sure that a service endpoint for Microsoft Storage is configured in your VNet to store backups in the container. For more information, see Virtual Network service endpoints.
- An access key for the Blob storage account. For more information, see Authorize with shared key.
Google Cloud Storage prerequisites
- The URL to a Google Cloud Storage bucket in the following format:
  http[s]://<bucketname>.storage.googleapis.com
  If you do not have a bucket, see Creating storage buckets.
- An access key for your Cloud Storage account. For more information, see Cloud Storage authentication.
- If the nodes are on a private network, make sure that Private Google Access is enabled in your subnet to store backups in the storage. For more information, see Configuring Private Google Access.
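The URL formats listed for the three cloud platforms can be assembled from their parts. The following is only a minimal sketch: the function name archive_url and its argument order are hypothetical and not part of Exasol. It always emits https and, for S3, the fully qualified regional form. The first part of the Azure host name is assumed to be the storage account name.

```shell
# Hypothetical helper: print a storage URL in the formats listed above.
#   archive_url s3    BUCKET  REGION
#   archive_url azure ACCOUNT CONTAINER
#   archive_url gs    BUCKET
archive_url() {
  case "$1" in
    s3)    printf 'https://%s.s3.%s.amazonaws.com\n' "$2" "$3" ;;
    azure) printf 'https://%s.blob.core.windows.net/%s\n' "$2" "$3" ;;
    gs)    printf 'https://%s.storage.googleapis.com\n' "$2" ;;
    *)     echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}
```

For example, archive_url s3 my_bucket eu-west-1 prints the bucket URL used in the S3 examples in this article.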
Other remote archive volume options
For other storage platforms you need to set up authentication and configuration options as required by the respective host. The following protocols and volume types are supported:
Service | Volume type | URL example |
---|---|---|
FTP/FTPS | ftp | ftps://<ftp-server>:2021/optional-directory/ |
Samba | smb | smb:////<smbserver>:2139/optional-directory/ |
Apache Hadoop | webhdfs | http://<hadoop-server>:2080/optional-directory/ |
WebDAV | webdav | http[s]://<webdav-server>/optional-directory/ |
File | file | ./directory/ |
The ports in the URL examples are the default ports. Most protocols allow you to set the port.
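The relationship between a URL and its volume type can be sketched as a lookup on the scheme and host. This is a heuristic illustration only: the function name volume_type_for_url is hypothetical, and a plain http[s] URL is treated as ambiguous because both webhdfs and webdav use it.

```shell
# Hypothetical heuristic: guess the volume type from the URL.
# Prints the type on stdout, or returns 1 if it cannot be inferred.
volume_type_for_url() {
  case "$1" in
    ftp://*|ftps://*)                   echo ftp ;;
    smb://*)                            echo smb ;;
    http*://*.amazonaws.com*)           echo s3 ;;
    http*://*.blob.core.windows.net*)   echo azure ;;
    http*://*.storage.googleapis.com*)  echo gs ;;
    ./*)                                echo file ;;
    # Generic http[s] URLs may be webhdfs or webdav, so no guess is made.
    *) echo "cannot infer volume type from: $1" >&2; return 1 ;;
  esac
}
```

When the type cannot be inferred, state it explicitly in the request instead of relying on a guess.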
Procedure - Administration API
The following examples use curl in a Linux terminal to send REST calls to endpoints in the Administration API. You can also use other interfaces and languages to interact with the API. For more information, see Administration API.
Placeholder values are styled as Bash variables, for example: $EXASOL_IP. Replace the placeholders with your own values.
The option --insecure or -k tells curl to bypass the TLS certificate check. This option allows you to connect to an HTTPS server that does not have a valid certificate. Only use this option if certificate verification is not possible and you trust the server.
To create a remote archive volume, send a POST request to the /api/v1/databases/$DB_NAME/volumes endpoint.
Specify the URL of the bucket (url), the volume name (volumeName), and the volume type (volumeType) as part of the request, together with the credentials for the remote service (username and password).
For example, to create the remote volume RemoteArchiveVolume1 in the S3 bucket my_bucket:
curl --insecure -X 'POST' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://my_bucket.s3.eu-west-1.amazonaws.com",
"volumeName": "RemoteArchiveVolume1",
"volumeType": "s3",
"username": "$BUCKET_USER",
"password": "$BUCKET_PASSWORD"
}'
For other storage options, use the appropriate url and volumeType. For example:
curl --insecure -X 'POST' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://$STORAGE_CONTAINER.blob.core.windows.net/container_name",
"volumeName": "RemoteArchiveVolume1",
"volumeType": "azure",
"username": "$CONTAINER_USER",
"password": "$CONTAINER_PASSWORD"
}'
curl --insecure -X 'POST' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://$BUCKET_NAME.storage.googleapis.com",
"volumeName": "RemoteArchiveVolume1",
"volumeType": "gs",
"username": "$BUCKET_USER",
"password": "$BUCKET_PASSWORD"
}'
curl --insecure -X 'POST' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"url": "ftps://$FTP_SERVER:2021/$OPTIONAL_DIRECTORY/",
"volumeName": "RemoteArchiveVolume1",
"volumeType": "ftp",
"username": "$FTP_USER",
"password": "$FTP_PASSWORD"
}'
curl --insecure -X 'POST' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"url": "smb:////$SAMBA_SERVER:2139/$OPTIONAL_DIRECTORY/",
"volumeName": "RemoteArchiveVolume1",
"volumeType": "smb",
"username": "$SAMBA_USER",
"password": "$SAMBA_PASSWORD"
}'
curl --insecure -X 'POST' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"url": "http://$HADOOP_SERVER:2080/my_backup_directory/",
"volumeName": "RemoteArchiveVolume1",
"volumeType": "webhdfs",
"username": "$HADOOP_USER",
"password": "$HADOOP_PASSWORD"
}'
The ports in the URL examples are the default ports. Most protocols allow you to set the port.
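In the examples above, the request body is wrapped in single quotes, so Bash does not expand names such as $BUCKET_USER. They are placeholders to replace by hand. If you prefer to pass real shell variables, one possible approach is to build the body with a small helper. The following is a sketch under stated assumptions: the function names json_escape and volume_payload are hypothetical, and the escaping covers only backslashes and double quotes, not control characters.

```shell
# Hypothetical helper: escape backslashes and double quotes for use
# inside a JSON string. Control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print the request body used in the examples above.
#   volume_payload URL VOLUME_NAME VOLUME_TYPE USERNAME PASSWORD
volume_payload() {
  printf '{"url": "%s", "volumeName": "%s", "volumeType": "%s", "username": "%s", "password": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
    "$(json_escape "$4")" "$(json_escape "$5")"
}

# Possible usage with curl (not executed here):
#   curl ... -d "$(volume_payload "$URL" RemoteArchiveVolume1 s3 "$BUCKET_USER" "$BUCKET_PASSWORD")"
```

Generating the body this way also avoids hand-editing mistakes such as a missing comma between fields.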
Verification
To verify that the new remote archive volume was created, send a GET request to the /api/v1/databases/$DB_NAME/volumes endpoint. For example:
curl --insecure -X 'GET' \
'https://$EXASOL_IP:4444/api/v1/databases/$DB_NAME/volumes' \
-H 'accept: application/json' \
-H 'Authorization: Basic $TOKEN'
The output should be similar to the following (in this example, using an S3 bucket):
[
{
"id": "10003",
"name": "RemoteArchiveVolume1",
"type": "s3",
"owner": [
500,
500
],
"url": "https://my_bucket.s3.eu-west-1.amazonaws.com/$FOLDER"
},
{
"id": "10002",
"name": "default_backup_volume",
"type": "s3",
"owner": [
500,
500
],
"url": "https://my_bucket.s3.eu-west-1.amazonaws.com/backup"
},
{
"id": "10001",
"name": "default_logrotation_volume",
"type": "s3",
"owner": [
0,
0
],
"url": "https://my_bucket.s3.eu-west-1.amazonaws.com/logs"
}
]
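The check can also be scripted by searching the response for the volume name. The sketch below is an illustration only: the function name volume_listed is hypothetical, and it uses a plain text match rather than a JSON parser, so it assumes the name contains no regular expression metacharacters.

```shell
# Hypothetical helper: succeed if a volume with the given name appears
# in the JSON volume list read from stdin.
#   curl ... | volume_listed RemoteArchiveVolume1
volume_listed() {
  # Matches "name": "<volume name>" with optional spaces after the colon.
  grep -q "\"name\": *\"$1\""
}
```

The function returns exit status 0 when the name is found and a non-zero status otherwise, so it can be used directly in an if statement.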
Procedure - ConfD
The following examples use ConfD through the command-line tool confd_client, which is available on all database nodes. For more information, see ConfD.
Placeholder values are indicated with UPPERCASE characters. Replace the placeholders with your own values.
- Connect to EXAClusterOS (COS) on the cluster using c4 connect -t <DEPLOYMENT>[.<NODE>]/cos. If you do not specify a node, c4 will connect to the first active node in the deployment.
  For more information about how to use c4 connect, see How to use c4.
- To find the name of the database, use the ConfD job db_list. For example:
  confd_client db_list
- The remote archive volume must be created with the same owner as the database that will write backups to the volume. To find the owner, use the ConfD job db_info. For example:
  confd_client db_info db_name: MY_DATABASE
- To create a remote archive volume, use the ConfD job remote_volume_add with the following parameters:
Parameter name | Data type | Description |
---|---|---|
vol_type | string | The type of remote volume. The following are valid volume types: smb, ftp, sftp, webhdfs, webdav, file, s3, gs, azure |
url | string | The URL of the remote volume. |
remote_volume_id | string, integer | The ID of the remote volume. The ID must be over 10000 (5 digits). If you do not enter an ID, one will be created. |
remote_volume_name | string | The name of the remote volume. If you do not enter a name, one will be created. Generated remote volume names start at r0001. |
username | string | The username on the remote service. |
password | string | The password for the user on the remote service. |
owner | tuple, list | The database owner as an integer tuple (user id, group id) or list (user name, user group name). |
For example, to create the volume RemoteArchiveVolume1 in the S3 bucket my_bucket:
confd_client remote_volume_add url: https://my_bucket.s3.eu-west-1.amazonaws.com vol_type: s3 remote_volume_name: RemoteArchiveVolume1 username: backup_user password: 123456789 owner: [500,500]
For other storage platforms, use the same command with the appropriate url and vol_type. The ports in the URL examples in this article are the default ports. Most protocols allow you to set the port.
- To verify that the volume was created, use the ConfD job remote_volume_list. For example:
  confd_client remote_volume_list
- To check the properties of the volume, use the ConfD job remote_volume_info and insert the name of the volume. For example:
  confd_client remote_volume_info remote_volume_name: RemoteArchiveVolume1
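As a side note on the remote_volume_id parameter of remote_volume_add: if you choose your own ID, the rule that it must be over 10000 can be checked before the job is called. This is a minimal sketch, and the function name valid_remote_volume_id is hypothetical.

```shell
# Hypothetical helper: succeed only if the argument is a whole number
# greater than 10000, as required for remote_volume_id.
valid_remote_volume_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -gt 10000 ]
}

# Possible usage (not executed here):
#   valid_remote_volume_id "$VOLUME_ID" && confd_client remote_volume_add ... remote_volume_id: "$VOLUME_ID"
```

If you do not pass remote_volume_id at all, no check is needed because an ID is created for you.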