SDDC: Administration tasks
This article describes database administration tasks that are different in an SDDC setup.
The documentation in this section is intended for advanced users who are already familiar with how to install and administer Exasol databases using ConfD and Exasol Deployment Tool (c4).
Incorrect configuration and administration of an SDDC cluster presents a high risk of data loss. When you install and administer an SDDC solution with Exasol, observe extreme caution and follow the provided instructions precisely.
If you have any doubts when performing a task described in this documentation, contact Support for guidance.
Introduction
The procedures in this section are based on the example cluster configuration in SDDC: Installation. The examples only describe the steps that are specific to an SDDC setup. For more details about the full procedures, see the respective links to other sections in this documentation.
Start a database
It is critical that only one database is configured to run at a time. Starting both the active and passive databases at the same time will result in data corruption.
If you try to start the passive database while the active database is running, you will receive an error message that the data volume is in use. Only start a database if both databases are in the setup state.
- Check the state of each database using the ConfD job db_state:

confd_client db_state db_name: PROD
running
...
confd_client db_state db_name: PROD_DR
setup

Databases in the running state are up and running. Databases in the setup state are shut down.

- To start the database, use the ConfD job db_start:

confd_client db_start db_name: PROD

- The database is fully up and running when the connectible property is yes. To check database connectivity, use the ConfD job db_info:

confd_client db_info db_name: <database name> | grep connectible
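The "only start when the other database is stopped" rule can be wrapped in a small guard script. This is a minimal sketch, assuming db_state prints the state on stdout ('setup' when stopped, as in the examples above); confd_client is stubbed here so the logic runs standalone, and safe_db_start is a hypothetical helper, not a ConfD job.

```shell
confd_client() {
  # stub standing in for the real CLI; replace with the real confd_client
  case "$1" in
    db_state) echo "'setup'" ;;   # pretend the counterpart is stopped
    db_start) echo "OK" ;;
  esac
}

# refuse to start a database unless its counterpart reports 'setup'
safe_db_start() {
  local db="$1" other="$2" other_state
  other_state=$(confd_client db_state db_name: "$other")
  if [ "$other_state" != "'setup'" ]; then
    echo "refusing to start $db: $other is in state $other_state" >&2
    return 1
  fi
  confd_client db_start db_name: "$db"
  echo "started $db"
}

safe_db_start PROD PROD_DR
```

A guard like this makes the data-corruption risk described above harder to trigger by accident, since the check and the start happen in one step.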
Change database configuration
Changes to the configuration of an existing database, such as enabling auditing or adding a parameter, are only applied to the database specified in the ConfD job db_configure. To ensure that database configuration is in sync between both databases in an SDDC setup, any changes made on one database should be applied immediately afterwards to the other database.
The following example shows how to add a new parameter to both databases in the example SDDC setup.
Invalid parameters will prevent the database from starting. To avoid unnecessary downtime, create a support case to get guidance from Support before you add or change database parameters.
To change the configuration, the database must first be stopped.
To add a parameter, use the ConfD job db_configure.
Example:
confd_client db_configure db_name: PROD params_add: '[-oidcProviderClientSecret=abcd]'
...
confd_client db_configure db_name: PROD_DR params_add: '[-oidcProviderClientSecret=abcd]'
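Because db_configure only touches the named database, applying a change to both databases in one step helps keep them in sync. A sketch with a stubbed confd_client; configure_both and the parameter name are hypothetical helpers for illustration:

```shell
confd_client() { echo "db_configure applied to $3"; }  # stub: $3 is the database name

# apply one parameter change to both databases so their configuration stays in sync
configure_both() {
  local param="$1" db
  for db in PROD PROD_DR; do
    confd_client db_configure db_name: "$db" params_add: "[$param]"
  done
}

configure_both "-hypotheticalParameter=value"
```

Remember that both databases must be stopped before the change, and that new parameters should be confirmed with Support first.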
Create new volumes
When you need to create a new volume, make sure that the following requirements are met:
- The redundancy parameter must be set to 2.
- The nodes parameter must include all active nodes from both data centers, but no reserve nodes.
- The num_master_nodes parameter must be identical to the number of active database nodes.
Example:
confd_client st_volume_create name: arc_vol disk: disk1 type: archive size: '100 GiB' nodes: '[11,12,13,14,15,16,17,18,19,20,21,23,24,25,26,27,28,29,30,31,32,33]' redundancy: 2 num_master_nodes: 11
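The bracketed nodes list in the example skips the reserve node of each data center. A small helper can build that list from the active node ranges; this is a sketch in which the DC 1 reserve node is 22 (as in the node_suspend example below) and the DC 2 reserve node is assumed to be 34:

```shell
# join node ids into the bracketed list format used by st_volume_create
join_nodes() {
  local out="" n
  for n in "$@"; do
    out="${out:+$out,}$n"
  done
  echo "[$out]"
}

dc1_active=$(seq 11 21)   # DC 1 active nodes; node 22 is the reserve node
dc2_active=$(seq 23 33)   # DC 2 active nodes; node 34 assumed to be the reserve node
join_nodes $dc1_active $dc2_active
```

Generating the list this way avoids typos in long node enumerations and makes the "no reserve nodes" rule explicit in one place.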
Create backups
Backups are database-specific, which means that it is not possible to create a level 1 backup based on a level 0 backup from a different database, even if both databases use the same data volume. For SDDC setups, this means that the PROD_DR database cannot create a level 1 backup until you have created a level 0 backup for it.
To start a backup, use the ConfD job db_backup_start.
Example:
confd_client db_backup_start db_name: PROD_DR backup_volume_name: arc_vol level: 0 expire: 1w
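The level 0 prerequisite can be enforced in a wrapper. A sketch with a stubbed confd_client; start_backup and the have_level0 flag are hypothetical illustration, not ConfD features — in practice you would check the existing backups of the database instead of a shell variable:

```shell
confd_client() { echo "backup level $7 started"; }  # stub: $7 is the level argument

have_level0=false

# refuse a level 1 backup until a level 0 backup exists for this database
start_backup() {
  local level="$1"
  if [ "$level" -eq 1 ] && [ "$have_level0" != true ]; then
    echo "refusing level 1 backup: no level 0 base exists yet" >&2
    return 1
  fi
  confd_client db_backup_start db_name: PROD_DR backup_volume_name: arc_vol level: "$level" expire: 1w
  if [ "$level" -eq 0 ]; then have_level0=true; fi
}

start_backup 1 || true   # blocked: no level 0 yet
start_backup 0
start_backup 1           # now allowed
```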
Activate and deactivate backup schedules
To avoid false alerts and error messages, we recommend that you deactivate all backup schedules for the passive database and only activate them once that database is in use.
After a swap to the passive site, the backup schedule needs to be activated for the PROD_DR database. Backups configured in the schedule will not run if the database is not running.
To activate an existing backup schedule, use the ConfD job db_backup_modify_schedule and set enabled: true.
Example:
confd_client db_backup_modify_schedule db_name: PROD_DR backup_name: "Backup PROD_DR Level 0" enabled: true
A level 0 backup must be taken before any scheduled level 1 backup can run.
To deactivate a backup schedule, use the ConfD job db_backup_modify_schedule and set enabled: false.
Example:
confd_client db_backup_modify_schedule db_name: PROD_DR backup_name: "Backup PROD_DR Level 0" enabled: false
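During a site swap, the two schedule changes belong together: disable the schedule of the database going passive and enable the one of the database going active. A sketch with a stubbed confd_client; swap_schedules is a hypothetical helper, and the backup names follow the naming used in the examples above:

```shell
confd_client() { echo "$5 -> enabled: $7"; }  # stub: $5 backup name, $7 enabled flag

# disable the schedule of the database going passive,
# enable the schedule of the database going active
swap_schedules() {
  local new_active="$1" new_passive="$2"
  confd_client db_backup_modify_schedule db_name: "$new_passive" \
    backup_name: "Backup $new_passive Level 0" enabled: false
  confd_client db_backup_modify_schedule db_name: "$new_active" \
    backup_name: "Backup $new_active Level 0" enabled: true
}

swap_schedules PROD_DR PROD   # after a swap to the passive site
```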
Switch databases and shut down nodes
This procedure describes a scenario where you need to temporarily switch operation to the passive database because the nodes in the active site must be taken offline for some reason.
When switching operation to the secondary site, first stop the active database on the primary site, then stop the nodes, and then start the passive database in the other data center. When the nodes in DC 1 have been brought back online, stop the database in DC 2 and start the database in DC 1 again.
When an offline node rejoins the cluster, all outdated storage segments will be recovered. The amount of data that needs to be recovered, and the time required for this operation, depends on the amount of data that was changed while the node was offline.
Only one database can run at a time. Before you start a database, make sure that the other database is stopped.
Switch from the active database PROD to the passive database PROD_DR and stop the nodes in DC 1
- Stop the active database PROD using the ConfD job db_stop:

confd_client db_stop db_name: PROD

- Stop the Exasol services on all database nodes on the active site.

NOTE: This step must be carried out on all nodes on the active site.

systemctl --user stop c4_cloud_command
systemctl --user stop c4

- From a node on the passive site, check the state of the stopped nodes using the ConfD job node_state:

confd_client node_state

- From a node on the passive site, suspend all nodes on the active site using the ConfD job node_suspend:

confd_client node_suspend nid: '[11,12,13,14,15,16,17,18,19,20,21,22]'

- Start the PROD_DR database using the ConfD job db_start:

confd_client db_start db_name: PROD_DR

- To be protected against node failures in DC 2 while the nodes in DC 1 are offline, temporarily increase the redundancy from 2 to 3 for both the data volume and the archive volume. This will create additional redundancy segments on all nodes in DC 2.

Rebuilding redundancy on the new segments can be very time-consuming. Before you increase redundancy, consider whether the extra downtime required for this operation is justified by the risk of another node failure occurring.

To increase the redundancy for the volumes, use the ConfD job st_volume_increase_redundancy:

confd_client st_volume_increase_redundancy vname: data_vol delta: 1
...
confd_client st_volume_increase_redundancy vname: arc_vol delta: 1

You can monitor the progress using logd_collect and csrec:

logd_collect Storage
csrec -l
csrec -s -v VOLUME_ID
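The ConfD part of the switchover above can be collected into one routine. A sketch with a stubbed confd_client so the control flow can be followed end to end; switch_to_dr is a hypothetical helper, and the systemctl commands appear only as comments because they must run locally on every node of the active site, which a single script cannot do:

```shell
confd_client() { echo "confd: $*"; }  # stub standing in for the real CLI

# steps 1-5 of the switchover to the DR site
switch_to_dr() {
  confd_client db_stop db_name: PROD
  # on every DC 1 node: systemctl --user stop c4_cloud_command
  # on every DC 1 node: systemctl --user stop c4
  confd_client node_state
  confd_client node_suspend nid: '[11,12,13,14,15,16,17,18,19,20,21,22]'
  confd_client db_start db_name: PROD_DR
}

switch_to_dr
```

The optional redundancy increase is deliberately left out of the sketch, since it should be a conscious decision weighed against the extra downtime.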
Start the stopped nodes in DC 1 and switch back to the PROD database
This procedure describes the steps to perform after the nodes in DC 1 are brought back online.
Nodes are automatically added to the cluster when they are started; there is no need to explicitly resume them.

- From one of the nodes on the passive site, check the state of the nodes using the ConfD job node_state:

confd_client node_state

- Start the Exasol services on the nodes in DC 1 that were previously offline.

NOTE: This step must be carried out on all nodes where the services are not running.

systemctl --user start c4
systemctl --user start c4_cloud_command

- Once the volumes are fully restored and in the ONLINE state, reduce the redundancy in DC 2 from 3 to 2 to keep disk I/O at a minimum.

To change the redundancy, use the ConfD job st_volume_decrease_redundancy:

confd_client st_volume_decrease_redundancy vname: data_vol delta: 1 nid: 23
...
confd_client st_volume_decrease_redundancy vname: arc_vol delta: 1 nid: 23

- Stop the PROD_DR database using the ConfD job db_stop:

confd_client db_stop db_name: PROD_DR

- Check the state of the PROD_DR database using the ConfD job db_state. Wait until the state is shown as setup, which means that the database is stopped.

confd_client db_state db_name: PROD_DR
...
Result:
'setup'

- Once the PROD_DR database is stopped, start the PROD database using the ConfD job db_start:

confd_client db_start db_name: PROD
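The wait-then-start logic of the last three steps can be sketched as a polling loop. confd_client is stubbed with a temp file so the first poll reports 'running' and later polls report 'setup'; wait_until_setup is a hypothetical helper, and the quoted 'setup' output format follows the Result shown above:

```shell
poll_flag=$(mktemp)

confd_client() {  # stub: PROD_DR reports 'running' on the first poll, then 'setup'
  case "$1" in
    db_state)
      if [ -s "$poll_flag" ]; then
        echo "'setup'"
      else
        echo "'running'"; echo polled > "$poll_flag"
      fi ;;
    db_stop)  echo "stopping $3" ;;
    db_start) echo "starting $3" ;;
  esac
}

# poll db_state until the database reports 'setup' (fully stopped)
wait_until_setup() {
  local db="$1"
  while [ "$(confd_client db_state db_name: "$db")" != "'setup'" ]; do
    sleep 1  # poll interval; pick a sensible value on a real cluster
  done
}

confd_client db_stop db_name: PROD_DR
wait_until_setup PROD_DR
confd_client db_start db_name: PROD
rm -f "$poll_flag"
```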
Switch databases without shutting down nodes
In this scenario you switch from the active database to the passive database or vice versa. All nodes remain online.
Only one database can run at a time. Before you start a database, make sure that the other database is stopped.
- Stop the active database PROD using the ConfD job db_stop:

confd_client db_stop db_name: PROD

- Check the state of the PROD database using the ConfD job db_state. Wait until the state is shown as setup, which means that the database is stopped.

confd_client db_state db_name: PROD
...
Result:
'setup'

- Start the PROD_DR database using the ConfD job db_start:

confd_client db_start db_name: PROD_DR

To switch back to the PROD database, perform the same procedure in reverse.
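Since the swap is symmetric, both directions can be expressed as one parameterized routine. A sketch with a stubbed confd_client in which the stop completes immediately; swap is a hypothetical helper:

```shell
confd_client() {
  case "$1" in
    db_state) echo "'setup'" ;;   # stub: pretend the stop completes immediately
    *)        echo "$1 $3" ;;     # echo job name and database name
  esac
}

# stop the currently active database, wait for 'setup', start the other one;
# swap PROD PROD_DR moves operation to the DR site, swap PROD_DR PROD moves it back
swap() {
  local from="$1" to="$2"
  confd_client db_stop db_name: "$from"
  while [ "$(confd_client db_state db_name: "$from")" != "'setup'" ]; do
    sleep 5
  done
  confd_client db_start db_name: "$to"
}

swap PROD PROD_DR
```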