Fail Safety on Cloud Platform

In the event of a hardware component or server failure, a node in the cluster becomes unavailable. The fail safety process detects that a specific node within the cluster is no longer available, and the cluster nodes replicate data to neighbor nodes if redundancy is configured.

Exasol provides the following two mechanisms for failover:

  • Hot Standby: In this mechanism, you have running standby node(s) alongside the active node(s) of your system. If a node fails, a standby node immediately takes over for it. This is the default mechanism for all cloud offerings. It is the faster failover mechanism, but also the more expensive one.
    You can use this mechanism with Exasol's cloud offerings as well as with on-premises deployments.
  • Cold Standby: In this mechanism, you have a standby node for the active node(s) of your system. To enable this option, you shut down the standby node in your cloud environment (AWS, Azure, or GCP). It is a relatively slower but more cost-effective failover mechanism. In case of a failover, the standby node is started and takes over for the failed node. The mechanism is implemented using the Exasol Cloud Failover Plugin, which comes pre-installed with all cloud offerings.

Hot Standby

The most important objective is data integrity: the failure of a hardware component must not cause data loss or data corruption. The hard drives of the cluster nodes are configured in RAID 1 pairs to compensate for single disk failures without any interruption. Additionally, the cluster nodes replicate data to neighbor nodes if redundancy 2 volumes are used, which is a best practice.

To achieve fast recovery of operations, the cluster operating system automatically restarts the necessary services if the required resources (main memory, number of nodes, …) are available.

Exasol 4+1 Cluster: Redundancy 2

If volumes are configured with redundancy 2, each node holds a mirror of the data that is operated on by a neighbor node. If, for example, n11 modifies segment A, the mirror A' on n12 is synchronized over the private network. Should an active node fail, the reserve node steps in and starts an instance.
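
This neighbor-mirroring layout can be pictured with a short sketch. The following Python snippet is purely illustrative (the node and segment names beyond those mentioned above are hypothetical, and this is not an Exasol API); it only models how, with redundancy 2, the mirror of each master segment is placed on the next active node:

```python
# Illustration only: models the redundancy 2 placement in a 4+1 cluster,
# where n11-n14 are active nodes and n15 is the reserve node.
ACTIVE_NODES = ["n11", "n12", "n13", "n14"]
SEGMENTS = {"A": "n11", "B": "n12", "C": "n13", "D": "n14"}  # master segment -> node

def mirror_node(master_node: str) -> str:
    """With redundancy 2, the mirror resides on the neighboring active node."""
    i = ACTIVE_NODES.index(master_node)
    return ACTIVE_NODES[(i + 1) % len(ACTIVE_NODES)]

for segment, master in SEGMENTS.items():
    print(f"Segment {segment}: master on {master}, mirror {segment}' on {mirror_node(master)}")
# e.g. Segment A: master on n11, mirror A' on n12
#      Segment B: master on n12, mirror B' on n13
```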

What Happens on Node Failure

A node failure leads to the following sequence of actions within the Exasol Cluster:

  • In approximately five seconds, EXACluster OS detects that a node has failed and stops all affected databases on the cluster.
  • In the following two seconds (approximately), a reserve node is activated by EXACluster OS and the databases are restarted.
  • In the following eight seconds (approximately), end users can connect to these databases again.
  • Over the following minutes, a background restore of segments to the new active node is performed while the databases are up.
  • Overall, it takes only a couple of seconds for the database to become available again.

The above timings are given for an average-sized cluster and should not be regarded as upper limits or as precise values. Your mileage may vary, depending on your cluster size, the number of nodes, and the database load.

Exasol 4+1 Cluster: Persistent node failure

If the failed node n12 does not become available again before the Restore Delay threshold (default: 10 minutes) has elapsed, the segments that resided on the failed node (A' and B) are copied to the newly activated node (the former reserve node n15), using the mirror B' on n13 and the master segment A on n11. This is a time-consuming activity that puts a significant load on the private network. If the private network has been separated into a database network and a storage network, this copying is done over the storage network.

It is recommended to add a new reserve node in this scenario to replace the crashed node n12.

Exasol 4+1 Cluster: Transient node failure

If the failed node n12 comes back within the Restore Delay, the segments on that node are stale because their mirror counterparts have been operated on in the meantime. They have to be re-synchronized before they can be used again. Nevertheless, this scenario does not require a complete restore of the segments to n15.

Fast mirror re-sync

After n12 comes back, the stale segments are re-synchronized by applying the changes that were made to A and B' while n12 was offline. This activity is much faster and less load-intensive than the complete restore of these segments to n15 that is performed for a persistent node failure. The instance on n15 now works on the master segment B residing on n12 until the database is restarted. That restart is not done automatically, to avoid the short downtime associated with it.
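
The difference between the two scenarios comes down to whether the failed node returns before the Restore Delay expires. The following Python sketch is only a conceptual illustration of that decision; the function and variable names are hypothetical and not part of EXAStorage or EXAoperation:

```python
from datetime import timedelta

# Conceptual sketch: the Restore Delay threshold decides between a fast
# mirror re-sync (transient failure) and a full restore of the failed
# node's segments to the activated reserve node (persistent failure).
RESTORE_DELAY = timedelta(minutes=10)  # default value mentioned above

def recovery_action(node_downtime: timedelta) -> str:
    if node_downtime <= RESTORE_DELAY:
        # Transient failure: only the changes made while the node was
        # offline are applied to its stale segments.
        return "fast mirror re-sync of the stale segments on the returning node"
    # Persistent failure: copy the segments to the reserve node from the
    # surviving masters and mirrors (time- and network-intensive).
    return "full restore of the segments to the activated reserve node"

print(recovery_action(timedelta(minutes=3)))   # fast mirror re-sync ...
print(recovery_action(timedelta(minutes=45)))  # full restore ...
```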

Payload of database node x resides on volume master node y

This is how the above situation is seen in EXAoperation.

Database restart

Restarting the database after a transient node failure re-establishes the initial scenario with n11-n14 as active nodes and n15 as the reserve node. The drawback is that this causes a short period of downtime for the database. Another option is the Move Node operation:

Move node

Instead of restarting the database, a Move Node operation can be performed without causing any downtime. In this case, the segments residing on n12 are copied over the private network to n15, which can be time-consuming depending on the affected data volume. The private network can also become significantly utilized during that period.

Select affected volume & node

To move the volume (instead of restarting the database), select the affected volume on the EXAStorage page and then select the node that is currently not used for the volume.

Click the Move Node button.

After selecting the target node, click the Move Node button again.

Monitor Recovery Progress

The ongoing recovery can be monitored on the volume detail page.

Log entries

The completion of the restore can be seen in the log maintained by the logservice.

Cold Standby

The cold standby mechanism is designed specifically for Exasol deployments on cloud platforms. It is a relatively slower but more cost-effective failover mechanism. It is similar to the hot standby mechanism, but implemented in a slightly different way.

You must have standby node(s) to enable this feature. The standby node should be in suspended mode.
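
As a rough illustration of what a suspended standby node looks like on AWS, the following boto3 sketch stops the standby node's EC2 instance. The instance ID and region are placeholders, Azure and GCP have equivalent stop/deallocate operations, and during an actual failover the Cloud Failover Plugin starts the node for you:

```python
import boto3

# Illustration only: put a cold-standby node into a stopped state on AWS.
# The instance ID and region are placeholders for your own environment.
ec2 = boto3.client("ec2", region_name="eu-west-1")

STANDBY_INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical standby node instance

# Stop the standby node so it does not incur compute cost while idle.
ec2.stop_instances(InstanceIds=[STANDBY_INSTANCE_ID])

# During a failover the Exasol Cloud Failover Plugin starts the node again;
# the equivalent manual call would be:
# ec2.start_instances(InstanceIds=[STANDBY_INSTANCE_ID])
```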

What Happens on Node Failure

  • In approximately five seconds, EXACluster OS detects that a node has failed and stops all affected databases on the cluster.
  • In the following five minutes (approximately), the Cloud Failover Plugin restarts the system and tries to make it available again.
  • If this does not bring the system online, the Cloud Failover Plugin starts the standby node, which takes approximately another five minutes.
  • Over the following minutes, a background restore of segments to the new active node is performed while the databases are up.
  • Overall, it takes only a couple of minutes for the database to become available again.

The above timings are given for an average-sized cluster and should not be regarded as upper limits or as precise values. Your mileage may vary, depending on your cluster size, the number of nodes, and the database load.