Fail Safety on Cloud Platform

If a hardware component or a server fails, the affected node in the cluster becomes unavailable. The fail safety process detects that a specific node within the cluster is no longer available, and if redundancy is configured, the cluster nodes replicate data to neighboring nodes.

Exasol provides the following two mechanisms for failover:

  • Hot Standby: In this mechanism, you have one or more active reserve nodes for the active nodes in your system. If a node fails, a reserve node immediately takes over for the failed node. This is the default mechanism for all cloud offerings. Hot standby is a relatively fast failover mechanism, but it is expensive.

    Hot standby can be implemented for Exasol's cloud platform offerings as well as for on-premises deployments.

  • Cold Standby: In this mechanism, you have one or more inactive reserve nodes for the active nodes in your system. If a node fails, a reserve node is started and takes over for the failed node. Cold standby is a slower failover mechanism than hot standby, but it is more cost-effective.

    Cold standby is implemented using the Exasol Cloud Failover Plugin, which comes pre-installed with all Exasol cloud offerings.

    Reserve nodes that are added by the Exasol Cloud Failover Plugin will be hot standby nodes by default. To configure the reserve node as a cold standby node, you need to shut down the node by stopping the resources from your cloud provider (AWS, Azure, or GCP). For details about how to do this, refer to the documentation for the respective cloud platform.

Hot Standby

To achieve fast operational recovery, the cluster operating system automatically restarts the necessary services, provided that the corresponding resources (main memory, number of nodes, and so on) are available.

Exasol 4+1 Cluster: Redundancy 2

If volumes are configured with redundancy 2, each node holds a mirror of the data that is operated on by its neighbor node. For example, if n11 modifies segment A, the mirror A' on n12 is synchronized over the private network. Should an active node fail, the reserve node steps in and starts an instance.
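The ring-style placement described above can be illustrated with a small sketch. This is a conceptual model only: the node and segment names follow the example, but the placement function is an assumption for illustration, not Exasol's internal implementation.

```python
# Conceptual sketch of redundancy-2 segment placement in a 4+1 cluster.
# Each active node holds its own master segment plus a mirror of its
# left neighbor's segment (as in the example: A on n11, mirror A' on n12).

ACTIVE_NODES = ["n11", "n12", "n13", "n14"]
SEGMENTS = ["A", "B", "C", "D"]

def placement(nodes, segments):
    """Map each node to (its master segment, the mirror it hosts)."""
    layout = {}
    for i, node in enumerate(nodes):
        master = segments[i]
        # Mirror of the previous node's master segment (wraps around).
        mirror = segments[(i - 1) % len(nodes)] + "'"
        layout[node] = (master, mirror)
    return layout

layout = placement(ACTIVE_NODES, SEGMENTS)
print(layout["n12"])  # ('B', "A'") -> n12 holds master B and mirror A'
```

With this layout, any single node failure leaves a surviving copy of every segment: the failed node's master can be rebuilt from the mirror on its right neighbor, and its mirror from the master on its left neighbor.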

What Happens on Node Failure

A node failure leads to the following sequence of actions within the Exasol Cluster:

  • In approximately five seconds, EXACluster OS detects that a node has failed and stops all affected databases on the cluster.
  • In the following two seconds (approximately), a reserve node is activated by EXACluster OS and the databases are restarted.
  • In the following eight seconds (approximately), end users can connect to these databases again.
  • A background restore of segments to the new active node is performed over the following minutes while the databases are up.
  • In total, it takes only a couple of seconds for the database to become available again.

The above timings are given for an average-sized cluster and should not be regarded as upper limits or as precise figures. Your mileage may vary depending on your cluster size, the number of nodes, and the database load.
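Summing the approximate step durations above gives the time until the database accepts connections again. The following sketch just encodes that arithmetic; the figures are the rough estimates from this section, not guarantees.

```python
# Rough cumulative timeline of a hot-standby failover, using the
# approximate step durations given above (all values in seconds).
HOT_STANDBY_STEPS = [
    ("failure detected, affected databases stopped", 5),
    ("reserve node activated, databases restarted", 2),
    ("end users can reconnect", 8),
]

def time_until_available(steps):
    """Total elapsed time until end users can connect again."""
    return sum(duration for _, duration in steps)

print(time_until_available(HOT_STANDBY_STEPS))  # -> 15 (seconds)
```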

Exasol 4+1 Cluster: Persistent node failure

If the failed node n12 does not become available again before the Restore Delay threshold expires (default: 10 minutes), the segments that resided on the failed node (A' and B) are copied to the newly activated node (the former reserve node n15) using the mirrors B' on n13 and A on n11. This is a time-consuming activity that puts a significant load on the private network. If the private network has been separated into a database network and a storage network, this copying is done through the storage network.

It is recommended to add a new reserve node in this scenario to replace the failed node n12.

Exasol 4+1 Cluster: Transient node failure

If the failed node n12 comes back within the Restore Delay, the segments on that node are now stale because their mirrors have been operated on in the meantime. They have to be re-synchronized before they can be used again. Nevertheless, this scenario does not require a complete restore of the mirrors to n15.

Fast mirror re-sync

After n12 comes back, the stale segments are re-synchronized by applying the changes made to A and B' while n12 was offline. This activity is much faster and less load-intensive than the complete restore of these segments to n15 that is performed after a persistent node failure. The instance on n15 now works on the master segment B residing on n12 until the database is restarted. That restart is not performed automatically, to avoid the short downtime associated with it.
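The distinction between the persistent and transient cases boils down to a single decision on the outage duration relative to the Restore Delay threshold. The sketch below illustrates that logic; the function and result names are made up for illustration.

```python
# Illustrative decision between a full segment restore and a fast
# mirror re-sync, based on the Restore Delay threshold described above.
RESTORE_DELAY_MINUTES = 10  # default threshold

def recovery_action(outage_minutes, restore_delay=RESTORE_DELAY_MINUTES):
    """Choose the recovery path for a failed node's segments."""
    if outage_minutes > restore_delay:
        # Persistent failure: copy the full segments from the surviving
        # mirrors to the newly activated reserve node (slow, heavy load).
        return "full_restore_to_reserve"
    # Transient failure: re-apply only the changes made while the node
    # was offline to its stale segments (fast, low load).
    return "fast_mirror_resync"

print(recovery_action(25))  # -> full_restore_to_reserve
print(recovery_action(3))   # -> fast_mirror_resync
```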

Payload of database node x resides on volume master node y

This is how the above situation is seen in EXAoperation.

Database restart

If the database is restarted after a transient node failure, the initial scenario is re-established, with n11-n14 as active nodes and n15 as the reserve node. The drawback is that this causes a short period of downtime for the database. Another option is the Move Node operation:

Move node

Instead of restarting the database, a Move Node operation can be performed without causing any downtime. In this case, the segments residing on n12 are copied over the private network to n15, which can be time-consuming depending on the affected data volume. The private network can also become significantly utilized during that period.
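How time-consuming the copy is depends on the affected data volume and the private network bandwidth. A back-of-the-envelope estimate can be sketched as follows; the volume size, bandwidth, and efficiency factor are illustrative assumptions, not Exasol measurements.

```python
# Rough estimate of the segment copy time during a Move Node operation.
# All figures here are illustrative assumptions.

def copy_time_seconds(volume_gib, net_gbit_per_s, efficiency=0.8):
    """Estimate transfer time for volume_gib GiB over the private network."""
    volume_bits = volume_gib * 1024**3 * 8
    effective_rate = net_gbit_per_s * 1e9 * efficiency
    return volume_bits / effective_rate

# Copying 1 TiB of segments over a 10 Gbit/s network at ~80% efficiency
# takes on the order of 18 minutes:
print(round(copy_time_seconds(1024, 10)))  # -> 1100 (seconds)
```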

Select affected volume & node

To move the volume (instead of restarting the database), select the affected volume on the EXAStorage page, then select the node that is currently not used for the volume.

Click the Move Node button.

After selecting the target node, click the Move Node button again.

Monitor Recovery Progress

The ongoing recovery can be monitored on the volume detail page.

Log entries

The completion of the restore can be seen in the log maintained by the logservice.

Cold Standby

The cold standby mechanism is designed specifically for deployments of Exasol on a cloud platform. It is a relatively slower but more cost-effective failover mechanism. It is similar to the hot standby mechanism but is implemented slightly differently.

You must have at least one reserve node to enable this feature. The reserve node should be in suspended mode.

Cold standby is only available if more than 50% of all the active data nodes are available during an outage. This means that the smallest cluster that can support cold standby is a 3+1 configuration (3 active data nodes + 1 reserve node).
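The more-than-50% availability rule can be expressed as a simple quorum check. The sketch below is illustrative; the function name is made up for this example.

```python
# Cold standby requires more than 50% of all active data nodes to be
# available during an outage. Sketch of that quorum check.

def cold_standby_possible(total_active_nodes, available_nodes):
    """True if strictly more than half of the active data nodes are up."""
    return available_nodes * 2 > total_active_nodes

# A 3+1 cluster (3 active data nodes) losing one node: 2 of 3 remain.
print(cold_standby_possible(3, 2))  # True  -> failover can proceed
# A 2+1 cluster losing one node: 1 of 2 is not more than 50%.
print(cold_standby_possible(2, 1))  # False -> smallest supported is 3+1
```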

What Happens on Node Failure

  • In approximately five seconds, EXACluster OS detects that a node has failed and stops all affected databases on the cluster.
  • In the following five minutes (approximately), the cloud failover plug-in restarts the system and tries to make it available again.
  • If this fails to bring the system online, the cloud failover plug-in starts the reserve node, which takes approximately another five minutes.
  • A background restore of segments to the new active node is performed over the following minutes while the databases are up.
  • In total, it takes only a couple of minutes for the database to become available again.

The above timings are given for an average-sized cluster and should not be regarded as upper limits or as precise figures. Your mileage may vary depending on your cluster size, the number of nodes, and the database load.