Synchronous Dual Data Center (SDDC) in Detail

Site 1 works as the Primary Site. Its two nodes host database instances that operate on local Master Segments, while their Slave Segments are synchronously mirrored across the private network to Site 2. A dedicated 10 Gb Ethernet network is recommended so that this mirroring does not degrade performance. While Site 1 is the Primary Site, the License Server also runs there. That way, Site 1 keeps quorum (a majority of nodes) if a network outage interrupts the connection between the two sites.
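The quorum argument can be illustrated with a simple majority check. The sketch below is hypothetical: the node names and the way nodes are split between `SITE_1` and `SITE_2` are invented for illustration, and `has_quorum` is not an Exasol API. It only shows why placing the License Server on Site 1 gives that site the majority when the link between the sites is cut.

```python
from __future__ import annotations

# Hypothetical layout for illustration only; a real cluster may have
# a different number of nodes per site.
SITE_1 = {"license-server", "n11", "n12"}
SITE_2 = {"n21", "n22"}
ALL_NODES = SITE_1 | SITE_2


def has_quorum(reachable: set[str], all_nodes: set[str] = ALL_NODES) -> bool:
    """Return True if the reachable nodes form a strict majority of all nodes."""
    return len(reachable & all_nodes) * 2 > len(all_nodes)


# A network outage splits the cluster into the two sites.
print("Site 1 has quorum:", has_quorum(SITE_1))  # 3 of 5 nodes
print("Site 2 has quorum:", has_quorum(SITE_2))  # 2 of 5 nodes
```

Requiring a strict majority means at most one side of a network split can ever continue, which prevents both sites from operating on diverging data.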

Site Failure

If Site 1 is lost (for example, because of a fire), the cluster can fail over to Site 2.

The License Server (usually a virtual machine in this setup) is now started on Site 2, together with database instances that operate on their local segments. Because the segments were mirrored synchronously, this failover causes no data loss, and database downtime is typically less than a minute.
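The zero-data-loss property follows from the mirroring being synchronous: a write is acknowledged only after it has reached both copies. The following toy model is a sketch under that assumption, not Exasol's implementation; the classes `Segment` and `SyncMirroredSegment` and the `fail_over` method are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A toy segment: an ordered list of committed records."""
    records: list[str] = field(default_factory=list)


@dataclass
class SyncMirroredSegment:
    """Master copy on Site 1 with a synchronous mirror on Site 2 (toy model)."""
    master: Segment = field(default_factory=Segment)
    mirror: Segment = field(default_factory=Segment)

    def write(self, record: str) -> None:
        # The write returns (is acknowledged) only after both copies have it.
        self.master.records.append(record)
        self.mirror.records.append(record)

    def fail_over(self) -> Segment:
        """Site 1 is lost: continue with the copy located on Site 2."""
        return self.mirror


seg = SyncMirroredSegment()
for r in ("row-1", "row-2", "row-3"):
    seg.write(r)

surviving = seg.fail_over()
print(surviving.records)  # every acknowledged write is present on Site 2
```

With asynchronous mirroring, the mirror could lag behind the master, and writes acknowledged just before the site loss would be missing after failover.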

Node Failure Activates the Reserve Node

A single node failure on Site 1 causes a database restart, just as in an ordinary Exasol cluster. The former Reserve Node is then activated and immediately operates on the mirrored segment located on Site 2.
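The takeover by the Reserve Node can be pictured as a change in which segment copy each instance works on. The sketch below is hypothetical: the placement dictionary, the helper `activate_reserve`, and the location label `"Site 2"` are assumptions made for illustration. The node names n11, n12, and n13 and the segment names B and B’ follow the example used in this section.

```python
from __future__ import annotations


def activate_reserve(placement: dict[str, tuple[str, str]],
                     failed_node: str,
                     reserve_nodes: list[str],
                     mirror_location: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Hand the failed node's work to the first free reserve node.

    placement maps node -> (segment, location of the copy it operates on).
    mirror_location maps segment -> location of its mirror copy.
    Returns a new placement and leaves the input unchanged.
    """
    if not reserve_nodes:
        raise RuntimeError("no reserve node available")
    new_placement = dict(placement)
    segment, _ = new_placement.pop(failed_node)
    reserve = reserve_nodes[0]
    # The master copy is unavailable with the failed node, so the reserve
    # node operates on the mirror (marked with a prime, e.g. B') instead.
    new_placement[reserve] = (segment + "'", mirror_location[segment])
    return new_placement


# Hypothetical initial state: n11 and n12 work on local masters A and B.
before = {"n11": ("A", "n11"), "n12": ("B", "n12")}
mirrors = {"A": "Site 2", "B": "Site 2"}

# n12 fails; in the real cluster this reassignment happens during a restart.
after = activate_reserve(before, "n12", ["n13"], mirrors)
print(after)  # n13 now operates on B' on Site 2; n11 is unchanged
```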

Node Failure is Permanent: Re-creation of Master Segment

If the failed node (n12 in this example) does not become available again within the Restore Delay threshold (10 minutes by default), the Master Segment is re-created on Site 1 by copying it from its mirror on Site 2.
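The Restore Delay acts as a simple time threshold that decides between the two recovery paths described in this section. The function `recovery_action` below is a hypothetical sketch of that decision, not an Exasol interface; only the 10-minute default comes from the text. Treating a return after exactly 10 minutes as "within" the threshold is an assumption.

```python
from __future__ import annotations

from datetime import timedelta
from typing import Optional

RESTORE_DELAY = timedelta(minutes=10)  # default threshold stated in the text


def recovery_action(returned_after: Optional[timedelta],
                    restore_delay: timedelta = RESTORE_DELAY) -> str:
    """Decide how segment B is recovered after node n12 fails (hypothetical).

    returned_after is how long the node was gone, or None if it never returns.
    """
    if returned_after is not None and returned_after <= restore_delay:
        # Transient failure: the old master is stale but can be resynced.
        return "fast mirror resync"
    # Permanent failure: rebuild the master by copying the whole mirror.
    return "re-create master from mirror"


print(recovery_action(timedelta(minutes=3)))   # node back quickly
print(recovery_action(None))                   # node never returns
print(recovery_action(timedelta(minutes=25)))  # too late: copy already started
```

The threshold trades off two costs: waiting too long leaves segment B without a redundant copy, while copying too eagerly wastes storage-network bandwidth on nodes that would have returned anyway.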

Node Failure is Transient: Segments on Re-activated Node are Stale

If the failed node comes back within the Restore Delay, this time-consuming copy over the storage network is not necessary. However, the Master Segment B on that node is stale, because modifications have been made to its mirror B’ in the meantime.
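A resync is cheaper than a full copy if only the parts of B that changed while the node was away need to be transferred. The model below illustrates that idea under an explicit assumption: segments are treated as lists of blocks, and the mirror records which block indices were modified. The class `MirrorWithChangeLog` is hypothetical and is not a description of how Exasol's Fast Mirror Resync is actually implemented.

```python
from __future__ import annotations


class MirrorWithChangeLog:
    """Toy mirror B' that remembers which blocks changed while B was offline."""

    def __init__(self, blocks: list[str]) -> None:
        self.blocks = list(blocks)
        self.dirty: set[int] = set()

    def write(self, index: int, data: str) -> None:
        self.blocks[index] = data
        self.dirty.add(index)  # remember the change for the later resync

    def resync(self, stale_master: list[str]) -> int:
        """Copy only the changed blocks into the stale master; return the count."""
        for index in sorted(self.dirty):
            stale_master[index] = self.blocks[index]
        copied = len(self.dirty)
        self.dirty.clear()
        return copied


# B and B' start identical; then n12 goes down and only B' receives writes.
b = ["b0", "b1", "b2", "b3"]
b_prime = MirrorWithChangeLog(b)
b_prime.write(1, "b1-new")
b_prime.write(3, "b3-new")

copied = b_prime.resync(b)
print(copied, b)  # only the 2 modified blocks were copied
```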

Node Failure is Transient: Status After Fast Mirror Resync

After B has been resynchronized, the instance on node n13 continues to operate on segment B located on node n12 until the database is restarted manually.

The database is not restarted automatically in this scenario, to avoid the downtime that a restart involves. A restart would return the cluster to the initial state described in the first picture above.
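The state after the resync differs from the initial state in one respect: an instance reads its segment from another node. The sketch below uses hypothetical helpers `remote_accesses` and `restart` to show that difference and how a restart would remove it. It assumes that a restart simply places each instance on the node that stores its segment.

```python
from __future__ import annotations

# node running the instance -> node storing the segment it operates on.
# State after the resync (from the text): n13 works on B stored on n12.
after_resync = {"n11": "n11", "n13": "n12"}


def remote_accesses(placement: dict[str, str]) -> list[str]:
    """Nodes whose instance operates on a segment stored on another node."""
    return [node for node, seg_host in placement.items() if node != seg_host]


def restart(placement: dict[str, str]) -> dict[str, str]:
    """Hypothetical restart: each instance runs where its segment is stored."""
    return {seg_host: seg_host for seg_host in placement.values()}


print(remote_accesses(after_resync))           # n13 still reads from n12
print(remote_accesses(restart(after_resync)))  # all access is local again
```

Deferring the restart accepts some extra network traffic between n13 and n12 in exchange for avoiding unplanned downtime; the restart can then happen in a maintenance window.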