E.5. Common Behaviors: 2-4 Member Cluster with IP-based Tie-Breaker

Network Partition

Common Causes: Network switch problem.

Test Case: Connect half of the members to switch A and the other half to switch B. Connect switch A to switch B using an uplink or crossover cable. Connect the device acting as the tie-breaker IP address to switch A. Start cluster services, then unplug switch A from switch B.

Expected Behavior: Every cluster partition that comprises exactly half of the members (1 of 2, or 2 of 4) sends ping packets to the tie-breaker IP address. If a reply is received, the partition forms a quorum. In the test case, this means the half connected to switch A forms a quorum. Because the Cluster Manager requires a fully connected subnet, an even split (split-brain) in which both partitions can still reach the tie-breaker IP address is not handled.
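The decision described above can be sketched as a small shell function. This is only an illustration, not the Cluster Manager's implementation: the function name partition_has_quorum and its arguments are hypothetical, and the rule that a strict majority forms a quorum without consulting the tie-breaker is an assumption that the text above does not state.

```shell
# Hypothetical helper for illustration; not part of the Cluster Manager.
# Usage: partition_has_quorum <members_in_partition> <total_members> <yes|no>
# The third argument says whether the tie-breaker IP answered a ping.
partition_has_quorum() {
    in_partition=$1
    total=$2
    tiebreaker_reachable=$3

    if [ $((in_partition * 2)) -gt "$total" ]; then
        # Assumption: a strict majority forms a quorum on its own.
        return 0
    fi
    if [ $((in_partition * 2)) -eq "$total" ] && [ "$tiebreaker_reachable" = yes ]; then
        # Exactly half of the members, and the tie-breaker replied.
        return 0
    fi
    # Fewer than half, or exactly half with no reply from the tie-breaker.
    return 1
}
```

In the test case, the half on switch A corresponds to "partition_has_quorum 2 4 yes", which succeeds, and the half on switch B to "partition_has_quorum 2 4 no", which fails.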

Verification: Run clustat on the members plugged into switch A. A Cluster Quorum Incarnation number should be listed at the top of the output.
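The check can be scripted by filtering the clustat output for the incarnation line. The sketch below assumes that the line contains the words "Quorum Incarnation"; the exact wording and layout of the output may differ between releases, so adjust the pattern to match what clustat prints on your system. The function name shows_quorum_incarnation is hypothetical.

```shell
# Hypothetical helper for illustration; reads command output on stdin.
# Assumption: the quorum line contains the phrase "Quorum Incarnation".
shows_quorum_incarnation() {
    grep -qi 'Quorum Incarnation'
}

# Intended use on a member plugged into switch A:
#   clustat | shows_quorum_incarnation && echo "partition has quorum"
```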

Loss of access to shared media

Common Causes: Shared media loses power; cable connecting a member to the shared media is disconnected.

Test Case: Unplug SCSI or Fibre Channel cable from a member.

Expected Behavior: The configured action (reboot, halt, stop, or ignore) is taken in response to the loss of access to shared storage. The default action is reboot.

System hang or crash (panic) on cluster member

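Test Case: Suspend the cluquorumd and clumembd daemons by sending them the STOP signal. This simulates a hang: the processes remain present but stop responding.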

killall -STOP cluquorumd clumembd

Expected Behavior: The cluster member is fenced by another member. Services fail over. If a watchdog is in use, it may be triggered.
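The effect of the STOP signal used in the test case can be tried safely on a throwaway process before it is used on the cluster daemons. The following sketch uses a background sleep as a stand-in; nothing in it touches cluster software, and the variable names are illustrative.

```shell
# Start a disposable background process to stand in for a daemon.
sleep 30 &
pid=$!

# Suspend it. SIGSTOP cannot be caught or ignored by the process.
kill -STOP "$pid"

# The process still exists (kill -0 only tests for existence),
# but it is no longer scheduled, so it behaves like a hung process.
still_present=no
if kill -0 "$pid" 2>/dev/null; then
    still_present=yes
fi
echo "process $pid present after STOP: $still_present"

# Clean up. SIGKILL is delivered even to a stopped process.
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true
```

Because a suspended daemon still exists but does no work, the other members observe the same symptoms as a hung system, which is why the test results in fencing rather than a clean shutdown.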