This post should help you understand what VCHA does and does not protect, and what to expect in each situation.
For more on VCHA please see my previous posts:
What does cause a VCHA Failover?
Active Node failure
If the current Active node goes down, the Passive node will assume the Active role and provide service within a few minutes. In my own testing I have seen API access restored in less than 3 minutes, and the vSphere Web Client restored in less than 5 minutes.
What do I mean by “down”?
- If the vCenter Server services crash within the GuestOS, that’s a failover.
- If the Active node crashes, as in the entire VM crashes, that’s a failover.
- If the underlying ESXi host crashes (PSOD), the Active node dies with it; that’s a failover.
- If the Active node can no longer talk to the Passive and Witness nodes via the VCHA Network, that’s a failover.
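If you want to time a failover in your own lab, as I did for the figures above, a small polling loop is enough. The following is a minimal sketch, not a VMware tool: `time_to_recover` and the `probe` callable are hypothetical names, and `probe` stands for whatever check you choose (for example, an API login attempt that returns True on success).

```python
import time
from typing import Callable, Optional


def time_to_recover(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it returns True.

    Returns the elapsed seconds, or None if `timeout_s` passes first.
    `probe` is a hypothetical, user-supplied availability check.
    """
    start = clock()
    while True:
        try:
            if probe():
                return clock() - start
        except Exception:
            # Any error from the probe is treated as "service still down".
            pass
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Start it the moment you trigger the failure. The result is only accurate to within `interval_s`, so use a short interval if you care about seconds rather than minutes.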
What doesn’t cause a VCHA Failover?
Loss of Management (Public) Network
If the current Active node loses connectivity on its Management (or Public) interface, but still has connectivity to the other VCHA nodes via the VCHA Network (or Private) interface, an automatic failover will not take place.
vSphere Web Client service failure
Whilst the vSphere Web Client might be your primary method of interacting with vCenter Server, you might have other products (Horizon View, vRA, vCD, etc.) that interact with vCenter Server via the API. If the vSphere Web Client service fails, and only that service fails, that will not trigger an automatic failover. The vSphere Web Client might have suffered a transient issue, crashed, and will simply restart in a minute or two. Why fail over the entire vCenter Server and cause a (small) outage to your Horizon View, vRA, vCD, etc. environments?
External PSC Failure
If you are using a vCenter Server Appliance with Embedded PSC, then your PSC services are also protected by VCHA. However, if you are using an external PSC, you really should protect that as well, using PSC HA via a Load Balancer (more to come on that). No point in providing HA to half of the vCenter Server stack and not to the other half, right?
If your external PSC fails, and the PSC is not protected by PSC HA, the vCenter Server will not immediately trigger a failover. It should eventually cause a failover once the vCenter Server services begin to fail as a result of the PSC being down, but the Passive VCHA node won’t be able to come up if that same PSC is still down during failover.
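The two lists above can be condensed into a small lookup. This is only an illustrative model of the behaviour described in this post, not anything VCHA exposes; the `Event` and `Failover` names are my own invention.

```python
from enum import Enum


class Event(Enum):
    # Hypothetical labels for the situations described in this post.
    VCENTER_SERVICES_CRASH = "vCenter Server services crash in the guest OS"
    ACTIVE_VM_CRASH = "the Active node VM crashes"
    ESXI_HOST_CRASH = "the ESXi host under the Active node crashes"
    VCHA_NETWORK_ISOLATION = "Active cannot reach Passive and Witness"
    MANAGEMENT_NETWORK_LOSS = "Active loses only its Management network"
    WEB_CLIENT_SERVICE_FAILURE = "only the vSphere Web Client service fails"
    EXTERNAL_PSC_FAILURE = "an unprotected external PSC fails"


class Failover(Enum):
    YES = "automatic failover"
    NO = "no automatic failover"
    NOT_IMMEDIATE = "no immediate failover; may follow once vCenter services fail"


# Expected behaviour as described in the text, one entry per event.
EXPECTED = {
    Event.VCENTER_SERVICES_CRASH: Failover.YES,
    Event.ACTIVE_VM_CRASH: Failover.YES,
    Event.ESXI_HOST_CRASH: Failover.YES,
    Event.VCHA_NETWORK_ISOLATION: Failover.YES,
    Event.MANAGEMENT_NETWORK_LOSS: Failover.NO,
    Event.WEB_CLIENT_SERVICE_FAILURE: Failover.NO,
    Event.EXTERNAL_PSC_FAILURE: Failover.NOT_IMMEDIATE,
}


def expected_failover(event: Event) -> Failover:
    """Return the failover behaviour this post describes for `event`."""
    return EXPECTED[event]
```

Note the third outcome for the external PSC case: even if a failover eventually happens, the Passive node cannot come up while that PSC is still down.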
Redeploy a failed node
If the Active node suffers a failure similar to what I described above, it will typically come back and rejoin the VCHA Cluster as the current Passive node. However, if the Active node never comes back or can’t come back (maybe the VM is corrupt or deleted), then we have the ability to “redeploy” a node. This is true for any failed node: Active, Passive or Witness.
It’s important to remember that the vCenter Server services can only be provided while at least two of the three nodes that form a VCHA Cluster are online. The cluster cannot tolerate the failure of more than one node.
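The two-out-of-three rule is simple enough to express in a few lines. This is a sketch of the rule as stated here, with hypothetical names (`NODES`, `can_provide_service`); it does not query a real cluster.

```python
from typing import Iterable

# The three roles that make up a VCHA Cluster.
NODES = ("active", "passive", "witness")


def can_provide_service(online: Iterable[str]) -> bool:
    """True if at least two of the three VCHA nodes are online."""
    online_set = set(online)
    unknown = online_set - set(NODES)
    if unknown:
        raise ValueError(f"unknown node(s): {sorted(unknown)}")
    return len(online_set) >= 2
```

For example, `can_provide_service({"active", "witness"})` is true (the Passive is down, cluster degraded), while `can_provide_service({"witness"})` is false because two nodes have failed.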
The Redeploy button appears when you select the “down” node in the vCenter HA settings view. Clicking it reuses the settings of the “down” node to clone a replacement node quickly and easily.
What if <insert node> fails?
What if the current Active Fails?
If the current Active fails, a failover will take place and the Passive will take over the Active role. The VCHA Cluster will be reported in a degraded state until the “down” node rejoins the VCHA Cluster or is redeployed.
What if the current Passive Fails?
If the current Passive fails, then no failover will take place and there will be no loss in vCenter Server service availability. The VCHA Cluster will be reported in a degraded state until the “down” node rejoins the VCHA Cluster or is redeployed.
What if the Witness Fails?
If the Witness fails, then no failover will take place and there will be no loss in vCenter Server service availability. The VCHA Cluster will be reported in a degraded state until the “down” node rejoins the VCHA Cluster or is redeployed.
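The three cases above differ only in whether a failover happens. A minimal sketch of that summary follows; the `Outcome` type and `single_node_failure` function are hypothetical names of my own, used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    failover: bool   # does the Passive take over the Active role?
    degraded: bool   # is the VCHA Cluster reported as degraded?
    cleared_by: str  # what returns the cluster to a healthy state


def single_node_failure(role: str) -> Outcome:
    """Outcome described in this post when exactly one node fails."""
    role = role.lower()
    if role not in ("active", "passive", "witness"):
        raise ValueError(f"unknown role: {role!r}")
    return Outcome(
        failover=(role == "active"),  # only losing the Active triggers failover
        degraded=True,                # any single failure degrades the cluster
        cleared_by="node rejoins or is redeployed",
    )
```

Whichever node fails, the cluster stays degraded until that node rejoins or is redeployed; only the loss of the Active node involves a failover.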