One of the most anticipated features of vSphere 6.5 is native high availability for vCenter Server. No additional product like vCenter Server Heartbeat and no third-party feature like MSCS/WSFC is needed.
The first thing to point out is that this feature is exclusive to the vCenter Server Appliance 6.5.
Windows vCenter Server 6.5 will still rely on MSCS/WSFC to provide high availability.
Overview
The high-level design of VCHA is a three-node cluster: an "Active" node, a "Passive" node that is a full clone of the original Active, and a "Witness" node that is a stripped-down clone of the original Active.
The Witness can never take over the Active role and become a usable vCenter Server instance.
A minimum of two nodes must be online; the cluster cannot suffer the loss of more than one node and continue to provide service.
File-level replication takes place continuously between the current Active and Passive nodes. This is done using Linux rsync and keeps all the configuration files and service state in sync.
Native vPostgres replication handles the VCDB and VUMDB databases.
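As an aside, since vPostgres is standard PostgreSQL under the hood, you can see the database replication flowing from a shell on the current Active node. A minimal sketch, assuming shell access and the usual VCSA psql path (adjust the path if your build differs):

# Query the built-in PostgreSQL replication view on the Active node.
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
# A healthy VCHA cluster should show one row for the Passive node, in a "streaming" state.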
If the current Active node fails, the current Passive node takes over and becomes the new Active node. The FQDN does not change.
If a failover occurs, replication reverses direction once the former Active, now Passive, comes back and rejoins the VCHA cluster.
Replication should always flow from the current Active to the current Passive.
Deployment
The VCHA Team has done some tremendous work to really simplify and streamline the deployment of VCHA.
We have two deployment options: Basic and Advanced.
I’m not a fan of those terms, but think of Basic = Automated and Advanced = Manual.
Both methods give the exact same end result.
Basic is the preferred method. It automates the entire process for you: all you need to do is provide a few IP addresses and the compute and storage locations for the Passive and Witness nodes, and it takes care of everything else. It will also automatically create a DRS anti-affinity rule to keep the three VMs on separate hosts.
Basic can be done if the VCSA VM is located in the same SSO Domain as vCenter Server itself.
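If you want to confirm the anti-affinity rule that a Basic deployment created, you can list the DRS rules on the cluster. A sketch using the govmomi govc CLI, assuming govc is installed and connected and the cluster is named "Cluster01" (the cluster name is illustrative, and flags can vary between govc versions):

# List the DRS rules on the cluster hosting the VCHA nodes.
govc cluster.rule.ls -cluster Cluster01
# Expect a VM anti-affinity rule covering the Active, Passive, and Witness VMs.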
The following video is a quick demo of a Basic Deployment. As you can see, the VCSA VM resides within the vCenter Inventory.
If the VCSA is a virtual machine managed by a completely separate vCenter Server (i.e. a separate management VC), then you need to use the Advanced method. The reason is that if the VC you are enabling VCHA on cannot locate itself as a VM, it cannot automate the clones of itself in this release and must rely on the user to perform the clone operations manually.
When doing the Advanced option you must:
- Manually add a second NIC to the VCSA VM, attached to your VCHA network portgroup.
- Via the VAMI (https://IP_FQDN:5480), configure a static IP address for the internal VCHA network.
- Launch the VCHA Configuration Wizard and choose Advanced.
- Provide the IP addresses you want to assign to the Peer and Witness nodes.
- Clone the VCSA VM twice, making sure to use Guest Customization to apply the unique IP addresses you entered previously to the second NIC of each clone.
- Power on each node and complete the Wizard (a quick verification sketch follows this list).
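Once both clones are powered on, it's worth a quick check from each clone's console that Guest Customization actually applied the unique VCHA address to the second NIC. A minimal sketch (the IP below is hypothetical; use the addresses you entered in the wizard):

# On each clone's console, confirm eth1 carries its unique VCHA IP.
ifconfig eth1
# Then confirm the peer responds over the VCHA network, e.g.:
ping -c 3 192.168.10.202   # hypothetical Passive-node VCHA IP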
The following is a quick demo of an Advanced deployment. As you can see, the VCSA named vcsa-02 does not exist as a VM in its own inventory. We will manually clone this VM from its managing VC.
VMware is targeting an RTO of less than 5 minutes, but unfortunately my lab isn't that beefy: it's running on nested ESXi, so I couldn't demonstrate real-world failover times.
Do all nodes have to be on the same subnet?
No. There is an option to override the "Public" IP to be different on each node, and the internal VCHA IPs can also be on different subnets, as long as latency is less than 10 ms.
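A quick way to sanity-check that latency budget is a plain round-trip test from the Active node's console against the Passive node's VCHA address (the IP below is hypothetical):

# From the Active node, measure round-trip time over the VCHA network.
ping -c 10 10.20.0.202   # hypothetical Passive-node VCHA IP
# The average RTT in the summary line should stay below 10 ms.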
I’ll hopefully do a post about the Active and Passive having different public IPs soon.
Looking to implement cross-site DR HA, so each VCSA resides in a different datacentre with a different IP. Not sure how DNS would work, though; it might need a manual change in the event of a failure.
VCHA is designed for HA, not DR. But you can have the two main nodes reside in different datacenters with different IPs. You would need to ensure your DNS is updated; how that's done is outside of vSphere's jurisdiction.
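For what it's worth, if the zone allows dynamic updates, that DNS change could be scripted outside of vSphere. A hypothetical sketch using BIND's nsupdate (the FQDN, addresses, and zone setup are all assumptions):

# Hypothetical: repoint the vCenter FQDN at the new Active node after a cross-site failover.
nsupdate <<'EOF'
update delete vcsa.example.local A
update add vcsa.example.local 300 A 10.20.0.50
send
EOF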
Thanks for the article. Just my 5 cents: Basic mode can also be used if the vCenter is running under another vCenter, given they share the same SSO domain.
Yup, which is why I have the statement "Basic can be done if the VCSA VM is located in the same SSO Domain as vCenter Server itself." in the post 🙂
So, following your info and reading through the crappy VMware documentation provides the same end: an error saying "This operation is not allowed in the current state. Failed to get management network information."
All networking is correct, the management network is reachable, and the appliance is functioning. Not quite understanding why this is happening. It does this in both Basic and Advanced setups.
Also, in your Advanced video you show the profile, VCHA-Advanced, being set up with the management address of the existing vCenter system. This is what is put on the clone, and after the clone comes online, since it has the same management address as its originator, it causes disruption in the original appliance. I had to destroy the clone and reboot the existing vCenter system.
This is typical these days for VMware: rolling out a product that is sub-par.
I must be missing something??
Hi John,
Re: “This operation is not allowed in the current state”
Can you make sure the VAMI service is running by running "service vami-lighttpd restart".
Check its status by running "service vami-lighttpd status".
For VCHA Basic or Advanced, the Management Address of all nodes (eth0/NIC1) is expected to be the same (but it can be overridden if placing the nodes on different network segments). Only the VCHA address (eth1/NIC2) is expected to be unique. When you perform the clones, eth0 on the two clones should not be online; the interface should be marked down within the Photon OS.
It's also important that you do not perform the clones until AFTER you have stepped through the first few steps of the VCHA Advanced Deployment Wizard but BEFORE hitting Finish.
Hope this helps.
Thanks, I will try this again after my new vCenter Appliance is ready; I had to destroy the former as it crashed after trying this 3 times. Also, I went back over the video and didn't see anything stating to clone but not power on, or to set NIC0 on each clone to offline. I must have missed that part.
In regards to "The interface should be marked down within the Photon OS": are you referring to setting this offline in vCenter, or is there a CLI step in the console of vCenter at the Photon level?
Thanks again,
In the video I did for Advanced, from the 1 minute mark to 1:50 we are preparing VCHA using the UI. That asks for the VCHA IPs you want to assign to the Passive and Witness; what that workflow also does is prepare the VCSA for VCHA.
Once you have passed that preparation (and again, before hitting Finish), any clones you create from the VCSA should power on with eth0 automatically down within the Photon OS. There is no need to disconnect the NIC from Edit Settings or anything like that.
After your first clone, open a console and run ifconfig. You should only see eth1 online; you should not see eth0 online. If you do, then something unexpected is happening and I'd ask that you open an SR with VMware Support if possible.
There should also be a prepare-vcha.log in /var/log/vmware/vcha where you should see the line "Completed updating /etc/systemd/network/10-eth0.network to manual". That line tells us that eth0 is being marked as manual so the interface does not come up automatically.
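To confirm this on your own clones, both the log line and the rewritten network unit can be checked from the console:

# Confirm the prepare step marked eth0 as manual.
grep "Completed updating" /var/log/vmware/vcha/prepare-vcha.log
# Inspect the systemd network file the prepare step rewrote.
cat /etc/systemd/network/10-eth0.network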
Before hitting Finish, both clones should be powered on with eth0 down in the Photon OS and only eth1 online, using the unique IPs you specified in the Advanced UI workflow and also applied during the Guest Customization phase of the clone workflow.
I guess I'm not intelligent enough to build this. I started fresh, spun up a vCenter appliance inside my main vCenter appliance, configured all networking correctly, and started the configuration on the second appliance. I got all the way through to the cloning process on the first clone, and after your step '1f' I had the option to customize the hardware even though I unchecked that option. I hit Finish, and after it powered on I completed the Witness clone, let it power on, and hit Finish at the main configuration step... Failed. Both clones powered on with NIC 0 active and connected.
2017-02-03T21:17:48.089Z info vpxd[7FD36DF3E700] [Originator@6876 sub=vpxLro opID=VchaPropertyProvider:2405-11781-ngc-36] [VpxLRO] -- FINISH lro-2570
2017-02-03T21:17:48.090Z info vpxd[7FD36DF3E700] [Originator@6876 sub=Default opID=VchaPropertyProvider:2405-11781-ngc-36] [VpxLRO] -- ERROR lro-2570 -- FailoverClusterManager -- vim.vcha.FailoverClusterManager.getClusterHealth: vim.fault.InvalidState:
--> Result:
--> (vim.fault.InvalidState) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = (vmodl.LocalizableMessage) [
-->       (vmodl.LocalizableMessage) {
-->          key = "com.vmware.vim.vcha.error.clusterNotConfigured",
-->          arg = ,
-->          message =
-->       }
-->    ]
-->    msg = ""
--> }
--> Args:
-->
That was pretty much the end of the log.
Thanks for your input. I'll just run a nightly backup of vCenter and, if it fails, just spin up another. Too much time wasted on this project.