vCenter HA: Known Issues in vSphere 6.7

There are two known issues regarding vCenter HA in vSphere 6.7. VMware Engineering is aware of both and is actively working to address them in a future release.

The first issue affects the startup of some non-critical services after a failover, and the second affects patching a vCenter HA-enabled system to 6.7 U2.

Currently, resolving the first issue requires you to remove vCenter HA, apply a workaround, and then re-enable vCenter HA. My advice is therefore to remove vCenter HA before you begin your update to 6.7 U2, since you’re going to have to remove it after the update anyway.

If you’re already on 6.7 U2 with vCenter HA enabled, you’ll also need to remove vCenter HA to resolve the first issue.

Issue 1: Services vmware-certificatemanagement and vmware-topologysvc do not start automatically after a failover or node restart

Two new services were introduced in vCenter Server 6.7 U2: vmware-certificatemanagement and vmware-topologysvc.

I’m not going to dive into what these services do, as their names are mostly self-explanatory. All you need to know here is that neither service is critical to the daily functionality of vCenter Server, so if you’re running vCenter HA on 6.7 U2 today, you’ve probably not noticed that they may not be running. On an affected node, service-control lists both as stopped:

root@vcsa-01 [ ~ ]# service-control --status
Stopped:
 vmcam vmware-certificatemanagement vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-rbd-watchdog vmware-topologysvc vmware-vcha vsan-dps
Running:
 applmgmt lwsmd pschealth vmafdd vmcad vmdird vmdnsd vmonapi vmware-analytics vmware-cis-license vmware-cm vmware-content-library vmware-eam vmware-perfcharts vmware-pod vmware-postgres-archiver vmware-rhttpproxy vmware-sca vmware-sps vmware-statsmonitor vmware-sts-idmd vmware-stsd vmware-updatemgr vmware-vapi-endpoint vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui

These two services may not be running because vCenter HA uses something called a VMON profile to determine which services each node needs to run.

You can find these profiles in the /etc/vmware/vmware-vmon/profiles.json file on the vCenter Server Appliance.

Important: Do not attempt to create any custom profiles or make any modifications other than those explained in this post.

root@vcsa-01 [ ~ ]# cat /etc/vmware/vmware-vmon/profiles.json
{
   "NONE" : [],
   "HACore" : [ "vcha", "statsmonitor" ],
   "HAActive" : [ "rhttpproxy", "sca", "applmgmt", "cis-license", "cm", "content-library", "vpxd-svcs", "eam", "imagebuilder", "mbcs", "netdumper", "perfcharts", "rbd", "sps", "vapi-endpoint", "updatemgr", "vpxd", "vsan-health", "vsm", "vsphere-client", "vmonapi", "vmware-postgres-archiver", "vmcam", "pschealth", "vsphere-ui", "vsan-dps", "analytics"],
   "CRITICAL" : ["vpxd", "rhttpproxy", "rbd", "vpxd-svcs", "sps", "vmware-postgres-archiver", "vmware-vpostgres", "vapi-endpoint", "pschealth"]
}
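To confirm whether a given appliance needs the workaround, a read-only sketch like the following reports which of the two entries are missing from the HAActive line. The function name missing_from_haactive is hypothetical, and it assumes the HAActive list sits on a single line, as in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a VMware tool); it only reads the file.
# Prints each of "certificatemanagement" / "topologysvc" that is missing
# from the HAActive line. Returns 1 if no HAActive line is found.
# Assumption: the whole HAActive list is on one line, as in this post.
missing_from_haactive() {
  local file="${1:-/etc/vmware/vmware-vmon/profiles.json}"
  local line svc
  line=$(grep '"HAActive"' "$file") || return 1
  for svc in certificatemanagement topologysvc; do
    # Match the quoted profile name, so "vmware-topologysvc" would NOT count.
    case "$line" in
      *"\"$svc\""*) ;;
      *) echo "$svc" ;;
    esac
  done
}

# Usage on the appliance:
# missing_from_haactive /etc/vmware/vmware-vmon/profiles.json
```

Matching on the quoted name reflects the point that the profile entries must be "certificatemanagement" and "topologysvc", not the vmware- prefixed service-control names.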

On the current Active vCenter HA node, the HAActive profile is used, and as the example shows, the certificatemanagement and topologysvc services are not present in it. As you might have already guessed, the workaround is to manually add the certificatemanagement and topologysvc services to the HAActive profile.

  1. Back up the current profiles.json file
  2. Open the profiles.json file in vi
  3. Add entries for the “certificatemanagement” and “topologysvc” services to the end of the HAActive line so that it reads as follows:

Important: The service names in the profiles.json file must be “certificatemanagement” and “topologysvc”, not the names “vmware-certificatemanagement” and “vmware-topologysvc” seen in the service-control output.

   "HAActive" : [ "rhttpproxy", "sca", "applmgmt", "cis-license", "cm", "content-library", "vpxd-svcs", "eam", "imagebuilder", "mbcs", "netdumper", "perfcharts", "rbd", "sps", "vapi-endpoint", "updatemgr", "vpxd", "vsan-health", "vsm", "vsphere-client", "vmonapi", "vmware-postgres-archiver", "vmcam", "pschealth", "vsphere-ui", "vsan-dps", "analytics", "certificatemanagement", "topologysvc"],
  4. Save the profiles.json file
  5. Enable vCenter HA (see vCenter HA using the vSphere Client)
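Steps 1 through 4 can also be scripted. The sketch below is a hypothetical helper, not a VMware-provided tool: the function name and the .bak backup suffix are assumptions, it relies on GNU sed for in-place editing (-i), and it assumes the HAActive list sits on a single line as in the example above. It refuses to edit if either entry is already present, and if python3 is available it checks that the result still parses as JSON, restoring the backup otherwise. Editing by hand in vi, as described above, remains the approach this post documents; either way, compare the resulting line with the example before re-enabling vCenter HA.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a VMware tool): back up profiles.json and append
# "certificatemanagement" and "topologysvc" to the HAActive list.
# Assumptions: GNU sed (-i), and the HAActive list is on a single line.
add_u2_services_to_haactive() {
  local file="${1:-/etc/vmware/vmware-vmon/profiles.json}"
  local backup="${file}.bak"

  # Make sure there is an HAActive line to edit at all.
  grep -q '"HAActive"' "$file" || { echo "No HAActive line in $file" >&2; return 1; }

  # Refuse to edit if either entry is already in the HAActive line.
  if grep '"HAActive"' "$file" | grep -qE '"(certificatemanagement|topologysvc)"'; then
    echo "HAActive already lists one of the services; not editing" >&2
    return 1
  fi

  # Step 1: keep a copy of the original file, preserving permissions.
  cp -p "$file" "$backup" || return 1

  # Step 3: on the HAActive line only, replace the first "]" (the end of the
  # list) with the two new entries followed by "]".
  sed -i '/"HAActive"/ s/]/, "certificatemanagement", "topologysvc"]/' "$file"

  # Sanity check (only if python3 exists): restore the backup if the edited
  # file no longer parses as JSON.
  if command -v python3 >/dev/null 2>&1; then
    if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
      cp -p "$backup" "$file"
      echo "Edited file is not valid JSON; restored $backup" >&2
      return 1
    fi
  fi
  echo "Updated $file (backup: $backup)"
}

# Usage on the appliance:
# add_u2_services_to_haactive /etc/vmware/vmware-vmon/profiles.json
```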

Issue 2: Patching a vCenter HA-enabled environment from 6.7 (GA through U1) to 6.7 U2 can fail

Since you’re going to need to remove vCenter HA to resolve Issue 1 above anyway, simply do that, patch your vCenter Server Appliance as a single node, and stop reading here.

However, if you’re interested in the details of the issue, or in how to recover if you have already hit it, keep reading.

The currently documented order of operations, as per Patch a vCenter High Availability Environment, is:

  1. Set vCenter HA to Maintenance Mode
  2. Patch the current Witness node
  3. Patch the current Passive node
  4. Perform a failover
  5. Patch the current Passive node (former Active)
  6. Set vCenter HA to Enabled

The issue manifests after step 3: you’re likely to see the following error in the vSphere Client and be unable to perform the failover needed to continue the patching process.

[Screenshot: error shown in the vSphere Client after patching the Passive node]

If you have hit this state, there is a way to recover:

  1. Shut down the current Active (unpatched) node
  2. Run the command vcha-reset-primary on the Passive (patched) node to force it online as the Active node
  3. You should now be able to log in to the vSphere Client and find vCenter HA usable
  4. Power the former Active (unpatched) node back on and confirm that it joins the vCenter HA cluster as Passive
  5. Patch the last node (former Active, now current Passive)
  6. Initiate a failover from the vSphere Client (optional, to return to the same Active node you were using before patching)
  7. Set vCenter HA mode to Enabled

As noted above, if you have not yet begun patching your vCenter HA environment, simply remove vCenter HA and patch as normal. Then perform the workaround for Issue 1 and re-enable vCenter HA (see vCenter HA using the vSphere Client).