I came across an issue with my VMware cluster today: vCenter was overloading one of my hosts.
I had a quick look at the cluster, and the Summary tab displayed the following error:
Hmmm... on further inspection, the hosts weren’t reporting their utilisation properly, with both CPU and memory showing 0%:
It looks like DRS had got its knickers in a twist and wasn’t able to load balance across the cluster, possibly because it couldn’t contact the other two hosts to determine their available resources. As you can see from the picture above, everything ended up on my second ESXi host!
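If you would rather check for the 0% symptom from a script than eyeball the client, something like the sketch below might do. It assumes host objects shaped like the ones pyVmomi returns, where `summary.quickStats.overallCpuUsage` (MHz) and `overallMemoryUsage` (MB) hold the current figures. The function name, the host names and the stand-in objects are all made up for illustration, so it runs without a vCenter.

```python
from types import SimpleNamespace


def hosts_reporting_zero(hosts):
    """Return the names of hosts whose CPU and memory usage both read as 0 (or unset)."""
    suspects = []
    for host in hosts:
        stats = host.summary.quickStats
        cpu_mhz = stats.overallCpuUsage or 0    # treated as 0 when unset
        mem_mb = stats.overallMemoryUsage or 0
        if cpu_mhz == 0 and mem_mb == 0:
            suspects.append(host.name)
    return suspects


def fake_host(name, cpu_mhz, mem_mb):
    """Hypothetical stand-in for a host object, for demonstration only."""
    quick_stats = SimpleNamespace(overallCpuUsage=cpu_mhz, overallMemoryUsage=mem_mb)
    return SimpleNamespace(name=name, summary=SimpleNamespace(quickStats=quick_stats))


# Made-up figures mirroring the situation above: one busy host, two reporting nothing.
cluster = [
    fake_host("esxi01", 0, 0),
    fake_host("esxi02", 9200, 61000),
    fake_host("esxi03", None, None),
]
suspects = hosts_reporting_zero(cluster)
print(suspects)
```

A host that is genuinely idle would also show up here, so treat the result as a prompt to go and look rather than as proof that DRS is stuck.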
A quick Google showed that I’m not alone in experiencing this issue, but I couldn’t find any VMware KB article or official line from VMware about it.
Most people just disconnected and reconnected the offending host, and that fixed the issue! I did the same: it cleared the DRS configuration issue, vCenter started displaying the host resource utilisation again, and after a while DRS kicked in and re-balanced my cluster!
I can only assume something caused the DRS or HA configuration of the cluster to go a bit funny. What caused it, or how, I’m not sure.
Disconnecting and reconnecting an ESXi host is non-disruptive and doesn’t turn off VMs. All it does is remove the HA agent from the host and un-protect the VMs, then re-enable the HA agent and re-protect the VMs.
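I did the disconnect and reconnect by hand in the client, but it could also be scripted. Here is a rough sketch, assuming a pyVmomi-style host object that exposes the vSphere API methods `DisconnectHost_Task` and `ReconnectHost_Task`. The `bounce_host` helper and the `FakeHost` class are my own illustrative names, and the fake is only there so the example runs without a vCenter. I haven't run this against a live cluster.

```python
def bounce_host(host, wait_for_task):
    """Disconnect a host and then reconnect it, waiting for each task in turn.

    `wait_for_task` is any callable that blocks until the given task finishes;
    with pyVmomi that could be `pyVim.task.WaitForTask` (an assumption here).
    """
    wait_for_task(host.DisconnectHost_Task())
    wait_for_task(host.ReconnectHost_Task())


class FakeHost:
    """Hypothetical stand-in that just records the order of calls."""

    def __init__(self):
        self.calls = []

    def DisconnectHost_Task(self):
        self.calls.append("disconnect")
        return "task-1"

    def ReconnectHost_Task(self):
        self.calls.append("reconnect")
        return "task-2"


demo = FakeHost()
waited = []  # the "waiter" here only notes which tasks it was handed
bounce_host(demo, waited.append)
print(demo.calls)
```

Waiting for the disconnect to finish before reconnecting matters, otherwise the reconnect may be issued while the host is still mid-disconnect.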
I’m sure VMware are aware of this issue, but given it hasn’t been addressed in the latest release of vCenter Server (5.1.1b, Aug 1st 2013), I can only assume they are no nearer to discovering what’s causing it!