DRS Invocation Not Completed

So I came across an issue with my VMware cluster today where vCenter was overloading one of my hosts.

Had a quick look at the cluster, and the Summary tab displayed a ‘DRS invocation not completed’ configuration issue.

Hmmm... upon further inspection it turned out that the hosts weren’t displaying their utilisation properly: both CPU and memory showed 0% utilisation.

It looks like DRS got its knickers in a twist and wasn’t able to load balance across the cluster, possibly because it could not contact the other two hosts to determine their available resources. As a result, everything ended up on my second ESXi host!
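The symptom above (connected hosts reporting 0% CPU and memory) is easy to check for once you have the figures in hand. Here is a minimal sketch in Python, assuming you have already collected per-host usage somehow. The `HostStats` class and `suspicious_hosts` function are names made up for this illustration, and the check assumes a healthy connected host always reports at least some memory usage.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Per-host figures (hypothetical structure, e.g. copied from the cluster's Hosts view)."""
    name: str
    connected: bool
    cpu_usage_mhz: int
    memory_usage_mb: int


def suspicious_hosts(hosts):
    """Return names of connected hosts that report zero CPU and zero memory usage."""
    return [
        h.name
        for h in hosts
        if h.connected and h.cpu_usage_mhz == 0 and h.memory_usage_mb == 0
    ]


if __name__ == "__main__":
    # Made-up sample figures, loosely modelled on the situation described above.
    cluster = [
        HostStats("esxi01", True, 0, 0),
        HostStats("esxi02", True, 9800, 61000),
        HostStats("esxi03", True, 0, 0),
    ]
    print(suspicious_hosts(cluster))
```

Disconnected hosts are skipped on purpose, since they would be expected to report nothing anyway.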

A quick google shows that I’m not alone in experiencing this issue, but there didn’t seem to be any VMware KB article or official line from VMware about it.

Turns out most people just ‘disconnected’ and ‘reconnected’ the offending host, and that fixed the issue! I did just that: it cleared the DRS configuration issue, the hosts started displaying their resource utilisation again, and after a while DRS kicked in and re-balanced my cluster!

I can only assume something caused the DRS or HA configuration of the cluster to go a bit funny. How or what caused it, I’m not sure.

Disconnecting and reconnecting an ESXi host is non-disruptive and doesn’t power off any VMs. All it does is remove the HA agent from the host and un-protect the VMs, then re-enable the HA agent and re-protect the VMs.
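If you would rather script the disconnect/reconnect than click through the client, the sequence might look like the sketch below. This is an assumption-laden illustration rather than a tested procedure: `DisconnectHost_Task` and `ReconnectHost_Task` are, as far as I know, the method names on a `vim.HostSystem` object in pyVmomi, while `bounce_host` and `DryRunHost` are names invented here so the example runs without a vCenter.

```python
def bounce_host(host, wait_for_task):
    """Disconnect and then reconnect a single host, waiting on each task in turn."""
    # Step 1: disconnect the host from vCenter.
    wait_for_task(host.DisconnectHost_Task())
    # Step 2: reconnect it so vCenter starts managing it again.
    wait_for_task(host.ReconnectHost_Task())


class DryRunHost:
    """Hypothetical stand-in for a real host object; it only records the calls made."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def DisconnectHost_Task(self):
        self.log.append("disconnect")
        return "disconnect-task"

    def ReconnectHost_Task(self):
        self.log.append("reconnect")
        return "reconnect-task"


if __name__ == "__main__":
    host = DryRunHost("esxi02")
    # In the dry run there is nothing to wait for, so the waiter does nothing.
    bounce_host(host, wait_for_task=lambda task: None)
    print(host.log)
```

Against a real vCenter you would pass in the host object you looked up and a waiter such as `WaitForTask` from `pyVim.task`. Try it on a lab cluster first.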

I’m sure VMware are aware of this issue, but given that it hasn’t been addressed in the latest release of vCenter Server (5.1 Update 1b, 1 August 2013), I can only assume they are no nearer to discovering what’s causing it!

Retaining Resource Pools using Webclient

So a few days ago I posted an article about what happens if you disable DRS in a vCD environment.

Well, I then stumbled across this article in VMware’s Knowledge Base:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2032893

“Great,” I thought, “I can use the snapshot functionality within the Web Client to capture the resource pools!” The only problem is that, upon deeper digging, I stumbled across a blog entry by Frank Denneman:
http://frankdenneman.nl/2013/04/08/saving-a-resource-pool-structure-web-client-feature-not-suitable-for-vcd-environments/

Turns out that VMware Knowledge Base article is great for standard vSphere environments, but it won’t work with vCloud environments. >_<
And this is all down to the same old MoRef IDs I mentioned in my previous article about using SRM to protect your vCloud:
https://thevirtualunknown.wordpress.com/2013/07/02/protecting-your-cloud-vcloud-srm/

Unfortunately it seems that the ‘RP snapshot’ feature just captures the tree structure of your resource pools and rebuilds it as a new tree. It doesn’t preserve the original MoRef IDs, which are so important because they are used to correlate objects between vCD and the underlying vSphere/vCenter layer. Change the MoRef IDs and vCD won’t recognise the objects, as the new IDs won’t exist in the vCD DB.
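To make the MoRef problem concrete, here is a toy simulation in Python. Nothing in it talks to a real vCenter or vCD: `FakeVCenter`, `rebuild_from_snapshot` and `orphaned_org_vdcs` are invented for this illustration, and the `resgroup-NNN` IDs simply mimic the general look of resource pool MoRefs.

```python
import itertools


class FakeVCenter:
    """Toy inventory that hands out a fresh MoRef ID for every resource pool it creates."""

    def __init__(self):
        self._counter = itertools.count(100)
        self.pools = {}  # MoRef ID -> pool name

    def create_pool(self, name):
        moref = f"resgroup-{next(self._counter)}"
        self.pools[moref] = name
        return moref

    def destroy_all_pools(self):
        # Stands in for what happens to the pools when DRS is disabled.
        self.pools.clear()


def rebuild_from_snapshot(vcenter, pool_names):
    """Recreate pools by name only, the way a tree-structure snapshot would."""
    return {name: vcenter.create_pool(name) for name in pool_names}


def orphaned_org_vdcs(vcd_db, vcenter):
    """Return Org vDC names whose recorded MoRef no longer exists in vCenter."""
    return sorted(name for name, moref in vcd_db.items() if moref not in vcenter.pools)


if __name__ == "__main__":
    vc = FakeVCenter()
    # The toy "vCD DB" remembers each Org vDC by the MoRef of its backing pool.
    vcd_db = {name: vc.create_pool(name) for name in ["OrgA-vDC", "OrgB-vDC"]}
    snapshot = list(vc.pools.values())

    vc.destroy_all_pools()
    rebuild_from_snapshot(vc, snapshot)

    # The names are back, but the IDs the toy DB recorded are gone.
    print(sorted(vc.pools.values()))
    print(orphaned_org_vdcs(vcd_db, vc))
```

The rebuilt pools carry the same names but different IDs, so a lookup keyed on the old IDs finds nothing.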

Man, VMware really need to sort out this MoRef issue! ;oP

DO NOT disable DRS within a vCloud environment!!

I remember watching this video when going through my vCloud Director training, and I stumbled across it again a few days ago, so I thought I’d share it with you all!

http://www.youtube.com/watch?v=M0_tLVR6uUc

Basically, by disabling DRS on your vCloud resource cluster you remove all resource pools in vCenter (an unfortunate side effect of turning DRS off).
vCloud Director relies heavily on these resource pools, so by disabling DRS you pretty much destroy your vCloud environment!! O_o

When you create your vCloud, you usually create a Provider virtual DataCenter (PvDC), which is usually assigned to an HA cluster within vCenter. When you then create Organisations and assign resources to them, you create Organisation virtual DataCenters (Org vDCs), each of which is basically a pot of resources you’ve carved off the PvDC.
It’s these Org vDCs which are backed by resource pools within vCenter, which is why disabling DRS pretty much destroys all the Org vDCs within your cloud!!
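Given how easy it is to tick the wrong box, a pre-flight check in any automation that touches cluster settings seems sensible. Below is a minimal sketch in Python of such a guard. `DrsGuardError` and `assert_drs_can_be_disabled` are hypothetical names, and the list of child resource pool names is assumed to have been fetched from vCenter beforehand by whatever means you prefer.

```python
class DrsGuardError(RuntimeError):
    """Raised when disabling DRS would remove existing resource pools."""


def assert_drs_can_be_disabled(cluster_name, child_pool_names):
    """Return True if the cluster has no child resource pools, otherwise raise."""
    pools = sorted(child_pool_names)
    if pools:
        raise DrsGuardError(
            f"Refusing to disable DRS on {cluster_name}: "
            f"{len(pools)} resource pool(s) would be removed: {', '.join(pools)}"
        )
    return True


if __name__ == "__main__":
    # Made-up cluster and pool names, purely for illustration.
    print(assert_drs_can_be_disabled("mgmt-cluster", []))
    try:
        assert_drs_can_be_disabled("cloud-cluster", ["OrgA-vDC", "OrgB-vDC"])
    except DrsGuardError as exc:
        print(exc)
```

It fails loudly rather than returning False, so a script can’t quietly carry on and disable DRS anyway.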

There really isn’t an easy way around this, although restoring a backup of your vCenter Server DB will go some way to repairing your vCloud.

As the video shows, whilst the VMs within a vApp will keep running if they stay powered on (or are warm booted), if you power them off then they die, because the resource pool they belonged to was destroyed!
Plus you won’t be able to manage the vApp or do anything else within vCloud Director (like deploy a catalog template, power on another vApp, etc).

There’s a more in-depth article by Chris Colotti that goes into what happens when you disable DRS:
http://www.chriscolotti.us/vmware/gotcha-disabling-vmware-drs-with-vcloud-director/