VMware NSX 6.2.4 released

So after the huge cock-up with 6.2.3, VMware have turned around a new version of NSX in a matter of weeks to fix all the bugs!

http://blogs.vmware.com/kb/2016/08/vmware-nsx-vsphere-6-2-4-now-available.html

Of major concern was the HA issue that left DLR nodes stuck in a ‘split-brain’ state after 24 days of operation, and every 24 days after that! It also didn’t help that the previous version caused VMs to lose network connectivity if the DLR’s pMAC was the MAC address of their default gateway.

Anyways, hopefully all the bugs have been ironed out and the new release is more stable!

Release Notes can be found here.

For some of my customers, the release of 6.2.4 brings back vShield Endpoint management support, which is great given that vCNS and vShield Manager are reaching end of general support on 19th September!

For more info about this, read my previous blog entry: NSX 6.2.3 Released – support for vShield Endpoint Management

ESXi bug – backing up 128GB vdisks and CBT

So I read about this issue a week or so ago when the bug started doing the rounds in the VMware communities and The Register picked up on it. I was planning to blog about it but it slipped my mind due to a busy end of month! >_<

Anyways, VMware have sheepishly recognised the bug and produced a KB article about it: http://kb.vmware.com/kb/2090639

The bug affects VMs with Changed Block Tracking (CBT) turned on, specifically VMs where a single vdisk has been grown past 128GB.
The problem only presents itself when QueryChangedDiskAreas() is called. This API call is commonly used by backup software to determine which parts of a VM’s vmdk file have changed since the last backup, in order to run an incremental backup.

It seems that once the vmdk is increased to more than 128GB, the API call returns an inaccurate list of allocated VM disk sectors, so any incremental backup could be erroneous and some changed blocks may not be captured. Obviously this means that if you restore from the erroneous backup, you may experience data loss!
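
To make the data-loss risk concrete, here is a toy model in Python. It is not the VMware API: the disk is just a list of block values, and `changed_areas` is a hypothetical stand-in for whatever a changed-area query such as QueryChangedDiskAreas() hands back to the backup software. The function and variable names are my own, purely for illustration.

```python
def apply_incremental(previous_backup, current_disk, changed_areas):
    """Build a restore image by copying only the reported extents.

    changed_areas is a list of (start, length) tuples measured in blocks.
    It stands in for the answer a changed-area query would return.
    """
    image = list(previous_backup)
    # If the disk has grown since the last backup, pad the image to match.
    image.extend([0] * (len(current_disk) - len(image)))
    for start, length in changed_areas:
        image[start:start + length] = current_disk[start:start + length]
    return image


last_backup = [1, 1, 1, 1, 1, 1]
disk_now = [1, 9, 1, 1, 7, 1]    # blocks 1 and 4 changed since the backup

complete_report = [(1, 1), (4, 1)]
partial_report = [(1, 1)]         # block 4 is missing from the report

good = apply_incremental(last_backup, disk_now, complete_report)
bad = apply_incremental(last_backup, disk_now, partial_report)

print(good == disk_now)  # True
print(bad == disk_now)   # False: block 4 still holds the old data
```

The backup job itself reports success in both cases. The gap only shows up when you restore, which is what makes an inaccurate changed-area list so nasty.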

This is a known issue affecting VMware ESXi 4.x and ESXi 5.x and currently, there is NO resolution.
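
If you want a rough list of VMs worth a closer look, a simple filter over your inventory data is enough to get started. The sketch below assumes you have already exported VM names, CBT state and disk capacities into plain Python objects. The `Vm` and `Disk` classes are hypothetical, and treating 128GB as 128 GiB is my assumption. It only flags CBT-enabled disks that are currently larger than 128GB; it cannot tell whether a disk was actually extended past that size, so treat the output as candidates, not confirmed victims.

```python
from dataclasses import dataclass, field

GIB = 1024 ** 3
THRESHOLD_BYTES = 128 * GIB  # assumption: "128GB" read as 128 GiB


@dataclass
class Disk:
    label: str
    capacity_bytes: int


@dataclass
class Vm:
    name: str
    cbt_enabled: bool
    disks: list = field(default_factory=list)


def cbt_review_candidates(vms):
    """Return (vm name, disk label) pairs for CBT-enabled disks over the threshold."""
    return [
        (vm.name, disk.label)
        for vm in vms
        if vm.cbt_enabled
        for disk in vm.disks
        if disk.capacity_bytes > THRESHOLD_BYTES
    ]


inventory = [
    Vm("web01", True, [Disk("Hard disk 1", 40 * GIB)]),
    Vm("db01", True, [Disk("Hard disk 1", 60 * GIB), Disk("Hard disk 2", 200 * GIB)]),
    Vm("file01", False, [Disk("Hard disk 1", 500 * GIB)]),
]

print(cbt_review_candidates(inventory))  # [('db01', 'Hard disk 2')]
```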

To work around this issue, VMware recommends that you disable and then re-enable CBT on the VM. The next backup after toggling CBT will be a full backup of the virtual machine.

The catch is that in order to disable CBT, you need to power off your VM and ensure there are no snapshots attached to it... quite a pain in the rear end!
Info on how to disable and enable CBT can be found here: http://kb.vmware.com/kb/1031873
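
Before scheduling the CBT toggle, it can help to check the two preconditions mentioned above across your VMs. This is a minimal sketch that works on plain data you have gathered yourself; the `VmState` class and its fields are hypothetical and it makes no calls to vCenter.

```python
from dataclasses import dataclass


@dataclass
class VmState:
    name: str
    powered_on: bool
    snapshot_count: int


def cbt_reset_blockers(vm):
    """List the reasons this VM is not yet ready for a CBT disable/re-enable."""
    blockers = []
    if vm.powered_on:
        blockers.append("VM is powered on")
    if vm.snapshot_count > 0:
        blockers.append(f"{vm.snapshot_count} snapshot(s) present")
    return blockers


for vm in [VmState("db01", True, 2), VmState("web01", False, 0)]:
    problems = cbt_reset_blockers(vm)
    print(vm.name, "ready" if not problems else "; ".join(problems))
```

An empty list means the VM meets both conditions, so the remaining job is the toggle itself as described in the KB.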

Also, I’m not too sure whether toggling CBT fixes the problem for good, or whether you will keep getting the same inaccurate info every time the vdisk blocks change and you try to run an incremental. Unfortunately there isn’t enough information out there yet!
I pity the admin who has to run daily fulls in order to combat this bug….. 128GB backups… ouch!

Fortunately none of my customers have a vdisk of that monstrous size so this shouldn’t affect many of them!

Intermittent NFS All Paths Down on ESXi 5.5 U1 upgrade

Those of you using NFS storage and planning to upgrade to the latest version of vSphere (5.5 U1), please hold off on upgrading, as there is a bug in the code which is currently causing issues on paths to NFS volumes.

The bug causes intermittent loss of connectivity, which can lead to an “All Paths Down” error to your NFS storage! During the disconnects, VMs will appear frozen and the NFS datastores may be greyed out. This appears to impact all storage vendors and all environments on 5.5 U1 accessing NFS!
Obviously the loss of a path will impact I/O from VMs to datastores, and this can result in BSODs for Windows VMs and filesystems becoming read-only for Linux VMs (or even kernel panics)!

The recommendation at this point is not to upgrade to vSphere 5.5 U1 and to stay on vSphere 5.5 GA. If you have already upgraded to 5.5 U1, you may need to roll back to 5.5 GA.

More information can be found here:
http://kb.vmware.com/kb/2076392

I suggest you subscribe to the KB so that you are notified when this bug is resolved.

Have a look at William Lam’s blog regarding setting up alarms within vCenter Server that could help alert when these APD issues occur:
http://www.virtuallyghetto.com/2014/04/how-to-create-vcenter-alarm-to-alert-on-esxi-5-5u1-nfs-apd-issue.html

Obviously the main reason for upgrading to 5.5 U1 was to patch the Heartbleed vulnerability within OpenSSL. VMware are advising customers not to upgrade but to install security patches that address Heartbleed instead. More info on this process can be found here:
http://kb.vmware.com/kb/2076665