New VMware Fling – HTML5 Embedded Host Client for ESXi

The clever bods at VMware labs have just released another new Fling… and this one’s a damn awesome Fling!

https://labs.vmware.com/flings/esxi-embedded-host-client

What it allows you to do (after installing the VIB) is open up a browser to your ESXi host and gain access to a simple client for managing your freshly installed ESXi host (for example, if you haven’t yet deployed vCenter Server or don’t have access to the C# client). The look and feel is very similar to the current Web Client…. very simple to navigate and access the features/functions.

[Screenshot: the ESXi Embedded Host Client Fling in action]

The client is still in the development phase and as such doesn’t expose every feature; so far the VMware engineers have only implemented a handful of the most important ones:

  • VM operations (Power on, off, reset, suspend, etc).
  • Creating a new VM, from scratch or from OVF/OVA (limited OVA support)
  • Configuring NTP on a host
  • Displaying summaries, events, tasks and notifications/alerts
  • Providing a console to VMs
  • Configuring host networking
  • Configuring host advanced settings
  • Configuring host services

The only issue is that it’s a VIB, which means you need to copy it across to your ESXi host (using an SCP tool like WinSCP).
Then, once copied over, you need to install the VIB on the host (esxcli software vib install -v /tmp/esxui.vib).
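For reference, the whole process looks something like this (the host name, datastore and filename are just examples):

```shell
# From your workstation: copy the VIB over to the host's /tmp
scp esxui.vib root@esxi-host:/tmp/esxui.vib

# Then from the ESXi shell (SSH): install the VIB
esxcli software vib install -v /tmp/esxui.vib

# Once installed, point your browser at https://esxi-host/ui
```

Note that esxcli wants the full path to the VIB file, not a relative one.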

Hopefully the next version will be a fully packaged installer that asks for the IP address of your ESXi host and goes off and deploys the VIB automatically!

The best thing about it – it’s HTML5 and JavaScript….. no Flash!!!! =)

Go off, have a try and then offer your feedback or post up any issues you encounter! Hopefully it’ll get rolled into vSphere ESXi in the future! =)

Need to know what version of VMware product relates to what build number?

So I’m always trying to correlate versions of ESXi or vCenter Server to the build numbers…. mainly when I’m running a health check and I need to determine what version of VM hardware or VMware Tools should be deployed for the version the customer is running.

Anyways, I thought I’d share the webpages that I always refer back to.

For VMware ESXi Release and Build numbers, visit: http://www.virten.net/vmware/esxi-release-build-number-history/

For vCenter Server Release and Build numbers, visit: http://www.virten.net/vmware/vcenter-release-and-build-number-history/

To determine what version of VMware Tools you should be running against the version of ESXi you have deployed, visit: http://packages.vmware.com/tools/versions

And finally, to determine what version of VM Hardware is supported by each version of ESXi, visit: http://kb.vmware.com/kb/1003746
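The quickest way to see which build you’re actually running before hitting those tables is to query the host itself; a couple of options from the ESXi shell (the build number in the comment is just an example):

```shell
# Show product, version and build number in one line
vmware -vl

# Or pull just the build number out of the version string,
# e.g. "VMware ESXi 5.5.0 build-1331820" -> 1331820
vmware -v | sed -n 's/.*build-\([0-9]*\).*/\1/p'
```

esxcli system version get gives you the same info if you prefer esxcli.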

ESXi bug – backing up 128GB vdisks and CBT

So I read about this issue a week or so ago when this bug started doing the rounds in the VMware communities and The Register picked up on it…. I was planning to blog about it but it slipped my mind due to a busy end of month! >_<

Anyways, VMware have sheepishly recognised the bug and produced a KB article about it: http://kb.vmware.com/kb/2090639

The bug affects VMs with Changed Block Tracking (CBT) turned on, specifically VMs that have had their storage (i.e. a single vdisk) increased in size to more than 128GB.
The problem only presents itself on execution of the QueryChangedDiskAreas() API call. This call is commonly used by backup software to determine which parts of a VM’s vmdk file have changed since the last backup in order to execute an incremental backup.

It seems that once the vmdk is increased to more than 128GB, the API call returns an inaccurate list of allocated VM disk sectors, so any sort of incremental backup could be erroneous and some changed blocks may not be captured during backup. Obviously this means that if you restore from an erroneous backup, you may experience data loss!

This is a known issue affecting VMware ESXi 4.x and ESXi 5.x and currently, there is NO resolution.

To work around this issue, VMware recommends that you disable and then re-enable CBT on the VM. The next backup after toggling CBT will be a full backup of the virtual machine.

The issue here is that, in order to disable CBT, you need to power off your VM and ensure there are no snapshots attached to it…… quite a pain in the rear end!
Info on how to disable and enable CBT can be found here: http://kb.vmware.com/kb/1031873
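Under the covers the toggle boils down to a couple of .vmx keys, so you can at least check a VM’s current CBT state from the ESXi shell. A rough sketch (the datastore path and the scsi0:0 disk ID are examples; the KB’s supported method is via the vSphere Client’s advanced settings, with the VM powered off and snapshot-free):

```shell
# Example path to the VM's config file
VMX=/vmfs/volumes/datastore1/myvm/myvm.vmx

# See whether CBT is currently enabled for the VM and its disks
grep -i ctkEnabled "$VMX"

# Disabling it amounts to flipping these keys to "false"
# (VM powered off, no snapshots!)
sed -i 's/^ctkEnabled = .*/ctkEnabled = "false"/' "$VMX"
sed -i 's/^scsi0:0.ctkEnabled = .*/scsi0:0.ctkEnabled = "false"/' "$VMX"
```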

Also, I’m not too sure whether toggling CBT fixes it for good or whether it will keep generating the same inaccurate info every time the vdisk blocks change and you try to run an incremental…. unfortunately there isn’t enough information out there yet!
I pity the admin who has to run daily fulls in order to combat this bug….. 128GB+ backups… ouch!

Fortunately none of my customers have a vdisk of that monstrous size so this shouldn’t affect many of them!

Shellshock Vulnerability

So last week it was reported that a serious vulnerability was discovered in Bash (Bourne-Again SHell), which is pretty much core to a lot of Linux/Unix OSes – including Apple’s MacOS. The bug, dubbed Shellshock, is said to be more serious than the OpenSSL Heartbleed vulnerability discovered earlier this year: it can allow attackers to remotely execute arbitrary commands on any system running a vulnerable Bash!

VMware have now released a KB that explains which hypervisors are affected by Shellshock.

http://kb.vmware.com/kb/2090740
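If you want to check whether a particular box is vulnerable, the widely circulated one-liner test for the original CVE (CVE-2014-6271) is:

```shell
# On an unpatched bash this prints "vulnerable" before "this is a test";
# a patched bash just prints "this is a test" (possibly with a warning)
env x='() { :;}; echo vulnerable' bash -c 'echo this is a test'
```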

Thankfully only the really old versions of vSphere ESX are affected…..

vSphere ESXi/ESX Hypervisor

  • ESXi 4.0, 4.1, 5.0, 5.1, and 5.5 are not affected because these versions use the Ash shell (through busybox), which is not affected by the vulnerability reported for the Bash shell.
  • ESX 4.0 and 4.1 have a vulnerable version of the Bash shell.
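You can sanity-check this yourself on an ESXi host – the default shell is BusyBox’s ash, not Bash:

```shell
# On ESXi, /bin/sh points at busybox rather than bash
ls -l /bin/sh

# Running busybox with no arguments lists its applets -
# you won't find bash among them
/bin/busybox
```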

Given how serious this vulnerability is, VMware are actually going to roll out a security patch for ESX 4.0 and 4.1 even though they are out of general support.

It is also worth noting that all VMware products currently shipped as a virtual appliance (usually a SLES VM) have the affected version of Bash installed. These virtual appliances will be updated in the near future.

I would recommend all VMware customers to keep an eye out for updates that will address the Shellshock vulnerability!

Intermittent NFS All Paths Down on ESXi 5.5 U1 upgrade

Those of you using NFS storage and planning to upgrade to the latest version of vSphere – 5.5 U1 – please hold off your upgrades as there is a bug within the code which is currently causing issues on paths to NFS volumes.

The bug causes intermittent loss of connectivity, which can lead to an “All Paths Down” error to your NFS storage! During the disconnects VMs will appear frozen and the NFS datastores may be greyed out. This appears to impact all storage vendors and all environments on 5.5 U1 accessing NFS…..!!
Obviously the loss of a path will impact I/O from VMs to datastores…… and this can result in BSODs for Windows VMs and filesystems becoming read-only (or even kernel panics) for Linux VMs!

The recommendation at this point is not to upgrade to vSphere 5.5 U1 and stay on vSphere 5.5 GA. If you have upgraded to 5.5 U1 then you may need to downgrade back to 5.5 GA.

More information can be found here:
http://kb.vmware.com/kb/2076392
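If you think you’re hitting this on an upgraded host, the APD events land in the VMkernel log; something along these lines (the sample message is paraphrased from the KB):

```shell
# On the ESXi host: look for All Paths Down events
grep -i "All Paths Down" /var/log/vmkernel.log | tail -20

# Affected hosts log lines like:
# "... has entered the All Paths Down state."
```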

I suggest you subscribe to the KB in order to get an update as to when this bug is resolved.


Have a look at William Lam’s blog regarding setting up alarms within vCenter Server that could help alert when these APD issues occur:
http://www.virtuallyghetto.com/2014/04/how-to-create-vcenter-alarm-to-alert-on-esxi-5-5u1-nfs-apd-issue.html


Obviously the main reason for upgrading to 5.5 U1 was to patch the Heartbleed vulnerability within OpenSSL. Instead of upgrading, VMware are advising customers to install the security patches that address Heartbleed…. More info on this process can be found here:
http://kb.vmware.com/kb/2076665
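Applying those patches is the standard esxcli patching dance; a sketch, with the bundle filename as an example (grab the exact patch name for your build from the KB):

```shell
# Evacuate the host and enter maintenance mode
esxcli system maintenanceMode set --enable true

# Install the downloaded patch bundle (path/filename are examples)
esxcli software vib update -d /vmfs/volumes/datastore1/ESXi550-201404020.zip

# Reboot the host, then exit maintenance mode once it's back up
reboot
```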

DRS Invocation Not Completed

So I came across an issue with my VMware cluster today where vCenter was overloading one of my hosts…..

Had a quick look at the cluster and under the summary tab it displayed the following error:

[Screenshot: cluster summary tab showing a “DRS invocation not completed” error]

Hmmm……. upon further inspection it turns out that the hosts weren’t displaying their utilisation properly – both CPU and memory displayed 0% utilisation:

[Screenshot: hosts showing 0% CPU and memory utilisation]

Turns out that DRS may have got its knickers in a twist and it wasn’t able to load balance across the cluster, possibly because it could not contact the other two hosts to determine the available resources…. as you can see from the picture above, everything ended up on my second ESXi host!

A quick google shows that I’m not alone in experiencing this issue…. but there didn’t seem to be any reference to VMware KBs or an official line from VMware regarding this issue.

Turns out most people just ‘disconnected’ and ‘reconnected’ the offending host and it fixed the issue! I did just that and it cleared the DRS configuration issue and started to display the host resource utilisation again….. and after a while DRS kicked in and re-balanced my cluster!

[Screenshot: the cluster re-balanced after reconnecting the host]


I can only assume something caused the DRS or HA configuration of the cluster to go a bit funny – how or what caused it, I’m not sure……


Disconnecting and reconnecting an ESXi host is non-disruptive and doesn’t power off VMs; all it does is remove the HA agent from the host (un-protecting the VMs), then re-enable the HA agent and re-protect the VMs.

I’m sure VMware are aware of this issue, but given it hasn’t been addressed in the latest release of vCenter Server (5.1.1b, Aug 1st 2013), I can only assume they are no nearer to discovering what’s causing it!