Spectre & Meltdown Update

So it seems that the microcode patches released by VMware associated with their recent Security Advisory (VMSA-2018-0004) have been pulled….
So that’s ESXi650-201801402-BG, ESXi600-201801402-BG, or ESXi550-201801401-BG.

The microcode patch provided by Intel was buggy, and there seem to be issues when VMs access the new speculative-execution control mechanism (on Haswell & Broadwell processors). However, I can't seem to find much detail on what these issues actually are…

For the time being, if you haven’t applied one of those microcode patches, VMware recommends holding off and applying the patches listed in VMSA-2018-0002 instead.

If you have applied the latest patches, you will have to edit the config file on each ESXi host to add a line that hides the new speculative-execution control mechanism from guests, and then power-cycle the VMs on that host. Detailed information can be found in the KB above.
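For reference, the workaround as I understand it involves appending a CPU-feature mask line to /etc/vmware/config on each affected host. The exact mask string below is reproduced from memory, so double-check it against VMware's KB before using it:

```
# Append to /etc/vmware/config on each patched ESXi host
# (mask string from memory - verify against VMware's KB first),
# then fully power-cycle the VMs on that host:
cpuid.7.edx = "----:00--:----:----:----:----:----:----"
```

The mask suppresses the new speculative-execution control CPUID bits so guests no longer see (or touch) the buggy microcode feature.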


Finally, William Lam has created a very handy PowerCLI script that will help provide information about your existing vSphere environment and help identify whether you have hosts that are impacted by Spectre and this new Intel Sighting issue: https://www.virtuallyghetto.com/2018/01/verify-hypervisor-assisted-guest-mitigation-spectre-patches-using-powercli.html

Spectre & Meltdown Vulnerabilities

So at the beginning of the new year, news broke via The Register of a potential security vulnerability in Intel processors (Meltdown), and how it was a problem that couldn’t be easily fixed by a microcode update because of how the Intel architecture does speculative execution of code. In a nutshell, speculative execution is how modern processors try to ‘predict’ the code they need to execute next, before the currently executing code produces a result – all modern processors do this to some extent in order to keep their internal pipelines full and speed up processing…. this quickly blew up into a storm where additional vulnerabilities were identified (Spectre) which affect Intel, AMD and ARM processors!

Three closely related vulnerabilities involving the exploitation of speculative execution in CPUs were made public. Variants 1 and 2 have been branded as Spectre, with Variant 3 known as Meltdown.

The fallout is spectacular…. lawsuits being filed against Intel…. videos of proof-of-concept exploits already on YouTube….. customers going crazy that Russians/North Koreans could be stealing data from their systems….. all this because chip manufacturers tried to outdo each other by putting processing speed above security.

The best article I’ve read that explains how Speculative Execution works and how these vulnerabilities could be exploited can be found here: http://frankdenneman.nl/2018/01/05/explainer-spectre-meltdown-graham-sutherland/

It seems that at the moment the only way to minimise your exposure to potential exploits is to patch the OS or hypervisor; however, this isn’t without issues, as people have started reporting that the patches add a performance overhead. In all honesty, I doubt personal users will notice a performance hit in their day-to-day usage (home/office applications or games); it will, however, impact anyone running high-IO or system-call-intensive workloads (such as databases, email, big data/data mining)… a performance hit of between 5 and 30% depending on the application!!

VMware have stated that at present they don’t believe Meltdown to be an issue for their products, because ESXi does not run untrusted user-mode code, and Workstation and Fusion rely on the protection that the underlying operating system provides. For Spectre, they have released an article detailing their response to the issues and two Security Advisories which address the vulnerabilities and how they can be mitigated; VMSA-2018-0002 has been superseded by VMSA-2018-0004.

From what I can see, the first Security Advisory consists of security patches to ESXi that mitigate leakage from the hypervisor or other guest VMs into a malicious guest VM – these patches were made available late last year, before the news broke (which makes you wonder how long the industry has known about it).

The second Security Advisory is a full minor update to vCenter (5.5, 6.0 and 6.5) to support both the newer vSphere ESXi patches and microcode/BIOS patches to the hardware. This is what VMware call “Hypervisor-Assisted Guest Mitigation”: it virtualises the new speculative-execution control mechanism for guest VMs, so that a guest OS can mitigate leakage between processes within the VM. This mitigation requires specific microcode patches from platform vendors, which introduce the new ‘speculative-execution control’ features. More information on how to apply this Security Advisory can be found here: https://kb.vmware.com/s/article/52085.

Note: The update patches found in VMSA-2018-0004 will mean that these new CPU features will be exposed to Guest VMs and as such vMotion to ESXi hosts without the microcode or hypervisor patches applied will be prevented. However, if you have an EVC cluster, it looks like vCenter will suppress the new features from VMs to enable vMotion compatibility until all hosts have been upgraded (after which it will enable those features) – unpatched hosts will not be allowed to join an EVC cluster that has been patched.

It’s worth noting that Guest VMs should also have their OS updated with the latest security patches for effective mitigation of these known vulnerabilities!

Finally, VMware have released an article regarding these vulnerabilities and whether their virtual appliances are affected: https://kb.vmware.com/s/article/52264. It currently looks like vSphere Integrated Containers and vRealize Automation have not been patched yet.

New VMware Fling – HTML5 Embedded Host Client for ESXi

The clever bods at VMware labs have just released another new Fling… and this one’s a damn awesome Fling!


What it allows you to do (after installing the VIB) is to open up a browser to your ESXi host and gain access to a simple client allowing you to manage your freshly installed ESXi host (for example if you haven’t yet deployed vCenter Server or don’t have access to the C# client). The look and feel of it is very similar to the current Web client…. very simple to navigate and access the features/functions.


The client is still in the development phase and as such does not expose every feature; the VMware engineers have only implemented a handful of the most important ones:

  • VM operations (Power on, off, reset, suspend, etc).
  • Creating a new VM, from scratch or from OVF/OVA (limited OVA support)
  • Configuring NTP on a host
  • Displaying summaries, events, tasks and notifications/alerts
  • Providing a console to VMs
  • Configuring host networking
  • Configuring host advanced settings
  • Configuring host services

The only issue is that it’s a VIB, which means you need to copy it across to your ESXi host (using an SCP tool like WinSCP) and then install it on the host (esxcli software vib install -v /tmp/esxui.vib).
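The two steps look something like this (the hostname is a placeholder for your own environment):

```shell
# Copy the VIB to the host (or use WinSCP from Windows)
scp esxui.vib root@esxi01.example.local:/tmp/

# Then, from an SSH session on the host, install it
esxcli software vib install -v /tmp/esxui.vib
```

Remember SSH needs to be enabled on the host for both the copy and the install.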

Hopefully the next version will be a fully packaged installer that asks for the IP address of your ESXi host and goes off and deploys the VIB automatically!

The best thing about it – it’s HTML5 and JavaScript….. no Flash!!!! =)

Go off, have a try and then offer your feedback or post up any issues you encounter! Hopefully it’ll get rolled into vSphere ESXi in the future! =)

Need to know what version of VMware product relates to what build number?

So I’m always trying to correlate versions of ESXi or vCenter Server to the build numbers…. mainly when I’m running a health check and I need to determine what version of VM hardware or VMware Tools should be deployed for the version the customer is running.

Anyways, I thought I’d share the webpages that I always refer back to.

For VMware ESXi Release and Build numbers, visit: http://www.virten.net/vmware/esxi-release-build-number-history/

For vCenter Server Release and Build numbers, visit: http://www.virten.net/vmware/vcenter-release-and-build-number-history/

To determine what version of VMware Tools you should be running against the version of ESXi you have deployed, visit: http://packages.vmware.com/tools/versions

And finally, to determine what version of VM Hardware against the version of ESXi, visit: http://kb.vmware.com/kb/1003746
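And if you just need a host’s own version and build quickly, the ESXi shell will tell you directly:

```shell
# Run on the ESXi shell (SSH or DCUI); prints the product
# version and build, e.g. "VMware ESXi 5.5.0 build-<number>"
vmware -v
```

You can then match that build number against the lists above.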

ESXi bug – backing up 128GB vdisks and CBT

So I read about this issue a week or so ago when this bug started doing the rounds in the VMware communities and The Register picked up on the issue…. I was planning to blog about it but it slipped my mind due to a busy end of month! >_<”

Anyways, VMware have sheepishly recognised the bug and produced a KB article about it: http://kb.vmware.com/kb/2090639

The bug affects VMs with Changed Block Tracking (CBT) turned on – specifically, VMs that have had a single vdisk increased in size to beyond 128GB.
The problem only presents itself on execution of the QueryChangedDiskAreas() API call. This call is commonly used by backup software to determine which parts of a VM’s vmdk file have changed since the last backup, in order to run an incremental backup.

It seems that once the vmdk is increased beyond 128GB, the API call returns an inaccurate list of allocated disk sectors, so any incremental backup could be erroneous and some changed blocks may not be captured. Obviously this means that if you restore from such a backup, you may experience data loss!

This is a known issue affecting VMware ESXi 4.x and ESXi 5.x and currently, there is NO resolution.

To work around this issue, VMware recommends that you disable and then re-enable CBT on the VM. The next backup after toggling CBT will be a full backup of the virtual machine.
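Per the CBT KB, the toggle is done through the VM’s advanced configuration while the VM is powered off and snapshot-free; in the .vmx it looks roughly like this, scsi0:0 being whichever disk is affected:

```
# VM powered off, no snapshots; edit the .vmx (or use
# Configuration Parameters in the client) - set to "false"
# to disable CBT, then back to "true" to re-enable it:
ctkEnabled = "false"
scsi0:0.ctkEnabled = "false"
```

Set both values back to "true" afterwards to re-enable CBT, and remember the next backup will be a full.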

The catch is that in order to disable CBT, you need to power off the VM and ensure there are no snapshots attached to it…… quite a pain in the rear end!
Info on how to disable and enable CBT can be found here: http://kb.vmware.com/kb/1031873

Also, I’m not too sure whether toggling CBT actually fixes it, or whether it will keep generating the same inaccurate info every time the vdisk’s blocks change and you run another incremental…. unfortunately there isn’t enough information out there yet!
I pity the admin who has to run daily fulls in order to combat this bug….. 128GB+ backups… ouch!

Fortunately none of my customers have vdisks of that monstrous size, so this shouldn’t affect them!

Shellshock Vulnerability

So last week it was reported that a serious vulnerability had been discovered in Bash (the Bourne-Again SHell), which is pretty much core to a lot of Linux/Unix OSes – including Apple’s MacOS. The bug, dubbed Shellshock, is supposed to be more serious than the OpenSSL Heartbleed vulnerability discovered earlier this year: it can allow attackers to remotely execute commands on systems running a vulnerable version of Bash (for example, via CGI scripts on a web server)!
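If you want to check whether a given system’s Bash is vulnerable to the original bug (CVE-2014-6271), the widely circulated one-liner below does the trick:

```shell
# Quick test for the original Shellshock bug (CVE-2014-6271).
# A vulnerable Bash prints "vulnerable" before the echo output;
# a patched Bash prints only "this is a test".
env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
```

It works by defining an environment variable containing a function definition with trailing commands; a vulnerable Bash executes those trailing commands when it imports the variable.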

VMware have now released a KB that explains which hypervisors are affected by Shellshock.


Thankfully only the really old versions of vSphere ESX are affected…..

vSphere ESXi/ESX Hypervisor

  • ESXi 4.0, 4.1, 5.0, 5.1, and 5.5 are not affected because these versions use the Ash shell (through busybox), which is not affected by the vulnerability reported for the Bash shell.
  • ESX 4.0 and 4.1 have a vulnerable version of the Bash shell.

Given how serious this vulnerability is, VMware are actually going to roll out a security patch for ESX 4.0 and 4.1 even though they are out of general support.

It is also worth noting that all VMware products currently shipped as a virtual appliance (usually a SLES VM) have the affected version of Bash installed. These virtual appliances will be updated in the near future.

I would recommend that all VMware customers keep an eye out for updates that address the Shellshock vulnerability!

Intermittent NFS All Paths Down on ESXi 5.5 U1 upgrade

Those of you using NFS storage and planning to upgrade to the latest version of vSphere – 5.5 U1 – please hold off your upgrades, as there is a bug in the code which is currently causing issues with paths to NFS volumes.

The bug causes intermittent loss of connectivity, which can lead to an “All Paths Down” (APD) error on your NFS storage! During the disconnects, VMs will appear frozen and the NFS datastores may be greyed out. This appears to impact all storage vendors and all environments accessing NFS on 5.5 U1…..!!
Obviously the loss of a path will impact IO from VMs to datastores…… and this can result in BSODs for Windows VMs and filesystems becoming read-only for Linux VMs (or even kernel panics)!
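If you suspect you’re being hit by this, the host’s vmkernel log is the place to look – something along these lines (exact message text varies by build) will surface the NFS connectivity drops:

```shell
# Run on the affected ESXi host; look for APD / NFS
# connection-loss messages in the vmkernel log
grep -iE "apd|lost connection" /var/log/vmkernel.log | tail -20
```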

The recommendation at this point is not to upgrade to vSphere 5.5 U1 and to stay on vSphere 5.5 GA. If you have already upgraded to 5.5 U1, then you may need to downgrade back to 5.5 GA.

More information can be found here:

I suggest you subscribe to the KB in order to get an update as to when this bug is resolved.


Have a look at William Lam’s blog regarding setting up alarms within vCenter Server that could help alert when these APD issues occur:


Obviously the main reason for upgrading to 5.5 U1 was to patch the Heartbleed vulnerability in OpenSSL; VMware are informing customers not to upgrade but to install security patches that address Heartbleed instead…. More info on this process can be found here:

DRS Invocation Not Completed

So came across an issue with my VMware cluster today where vCenter was overloading one of my hosts…..

Had a quick look at the cluster and under the summary tab it displayed a “DRS invocation not completed” error.


Hmmm……. upon further inspection it turned out that the hosts weren’t displaying their utilisation properly – both CPU and memory showed 0% utilisation.


Turns out that DRS had got its knickers in a twist and wasn’t able to load balance across the cluster, possibly because it could not contact the other two hosts to determine their available resources…. everything had ended up on my second ESXi host!

A quick google shows that I’m not alone in experiencing this issue…. but there didn’t seem to be any reference to VMware KBs or an official line from VMware regarding this issue.

Turns out most people just ‘disconnected’ and ‘reconnected’ the offending host and it fixed the issue! I did just that and it cleared the DRS configuration issue and started to display the host resource utilisation again….. and after a while DRS kicked in and re-balanced my cluster!



I can only assume something caused the DRS or HA configuration of the cluster to go a bit funny – how or what caused it, I’m not sure……


Disconnecting and reconnecting an ESXi host is non-disruptive and doesn’t power off VMs; all it does is remove the HA agent from the host (un-protecting the VMs), and then re-enable the HA agent and re-protect the VMs.

I’m sure VMware are aware of this issue, but given it hasn’t been addressed in the latest release of vCenter Server (5.1.1b, Aug 1st 2013), I can only assume they are no nearer to discovering what’s causing it!