Cannot connect to vCenter Server via vSphere Client – timeout

I’ve been upgrading my company’s solution centre to vSphere/vCenter 6.0 update 2 the past week and noticed that I was having issues logging into the vCenter Server Appliances I had deployed.

It was a strange issue: I could log into the Windows vCenter Server I had deployed in my primary cluster, but couldn’t log into the vCenter Server Appliance I had deployed in my secondary cluster… hmmm. The Web Client worked fine for both, but it was the vSphere C# Client that was timing out for the vCSA!

[Screenshot: vSphere Client timeout error when connecting to the vCSA]

After much head scratching and trawling through logs (found at C:\Users\username\AppData\Local\VMware\vpx\viclient-x-0000.log), it turns out the problem is the vSphere Client’s default timeout value for authentication.

The default timeout value is 30 seconds, and my suspicion is that the vCSA was taking slightly longer than that to respond to the authentication request… I changed the value to 60 seconds and it all worked fine!

Fire up the vSphere Client and connect to another vCenter Server or ESXi host, then click Edit -> Client Settings. Change the Client-Server Command Timeout setting to “Use a custom value” and set the Timeout in seconds to 60.

[Screenshot: Client Settings dialogue with the custom timeout value set to 60 seconds]

Here’s the VMware KB article about timeout values: https://kb.vmware.com/kb/2072539. There are also instructions in there on how to edit the Windows registry if you can’t bring up the vSphere Client.

Just for the sake of it, here’s the error log:

[viclient:Error :P: 3] 2016-09-06 10:12:35.520 RMI Error Vmomi.SessionManager.Login - 4
<Error type="VirtualInfrastructure.Exceptions.RequestTimedOut">
 <Message>The request failed because the remote server 'xxxxx' took too long to respond. (The command has timed out as the remote server is taking too long to respond.)</Message>
 <InnerException type="System.Net.WebException">
 <Message>The command has timed out as the remote server is taking too long to respond.</Message>
 <Status>Timeout</Status>
 </InnerException>
 <Title>Connection Error</Title>
 <InvocationInfo type="VirtualInfrastructure.MethodInvocationInfoImpl">
 <StackTrace type="System.Diagnostics.StackTrace">
 <FrameCount>17</FrameCount>
 </StackTrace>
 <MethodName>Vmomi.SessionManager.Login</MethodName>
 <Target type="ManagedObject">SessionManager:SessionManager [xxxxx]</Target>
 <Args>
 <item></item>
 <item></item>
 <item></item>
 </Args>
 </InvocationInfo>
 <WebExceptionStatus>Timeout</WebExceptionStatus>
 <SocketError>Success</SocketError>
</Error>
[viclient:Critical:M: 6] 2016-09-06 10:12:35.531 Connection State[xxxxx]: Disconnected
[viclient:SoapMsg :M: 6] 2016-09-06 10:12:35.532 Attempting graceful shutdown of service ...
[viclient:SoapMsg :M: 6] 2016-09-06 10:12:35.534 Pending Invocation Count: 0
[viclient:SoapMsg :M: 6] 2016-09-06 10:12:35.535 Graceful shutdown of service: Success
[ :Error :M: 6] 2016-09-06 10:12:35.543 Error occured during login
VirtualInfrastructure.Exceptions.LoginError: The server 'xxxxx' took too long to respond. (The command has timed out as the remote server is taking too long to respond.)
 at VirtualInfrastructure.LoginMain.Process(BackgroundWorker worker, DoWorkEventArgs e)
 at VirtualInfrastructure.LoginWorkerImpl.Worker_DoWork(Object sender, DoWorkEventArgs e)
...
 at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)
 VirtualInfrastructure.Exceptions.RequestTimedOut: The request failed because the remote server 'xxxxx' took too long to respond. (The command has timed out as the remote server is taking too long to respond.)
 at VirtualInfrastructure.Soap.SoapServiceWrapper.DoInvokeSync(ManagedObject mo, MethodName methodName, Object[] parameters, Int32 timeoutSecs)
 at VirtualInfrastructure.Soap.SoapTransport.VirtualInfrastructure.Transport.InvokeMethod(ManagedObject mo, MethodName methodName, Object[] pars)
 at VirtualInfrastructure.ManagedObject.InvokeMethod(MethodName methodName, Object[] pars)
 at Vmomi.SessionManager.Login(String userName, String password, String locale)
 at VmomiSupport.VcServiceImpl.LoginNormally(LoginSpec loginSpec)
 at VmomiSupport.VcServiceImpl.Login(LoginSpec loginSpec)
 at VirtualInfrastructure.LoginMain.Process(BackgroundWorker worker, DoWorkEventArgs e)
 System.Net.WebException: The command has timed out as the remote server is taking too long to respond.

 --- End of inner exception stack trace ---

VMware Marketing Error?!?

A work colleague of mine has just noticed that there seem to be fewer features listed under the vSphere Enterprise Plus Acceleration Kit than on the vanilla vSphere Enterprise Plus licenses…

I hadn’t noticed it myself and was aghast to see that it was true:

http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-vSOM-Pricing-Whitepaper.pdf

[Screenshots: licensing feature tables from the whitepaper]

I really hope this is a HUGE marketing error and not actually true! TBH, it looks like whoever put the whitepaper together has mistaken the old Enterprise AK for the Enterprise Plus AK and just changed the name… lol…

Although if it is true, a lot of people are going to be very annoyed! =)

No coredump target has been configured

So recently a number of customers have been experiencing a core dump target error after rebooting their ESXi hosts. Strangely enough, I also hit the same issue a few weeks ago when my demo environment went down due to a power failure.

[Screenshot: “No coredump target has been configured” warning on the host]

There isn’t really a clear explanation for why this happens, but it seems to be a common occurrence for end users. It’s also quite simple to fix, although the KB instructions aren’t exactly the clearest: http://kb.vmware.com/kb/2004299

Firstly, enable SSH on the host experiencing the error:
[Screenshots: enabling SSH on the host]

Next, open a PuTTY session to the host and log in as root.

Check to see if there is currently an active diagnostic partition using the following esxcli command:
esxcli system coredump partition get
Check to see if there are any available diagnostic partitions by running the following command:
esxcli system coredump partition list

It’s more than likely you’ll get output similar to the below:
[Screenshot: output of the esxcli coredump partition get/list commands]
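For reference, an unconfigured host typically returns something along these lines (the device name below is anonymised for illustration):

esxcli system coredump partition get
   Active:
   Configured:

esxcli system coredump partition list
Name                    Path                                        Active  Configured
----------------------  ------------------------------------------  ------  ----------
naa.xxxxxxxxxxxxxxxx:7  /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:7  false   false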

Usually the coredump partition is configured on the boot device, so we now need to find the boot device and its diagnostic partition. Run the following command to list all the storage devices attached to the host:
ls /dev/disks/ -l
or ls /vmfs/devices/disks/ -l
Usually the boot device can be easily identified because it tends to be the only device with multiple partitions:
[Screenshot: device listing showing the boot device and its partitions]
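To give a rough idea of what to look for, the boot device tends to stand out because its device name appears once on its own and then again with a partition number appended for each partition, something like this (device name anonymised, purely illustrative):

naa.xxxxxxxxxxxxxxxx
naa.xxxxxxxxxxxxxxxx:1
naa.xxxxxxxxxxxxxxxx:5
naa.xxxxxxxxxxxxxxxx:6
naa.xxxxxxxxxxxxxxxx:7
naa.xxxxxxxxxxxxxxxx:8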

(If you want to understand more about partitions that are created by ESXi, have a look at this KB: http://kb.vmware.com/kb/1036609)

Once you have the device ID, run the following command to display the partition table for the device:
partedUtil getptbl "/dev/disks/DeviceName"
[Screenshot: partedUtil getptbl partition table output]
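For reference, the output format is roughly as follows: one line for the partition table type, one for the disk geometry, then a line per partition showing its number, start/end sectors, GUID and label (the sizes below are purely illustrative):

gpt
15566 255 63 250069680
1 64 8191 C12A7328F81F11D2BA4B00A0C93EC93B systemPartition 128
5 8224 520191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
6 520224 1032191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
8 1257504 1843199 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0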

Usually the partitions will be labelled and you can easily identify the coredump partition – it is labelled “vmkDiagnostic” and is quite often the 7th partition. If you’re unfortunate and don’t have labelled partitions, then you can usually identify the diagnostic partition from the GUID displayed – this is usually 9D27538040AD11DBBF97000C2911D1B8.

Once you’ve identified the partition, you will have to re-point the coredump target to this partition.

To configure and activate a specific partition, use the following commands:
esxcli system coredump partition set --partition="Partition_Name"
esxcli system coredump partition set --enable true
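As a worked example – with naa.xxxxxxxxxxxxxxxx standing in for your actual device ID – configuring the 7th partition identified earlier would look something like:

esxcli system coredump partition set --partition="naa.xxxxxxxxxxxxxxxx:7"
esxcli system coredump partition set --enable true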

To automatically select and activate an accessible diagnostic partition, use the command:
esxcli system coredump partition set --enable true --smart

If the partition cannot be set automatically, you may have to deactivate the previous partition link and re-run the command, as follows:
[Screenshot: deactivating the previous coredump partition and re-running the set command]
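From memory, the deactivate-and-retry sequence looks something like the below (check esxcli system coredump partition set --help on your build for the exact flags; the device name is again anonymised):

esxcli system coredump partition set --unconfigure
esxcli system coredump partition set --partition="naa.xxxxxxxxxxxxxxxx:7"
esxcli system coredump partition set --enable true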

Once done, double check the core dump partition has been configured by running the following command:
esxcli system coredump partition get

If all is successful, reboot the host to complete the configuration and to ensure the setting persists after the reboot.

Issues using Windows Session Credentials with vSphere Client and vCenter Server Appliance

So it seems there’s a known bug when using the vSphere Client to log into your vCenter Server Appliance – it affects vCenter Server Appliance 5.1, 5.5, and 6.0.

If you try to log in to vCenter Server with the “Use Windows Session Credentials” box checked, it bombs out with a General System Error as follows:
[Screenshot: General System Error when logging in with Windows session credentials]

Looking into the vpxd.log from the Web Client Log Browser, you will be able to see the following errors:
[Screenshot: GSSAPI errors in vpxd.log viewed from the Web Client Log Browser]

(Note: filter the vpxd.log using the time you tried to log in)

You can also view the vpxd.log file by logging into the console of the vCSA, enabling shell and navigating to /var/log/vmware/vpxd/

In there, you will see entries similar to:

<YYYY-MM-DD>T<TIME>+02:00 [7F1C10CCC700 error 'GSSAPI' opID=CEAEA705-00000004-2d] Cannot get user info for domain\user. Possible NSS configuration problem.
<YYYY-MM-DD>T<TIME>+02:00 [7F1C10CCC700 info 'commonvpxLro' opID=CEAEA705-00000004-2d] [VpxLRO] -- FINISH task-internal-9727699 -- -- vim.SessionManager.loginBySSPI
<YYYY-MM-DD>T<TIME>+02:00 [7F1C10CCC700 info 'Default' opID=CEAEA705-00000004-2d] [VpxLRO] -- ERROR task-internal-9727699 vim.SessionManager.loginBySSPI: vmodl.fault.SystemError:
Result:
(vmodl.fault.SystemError) {
dynamicType = <unset>,
faultCause = (vmodl.MethodFault) null,
reason = "Cannot get user info",
msg = "",
}

(Note: I ran a grep against the vpxd.log file looking for GSSAPI)
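For anyone wanting to do the same, assuming the default log location mentioned above, it was nothing more exotic than:

grep GSSAPI /var/log/vmware/vpxd/vpxd.log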

Solution

To work around this issue, manually enter user credentials instead of using the “Use Windows session credentials” option.

Alternatively, to resolve this issue:

  1. Log in to vCenter Server Appliance as the root user.
  2. For vCenter Server Appliance 6.0, you need to enable the Bash shell in order to access the Linux OS; to do so, run the shell.set --enabled True command.
  3. Open the /etc/nsswitch.conf file using a text editor (e.g. vi).
  4. Locate the passwd: compat ato entry and replace it with passwd: compat ato lsass (see the example just after this list).
    Note: Remove lsass from the line if it is currently displayed
  5. Restart the services using /etc/init.d/vmware-vpxd restart.
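For reference, after step 4 the passwd line in /etc/nsswitch.conf should end up looking like this (based purely on the entry quoted above – your file may differ slightly):

passwd: compat ato lsass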

You can read more in the KB here: http://kb.vmware.com/kb/2050701

vCenter Operations Manager – SSL Certificate issues

So during a recent deployment of vCenter Operations Manager (5.8.2) at a customer site I encountered the following error whilst trying to pair the vCOPs vApp to their vCenter Server.

[Screenshot: “Unable to get vCenter Server certificate chain” error]

“Unable to get vCenter Server certificate chain”

This was the first time I had encountered this issue when deploying vCOPs. Fortunately, given how much exposure I’d had to SSL certificates during a previous project, I knew it could be down to one of two things: either the SSL certificate had expired, or it had not been generated with the correct parameters.

Note: The quickest way to look at a vCenter Server’s SSL certificate is to open a browser, point it at the vCenter’s IP address and view the certificate.
[Screenshots: viewing the certificate in the browser]
(Left – IE, Right – Chrome)

Or, if it’s a Windows deployment of vCenter 4.1 or later, you can find the certificate here: C:\ProgramData\VMware\VMware VirtualCenter\SSL\rui.crt (note that C:\ProgramData is a hidden folder!).
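If you’d rather check from the command line and have openssl to hand, something like the below will show the key length in the certificate details (vcenter.example.com is just a placeholder for your vCenter’s FQDN or IP):

openssl x509 -in rui.crt -noout -text
echo | openssl s_client -connect vcenter.example.com:443 2>/dev/null | openssl x509 -noout -text

Look for the Public-Key line (older openssl builds print it as RSA Public Key) in the output.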

 

It seemed that the SSL certificate was valid (the expiry date was 2022), however I noticed that the public key was weak, as the key length was only 512 bits!
What had happened was that a previous partner had upgraded them from VI 3.5 to vSphere 4.0 to vSphere 5.0 and had forgotten to re-generate the SSL certificates!
Prior to vCenter Server 4.1, VMware self-signed their SSL certificates with a 512-bit public key by default, so when they upgraded they kept the same weak certificates.

From vCenter Server 4.1 onwards, a fresh installation generates certificates with a 2048-bit RSA public key by default.
[Screenshot: certificate details showing the public key length]
So, because the public key length was only 512 bits, vCOPs could not validate the vCenter Server’s certificate (I believe it has to be a minimum of 1024 bits)!
More info from VMware’s KB here: http://kb.vmware.com/kb/2037082 and Microsoft’s KB here: http://support.microsoft.com/kb/2661254

 

As it was a production environment and they couldn’t afford to regenerate their SSL certificates, I had to ‘inject’ the vCenter Server certificate into the vCOPs VMs’ keystores as follows:

  1. Copy the rui.crt file (the SSL certificate) from the vCenter Server into the /tmp directory of the vCOPs UI VM (this can be easily achieved using WinSCP).
  2. Log in to the console of the UI VM as root.
  3. Change to the directory where the certificate keystore is located: /usr/lib/vmware-vcops/user/conf
  4. Issue this command to add the vCenter Server certificate to the certificate store: keytool -importcert -file /tmp/rui.crt -alias https://<VC FQDN or IP>/sdk -keystore truststore -storepass oxygen
  5. Issue this command to verify that the certificate is in the certificate store: keytool -list -keystore truststore -storepass oxygen
  6. Issue this command to copy the truststore file from the UI virtual machine and paste it to the Analytics virtual machine: scp truststore secondvm-external:/usr/lib/vmware-vcops/user/conf/
  7. Restart all services with the su - admin -c "vcops-admin restart" command, or reboot the vApp from the vCOPs admin page.

Once the SSL certificate was injected into the vCOPs VMs’ keystores, it was plain sailing and we could continue with the setup wizard.

 

Ideally if you still have weak certificates in your environment, you should really be replacing them by generating new SSL certs! =)