vCenter Server Appliance – filesystem out of space

So it’s all happening this week with this upgrade/clean-up of the MTI solution centre!! =)

Upon finishing all the upgrades and reconfiguring vSphere Replication and Site Recovery Manager, I noticed the DR vCSA was a bit unresponsive…. it was taking ages to log into the Web Client (sometimes it didn’t even get that far). After signing into the VAMI, I noticed there was a critical error regarding the log filesystem.

vcsa01

If you weren’t aware, one of the changes to the vCSA with 6.0 was the deployment of 11 VMDKs with the appliance, one for each component service of vCenter. In previous versions there were only 2 virtual disks, which proved problematic when trying to increase disk capacity for particular components of vCenter Server (e.g. if you only wanted to increase the log directory).

As the vCSA was running in a demo environment, I decided to only do a ‘Tiny’ install – and it turns out the vCSA had simply run out of capacity for logging. A quick jump onto the console proved this to be true:

vcsa02
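If you want to check the same thing yourself, a quick df from the appliance shell does the job (the /storage/log mount point is from the 6.0 layout, so adjust for your version):

    # Show usage for all of the appliance's filesystems
    df -h

    # Or just focus on the log partition, which was the one that had filled up in my case
    df -h /storage/log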

The following VMware KB provides details on the 11 VMDKs and which mount points are attached to each vdisk: https://kb.vmware.com/kb/2126276.

vcsa04

I followed the instructions to increase the capacity of the log vdisk (VMDK5) and then gave the vCSA a reboot…..
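For reference, the gist of the KB procedure (from memory, so do check the article for your exact version) is to grow the VMDK from vSphere and then have the guest expand the logical volume underneath:

    # 1. Increase the size of VMDK5 via the vSphere Web Client
    # 2. From the vCSA shell, expand the LVM volume(s) to consume the new space
    vpxd_servicecfg storage lvm autogrow

    # 3. Confirm the extra capacity is now visible on the log mount point
    df -h /storage/log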

vcsa03

The vCSA is now healthy and back to normal. =)

As a footnote, here’s a VMware KB that explains how to increase the maximum backup size and index of the vCSA log files, to try and stop the log directory from filling up: https://kb.vmware.com/kb/2143565
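I haven’t copied the exact files the KB edits here, but conceptually it’s standard logrotate tuning on the appliance – directives along these lines (illustrative values only, not the KB’s recommendations):

    # "size" is how big a log can grow before it gets rotated,
    # "rotate" is how many rotated copies (the backup index) are kept
    size 5M
    rotate 10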

VM displays two port groups after migrating to a distributed vSwitch

Do you ever encounter the same issue over and over again and forget how you solved it the first time, or what caused it to happen in the first place?!?

It happened to me today on a customer site….. we were migrating a couple of hundred VMs off a standard vSwitch onto a newly created Cisco Nexus 1000v switch, and several VMs were showing as connected to the old port group on the standard vSwitch as well as the new port group on the distributed vSwitch.
The VM Summary showed connections to two port groups even though there was only one vNIC, and what made it more confusing was that if you went and edited the VM’s hardware settings, the vNIC was configured with the correct port group….. hmmm…..

It was a case of déjà vu…. I sat there thinking “Damn, I’ve seen this before… but for the life of me I can’t remember what’s causing the problem!”

After a quick “Google”, I stumbled across the following KB and my foggy brain suddenly cleared!!
http://kb.vmware.com/kb/2008231

“…If the virtual machine has snapshots associated with the old network, when you reconfigure the virtual machine to use a new network configuration, both old and new networks are associated with the virtual machine…”

“Doh…. Snapshots!!!”……. it’s all down to the VMs possibly having snapshots attached to them!!

 

Basically someone had taken a few snapshots of the affected VMs, and because a snapshot captures the VM’s state at that point in time, the snapshot retained the old vSwitch port group whilst the current (‘delta’) state of the VMs had been re-configured with the new 1000v port groups!
Kinda makes sense, right? If you wanted to roll back to the old snapshot then it must have the configuration of the VM from the point in time when the snapshot was taken – i.e. with the old vSwitch port group!

Anyways, a quick check with the client and, after a lengthy snapshot consolidation process, all the VMs were happily showing connections to only the 1000v port group!
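If you ever need to hunt down which VMs still have snapshots lurking, one way of doing it straight from an ESXi host’s shell (a rough sketch – PowerCLI would do the job just as well) is:

    # List all registered VMs on the host along with their VM IDs
    vim-cmd vmsvc/getallvms

    # Show any snapshots for a given VM ID (replace 42 with an ID from the list above)
    vim-cmd vmsvc/snapshot.get 42

    # Remove/consolidate all snapshots for that VM once you're happy to commit them
    vim-cmd vmsvc/snapshot.removeall 42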

 

….. and so to make sure I don’t forget that I’ve encountered this issue millions of times before, I’ve decided to post up a blog entry about it….
(Which was one of the reasons I started blogging in the first place – a place where I can not only share my knowledge but also somewhere I can access solutions to past problems!)

Troubleshooting a vCloud Director Installation

The problem with working full time is that it’s really hard to find time to blog, and also to find topics to blog about! =)

One of the great things about my job is we have a solution centre in the office which allows me to play around with kit! =)
Our solution centre is based around an EMC VSPEX architecture….. so EMC VNX storage, Cisco UCS blades and VMware virtualisation!!

I’ve been busy the last week or so putting together a vCloud solution for some of the engineers to play around with, as well as finally completing the detailed installation guide for deploying the vCloud Suite (one of these days I promise I will post it up).

Anyways, I ended up installing two RHEL 6.2 VMs as my vCD cells on an MS SQL 2008 R2 DB, load-balanced using a vCNS Edge….. but when I tried to start the vCD services on my Linux VMs, they would say they’d started (a simple service vmware-vcd status command) but wouldn’t give me the vCD web console/UI….. all I got was a blank grey webpage, and after a while it would error out saying it couldn’t connect to the website!! Hmmmm……
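As a side note, a quick sanity check worth doing on a cell in that state (nothing vCD-specific, just generic Linux) is to see whether anything is actually listening on 443 and whether the UI answers locally:

    # Is anything listening on the HTTPS port the vCD UI should be using?
    netstat -tlnp | grep 443

    # Does the cell answer locally? (-k skips certificate validation)
    curl -k https://localhost/cloud/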

Anyways, this gave me a good opportunity to test out my troubleshooting skills and offer a topic for my blog! =)

So here goes……

Troubleshooting vCD….

The log files for vCloud Director are located at /opt/vmware/vcloud-director/logs. There are three main files to look at (well, there are more than three, but these are the ones I usually use, and 99% of the time I can work out what’s wrong from them):

1. cell.log

This log file provides information on the status of the vCloud Director cell services and the application as it starts up.
Use tail -f cell.log to view the live status when starting a vCloud Director Cell.
A successful start-up will allow you to access the vCD web console/UI and will display a ‘started’ status for each service, plus 100% for Application Initialization.
Image
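If you want to watch the start-up in real time, one way (just standard Linux, nothing clever) is to restart the cell in one session and follow the log in another:

    # Restart the vCD cell services (run as root on the cell)
    service vmware-vcd restart

    # In a second session, follow the cell log as the services come up
    tail -f /opt/vmware/vcloud-director/logs/cell.log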

Usually, if there is an issue with accessing the web front-end UI, it is more than likely that the services are still waiting to complete, as below:

Image

If you’re seeing lots of services showing a “WAITING” status, then check the other logs to determine what could be causing this issue.

2. vmware-vcd-watchdog.log

This log file shows any alerts, errors or information that the vCloud Director cell services may be experiencing. A healthy vmware-vcd-watchdog.log looks similar to the below:

Image

If there’s an issue, then you could get an ‘Alert’ entry, similar to the one below:

Image

I believe vCloud Director will automatically try to restart the services, as I didn’t see a time-stamped entry for when I manually restarted the service. Also, this log looks very similar to the output you get from ‘service vmware-vcd status‘, as that command reports on both the vmware-vcd-watchdog and vmware-vcd-cell services.

3. vcloud-container-info.log

This log file shows the status of the initial installation of vCloud Director and will log how the application is currently functioning. If you have any errors or failures during installation, this log file will provide you with the details required to troubleshoot the cause of the failure.
In addition, this log will also provide information on any errors that may cause the vCloud Director services to fail to start.
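This log can get pretty big, so rather than paging through the whole thing, something like the following (plain grep/less, nothing vCD-specific) usually gets you to the interesting lines quicker:

    # Pull out anything that looks like an error, with line numbers for context
    grep -in "error" /opt/vmware/vcloud-director/logs/vcloud-container-info.log

    # Or open the log at the end, where the most recent entries are
    less +G /opt/vmware/vcloud-director/logs/vcloud-container-info.log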
In my case, after running cat vcloud-container-info.log | more, I discovered the following error:

Image

It turns out the error shows that the vCloud Director cell could not resolve its own hostname in DNS.

When I went through the pre-reqs before installation, I realised that I had only put DNS entries in for the two IPs used for HTTP and Remote Console access….. I had forgotten to put an entry into DNS that resolved the hostname of the Linux VM to the HTTP IP address.
A quick edit to DNS and then a restart of the vCD services fixed the problem I experienced.
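If you hit the same thing, the checks themselves are straightforward (the IP below is just a placeholder for your cell’s HTTP address):

    # Does the cell's own FQDN resolve in DNS?
    nslookup $(hostname -f)

    # And does the HTTP IP resolve back to the right name?
    nslookup 192.168.1.50

    # Once DNS (or /etc/hosts) is sorted, restart the cell services
    service vmware-vcd restart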

4. vcloud-container-debug.log

This log file shows the debugging information. The detail in this log file will be dependent upon the level of debugging set. I didn’t actually end up looking at this log as the error was discovered in the -info.log…. However, it’s another port of call if you can’t work out what’s causing your vCD services to fail.
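If you do need more detail out of it, my understanding is that the logging levels are driven by the log4j configuration shipped with the cell – the path below is from memory, so verify it against the docs for your vCD version:

    # Log4j settings for the cell live under the vCD install directory (path from memory)
    vi /opt/vmware/vcloud-director/etc/log4j.properties

    # Restart the cell services for any log level change to take effect
    service vmware-vcd restart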

Right….. blog entry over…… I’m off to eat my dinner! =)