So one of the BIG problems at the moment is that SRM does not fully support protecting your vCloud environment.
It supports protecting your management cluster (so the vCenter servers, vCD cells, vCNS manager, vCM, DBs, etc), but it doesn’t yet protect your resource cluster….. so all those VMs you’ve deployed in your organisations under vCD – well they’re not protected by SRM!
Definitely NOT COOL if your primary site goes tits up!!
From what I can gather, this is mainly due to the way SRM work….. When you setup SRM for DR, you have to ‘pre-create’ resources at the recovery site in order to map the resources from the protected site to them (stuff like resource pools, folders, network, placeholder VMs). Unfortunately vCD likes to have full control of a resource cluster and manages all the resource itself – this basically means that the vCD cells are not aware of the objects that have been created in the recovery site for SRM. It doesn’t matter if the names are the same, what matters is the Management object Reference IDs (MoRef ID) have changed and this is what vCD uses to construct its environment…..
MoRef IDs are used to correlate objects between vCD and the underlying vSphere/vCenter layer. Any changes to these identifiers will result in the loss of functionality because vCD will not be able to manage these objects as it will not be aware of them (ie the MoRef IDs will not exist inside the vCD DB).
The use of SRM would result in a change of the MoRef ID on the vCenter Server layer, resulting in an incorrect reference in the vCD database – and so leaving the object (eg. a VM) unmanageable from a vCD perspective. I believe SRM also re-signatures the storage volumes which will also confuse vCD.
About a year ago Chris Colotti and Duncan Epping wrote an article on how vCloud DR could be achieved, this involved the clever idea of putting the resource ESXi hosts at the recovery site into the same resource cluster as the resource ESXi hosts at the protected site (but in maintenance mode as obviously it won’t see the storage located at the protected site so can’t be used by vCD). Then using vSphere HA to take the ESXi hosts out of maintenance mode to handle the recovered workloads…. However, this solution did involved manual intervention to fail over the vCD resources correctly:
Earlier this year, another white paper was released which described how the majority of this manual process (ie the VMware bits) could be automated using PowerCLI:
However, what’s missing is the automation of the whole storage piece – breaking the replication and making the volumes read/write….. but then I guess this is really more storage-vendor dependent! =)
I guess if the storage vendor has exposed the array to VMware using VASA then it could be possible to script the storage steps as well….! =)
Anyways, it’s been an interesting read…… and definitely a problem I see VMware sorting out for the next release of SRM!
Given how powerful PowerCLI is, I really need to find some time to learn how to use it!!