Configuring MSCS (Microsoft Cluster Service) in the VMware world is a complicated process. I’ve set up many MSCS solutions on VMware, and I still cringe when a customer demands it as a solution. It works, but every time I do it there are so many little challenges.
I’ll try to describe what creates the most common misunderstanding, as best I see it. Keep in mind, this advice is for ESX 3.x. I haven’t looked too closely at how vSphere 4 handles it, but I don’t think it’s that different. Also, I’m very willing to be corrected if you think I’m misrepresenting things.
There are 2 different settings, which sound very similar:
- Disk types (selected when you add a new disk) – VMDK, virtual RDM (virtual compatibility mode) or Physical RDM (physical compatibility mode)
- SCSI bus sharing setting – Virtual sharing policy or Physical sharing policy (or none)
They are distinct: just because you chose a Virtual RDM doesn’t mean the SCSI controller should necessarily be set to Virtual.
Let’s deal with the disks first. I stand by the table on my reference card. The critical deciding factors are the host configuration, the need for snapshots, and whether you need to run SCSI target software. The host configuration can be one of:
- Cluster in a box (CIB) – both MSCS servers are VMs running on the same ESX host
- Cluster across boxes (CAB) – both MSCS servers are VMs running on different ESX hosts
- Physical and VM (n+1) – one server is running natively on a physical server, the other is in a VM
Now the SCSI bus sharing setting is different. It often gets missed, because you don’t manually add the second controller (in fact you can’t). You need to go back to the settings after you have added the first shared disk. There are 3 settings here:
- None – This is for disks that aren’t shared between VMs (not the same as ESX hosts sharing VMFS volumes). This is used for the disks which aren’t shared in the cluster, e.g. the VMs’ boot disks. This is why shared disks have to be on a 2nd SCSI controller.
- Virtual – only for CIB shared disks
- Physical – For CAB and n+1 shared disks
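To make the three settings concrete, here is a minimal Python sketch of the mapping described above. The names `Topology`, `BusSharing` and `bus_sharing_for` are my own illustrative inventions and not part of any VMware tool or API; the sketch only encodes the rule of thumb from the list.

```python
from enum import Enum


class Topology(Enum):
    CIB = "cluster in a box"
    CAB = "cluster across boxes"
    N_PLUS_1 = "physical and VM (n+1)"


class BusSharing(Enum):
    NONE = "none"
    VIRTUAL = "virtual"
    PHYSICAL = "physical"


def bus_sharing_for(topology: Topology, disk_is_shared: bool) -> BusSharing:
    """Return the SCSI bus sharing setting suggested by the list above."""
    if not disk_is_shared:
        # Boot disks and other non-clustered disks stay on a controller set to None.
        return BusSharing.NONE
    if topology is Topology.CIB:
        # Both nodes run on the same ESX host.
        return BusSharing.VIRTUAL
    # CAB and n+1: the nodes are on different hosts (or one is physical).
    return BusSharing.PHYSICAL
```

Note that the function takes the cluster layout as input, not the disk type. That is the whole point: the disk type and the bus sharing setting are chosen separately.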
So, the problem can really lie in two areas:
- It’s easy to forget to change the SCSI bus sharing mode, as it’s not something you have to select. So this often gets left as None for the shared disks.
- If you want a virtual RDM, you choose virtual SCSI bus sharing if you are doing CIB (which is not recommended by VMware). If you are doing CAB or n+1 with a virtual RDM, you must choose physical SCSI bus sharing.
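These two pitfalls can be expressed as a small sanity check. The following is only a sketch under my own assumptions: `SharedDisk` and `check_shared_disk` are hypothetical names, and the strings used for disk types, bus sharing modes and layouts are labels I made up for illustration, not values read from a real VM configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SharedDisk:
    name: str
    disk_type: str    # illustrative labels: "vmdk", "virtual_rdm", "physical_rdm"
    bus_sharing: str  # illustrative labels: "none", "virtual", "physical"


def check_shared_disk(disk: SharedDisk, topology: str) -> List[str]:
    """Return a list of problems for a shared cluster disk.

    topology is one of "CIB", "CAB" or "n+1" (hypothetical labels).
    """
    expected = "virtual" if topology == "CIB" else "physical"
    problems = []
    if disk.bus_sharing == "none":
        # Pitfall 1: the controller was never changed from the default.
        problems.append(
            f"{disk.name}: bus sharing is still None on a shared disk"
        )
    elif disk.bus_sharing != expected:
        # Pitfall 2: the bus sharing mode was picked to match the disk type
        # instead of the cluster layout.
        problems.append(
            f"{disk.name}: {topology} needs {expected} bus sharing, "
            f"found {disk.bus_sharing} (disk type {disk.disk_type})"
        )
    return problems
```

For example, `check_shared_disk(SharedDisk("quorum", "virtual_rdm", "virtual"), "CAB")` reports one problem, because a virtual RDM in a cluster across boxes still needs physical bus sharing.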
Here is the latest 3.5 PDF for MSCS:
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_mscs.pdf
Add to the mix that you need to understand Boot from SAN, Independent disks, Persistent/Nonpersistent, VMDK disk types (e.g. eagerzeroedthick) and additional SCSI controllers. And it’s always changing; back in the days of ESX 2, they called things pass-through and non-pass-through RDMs. This is just to set up the hardware. Wait until you have to configure the disks and the cluster!
It’s definitely a rat’s nest, but I don’t blame VMware. MSCS is a fairly complex beast, and is very touchy when it comes to its shared storage. I’m sure VMware provided MSCS because its customers demanded it, but you can tell they certainly don’t want to promote its use. Hopefully, the new Fault Tolerance features will draw most architects away from MSCS.
But in the beginning, given some of the limitations of FT, I don’t think it will be used that much.
I cannot get cluster-in-a-box working with ESX or ESXi 3.5 Update 3 Build 123629 if any snapshots exist in the clustered VMs. The main issue seems to be that “SCSI Bus Sharing” cannot be set to “Virtual” if snapshots exist in a VM. But, your image above implies that snapshots are possible with cluster-in-a-box. …? I’ve got to be missing something … or this was changed between ESX 3.01 and ESX 3.5.
I have posted this in the VMware forum: http://communities.vmware.com/message/1048845#1048845
Hi SquareVM,
The table shows that you can have snapshots with vmdk or virtual RDM, but not physical RDMs. It also says that you can do a cluster in a box only with vmdk. It doesn’t say you can do snapshots with CIB.
When you snapshot a VM, it makes changes in that VM’s configuration files to re-point to the disk’s snapshot. However, the 2nd VM in the cluster would still be pointing to the original disk, not the snapshot version. This would obviously cause serious issues.
I’ve never tried snapshots with MSCS VMs, but I’d be surprised if it worked. VMware have a long list of constraints with MSCS. Some can be worked around (won’t be supported though), but many can’t.
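The re-pointing problem can be illustrated with a toy Python model. This is purely a sketch: the dictionaries stand in for each VM’s configuration, and the `take_snapshot` function, the `scsi1:0` key and the file names are assumptions I made up for the illustration, not the real file format or any VMware API.

```python
def take_snapshot(vm_config: dict, disk_key: str) -> dict:
    """Return a copy of vm_config whose disk entry points at a new delta file."""
    updated = dict(vm_config)
    base_disk = vm_config[disk_key]
    # Illustrative naming only: the VM is re-pointed at a delta disk,
    # while the base disk is left untouched.
    updated[disk_key] = base_disk.replace(".vmdk", "-000001.vmdk")
    return updated


# Both cluster nodes start out pointing at the same shared disk.
node_a = {"scsi1:0": "quorum.vmdk"}
node_b = {"scsi1:0": "quorum.vmdk"}

# Snapshot only node A: its config now references the delta file,
# but node B still references the original disk.
node_a = take_snapshot(node_a, "scsi1:0")
nodes_agree = node_a["scsi1:0"] == node_b["scsi1:0"]
```

After the snapshot, `nodes_agree` is `False`: the two nodes no longer agree on what the “shared” disk is, which is exactly the situation MSCS cannot tolerate.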
Hey Forbes Guthrie,
Sorry for misinterpreting your image about using snapshots with cluster-in-a-box – which seems to not be supported in 3.5. I used to use snapshots with CIB with ESX 3.01 servers. It was a **great** way to get a *development* cluster environment back to a ‘clean’ state (i.e. no software other than Microsoft’s; no registry mess, file cleanup, etc.) fast, to test my own beta software or perform competitive analysis on other software.
I guess I’m out of luck with ESX and ESXi 3.5. 😐 I have tried to ‘see’ what removing a snapshot from a VM does to the VM’s config files and then manually simulate snapshot removal without actually removing the snapshot files – to try and hack around this virtual drive setting limitation and use snapshots with CIB, but my manual edits of the VM configs and rename (not delete) of snapshot files do not seem to have an effect. 😐
I have not tried 4.0 yet.
Hi SquareVM,
I’ve been having a bit of a dig around 4.0, but I can’t find anything relating to snapshots. Nothing to say if they are, or aren’t, supported. I suspect that they aren’t, for the same reasons I stated above. It’s probably a check that was added around the 3.5 time-frame, because people were snapshotting the disks, and then making support calls when it broke things. I’m booked for VMworld this year, so if I bump into any MSCS gurus, I’ll be sure to ask and find out the official VMware stance.
Hi, I need to build a cluster solution for a server running Tomcat. I have two servers with ESX 4, and I am thinking of building an MSCS cluster with one node on each ESX host and a shared disk on a SAN. I cannot find any documentation about Tomcat and MSCS. I think Tomcat is cluster-aware, but with some Java application, not with MSCS. How can I cluster it with MSCS? As a Generic Service?
Hi George, I couldn’t find anything relating to Tomcat servers and MSCS, but here is a guide to setting up Tomcat with NLB: http://blog.paulmcgurn.com/2008/09/tomcat-clustering-on-windows-server.html
Hi,
We are trying to set up a CAB with a virtual RDM. This works fine – the only thing that causes a problem is that we can no longer create snapshots of our images, although we excluded the disk from snapshot processing.
We are using ESX 3.5 – is this maybe a version problem? Does this setup only work as described in the table above on ESX 3.0?
Any comment would be appreciated.
Thanks Josi
Hi Josi, the table above describes 2 different things:
– you can create snapshots with virtual RDMs
– you can create CAB with virtual RDMs
However, it doesn’t state that you can use snapshots with disks that are used in MSCS setups. If you check the MSCS documentation, snapshots are just one of the things that are not supported on MSCS VMs.
Hi Forbes, Your table above lists a cluster across boxes with physical RDMs as not recommended, yet the VMware MSCS guide that you reference uses physical compatibility mode RDMs in its cluster across boxes chapter (ch 3, pg 28). Can you explain your reasoning behind not recommending this configuration?
Hi fcorrao,
With VI3.5, the virtual RDMs have some advantages like snapshots and potential to use VMotion. There used to be a paper from VMware that stated they recommended virtual mode unless you had a good reason not to (like SAN tools), however I can’t find it right now.
Interestingly though, with the move to vSphere 4, VMware now recommend using Physical RDMs for CAB.
Forbes.
Hi Forbes
If we use physical RDMs with bus sharing enabled for MSCS (across ESX hosts), we cannot VMotion the VM on ESX 3.5/4.0. But I heard that it was possible to VMotion a VM on ESX 3.0.x even if it used physical RDMs. Is this correct? If so, why is it no longer possible?
I’m afraid my memory of this isn’t that clear, but even if it was technically possible it’s not something I’d ever recommend.
Hi Forbes,
An additional question to verify how the SAN can interact with all this.
We are currently migrating from a 2-node physical cluster to a 2+1 node (virtual-physical) cluster on our VMware ESX 3.5 U3 infrastructure.
The LUNs of the current MSCS cluster are created in a logical SAN Storage Group to which only the 2 physical hosts are connected. The Storage Group for our virtual infrastructure will be used to connect all LUNs. Currently only ESX hosts have access to this Storage Group, but my question is: how will the LUNs (all of them, including VMFS volumes and RDMs) react when physical Windows machines are also allowed into that same Storage Group and can see the different LUNs?
Hope you can help me with this or give me some insight on how to interpret this situation.
Hi Dirk,
Firstly, I don’t think VMware’s MSCS supports more than 2 nodes in the cluster, even if only 2 are virtual (I’m not saying it won’t work, just won’t be supported).
I definitely wouldn’t allow any Windows servers to see your regular VMFS datastores. Windows servers have a nasty habit of “initializing” LUNs formatted with other file systems. If you want to do this, I would create a special Storage Group just for this MSCS cluster’s shared disks (guessing by the name, if this is an EMC CLARiiON, you can have your ESX server in multiple Storage Groups), and only allow the 2 Windows servers and 1 ESX server to see those LUNs. The ESX server can still be presented other VMFS volumes via other Storage Groups for the local disks.
Hope you understand what I’m getting at here, but let me know if you don’t.
This won’t work for multiple reasons.
As mentioned, snapshots would not be equal for both nodes. In addition, you can’t get a SCSI-2 reservation on a delta file (based on how ESX works), and thus you’d cause a cluster failover (or worse, a total failure if you tried to do both nodes). This is why vSphere does not let you snapshot disks on a shared SCSI bus at ALL.