I have spent the past week collecting as much information as I can about MSCS (Microsoft clustering) on VMware using an iSCSI SAN. iSCSI is gaining a lot of ground as a legitimate SAN solution, but it appears both vSphere 4 and Windows 2008 are still in their infancy when it comes to supporting iSCSI as fully as an FC SAN.
Below is a collection of notes that I have put together regarding running an MSCS cluster on vSphere 4 Update 1. As best I can tell, it is impossible to run a fully supported MSCS cluster on an iSCSI SAN while booting from the iSCSI SAN. Disappointing.
So what to do? I will keep updating this post with any additional information that is confirmed or corrected as I continue to research this.
Well here it all is...
vSphere and Microsoft Clustering
Issues:
Here are the supported clustering configurations in VMware:
• The OS or System volume must be located either on the ESX host’s local physical disks (which defeats HA and DR) or on an FC SAN. The individual cluster node’s OS (vmdk) is not supported on an iSCSI LUN. This appears to be the single largest issue that, with iSCSI, we cannot overcome while staying in a supported configuration.
• VMware HA is supported. DRS and vMotion are not supported. You can run a clustered node in a DRS cluster, but DRS must be disabled for all clustered nodes.
- This is a by-product of the requirement for independent disks.
• Shared Quorum drives and Shared clustered drives are supported on FC SANs only.
• Round Robin MPIO for multi-pathing is not supported (We are using round robin MPIO).
• I believe snapshots are not supported.
- This is a by-product of the requirement for independent disks.
• Thin provisioned vmdks are not supported.
- I believe this is on shared storage volumes only
• Memory over-commit is not recommended with SWAP stored on SAN.
- I believe the SWAP needs to be located on the local server disks if you are using memory over-commit and your OS vmdk is located on a SAN volume.
- This may have changed in vSphere 4.0 Update 1
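To make the list above easier to apply, here is a minimal sketch of it as a checklist in Python. The class name, field names and string values are all made up for illustration and are not part of any VMware tool or API. It only models disks presented by ESX (not an in-guest iSCSI initiator), and the snapshot and thin provisioning rules are only my belief, as noted above.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical description of one clustered VM node."""
    os_disk: str            # "local", "fc" or "iscsi"
    shared_disk: str        # "fc", "iscsi" or "none" (disks presented by ESX)
    drs_enabled: bool
    vmotion_used: bool
    multipath_policy: str   # "fixed", "mru" or "round_robin"
    snapshots_used: bool
    thin_shared_disks: bool


def support_issues(cfg):
    """Return the reasons a node falls outside the supported list above."""
    issues = []
    if cfg.os_disk == "iscsi":
        issues.append("OS vmdk on iSCSI LUN")
    if cfg.shared_disk == "iscsi":
        issues.append("shared or quorum disk not on FC SAN")
    if cfg.drs_enabled:
        issues.append("DRS enabled for clustered node")
    if cfg.vmotion_used:
        issues.append("vMotion used")
    if cfg.multipath_policy == "round_robin":
        issues.append("Round Robin MPIO")
    if cfg.snapshots_used:
        # Believed unsupported, not confirmed.
        issues.append("snapshots (believed unsupported)")
    if cfg.thin_shared_disks:
        # Believed to apply to shared storage volumes only.
        issues.append("thin provisioned shared vmdk (believed unsupported)")
    return issues


# Example: everything on iSCSI with round robin MPIO, DRS and vMotion off.
our_setup = NodeConfig("iscsi", "iscsi", False, False, "round_robin", False, False)
for issue in support_issues(our_setup):
    print(issue)
```

An empty list only means the node passes the rules as I have written them down here, not that VMware has blessed the configuration.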
This basically means the following regarding running Exchange or SQL in a cluster on VMware:
• Unless we use an FC SAN, we cannot place a clustered node on the SAN, so we cannot use HA, DRS or vMotion while staying in a supported configuration. This defeats the value of virtualizing our clusters for DR purposes.
• Lack of snapshots and round robin IO would also put substantial limits on performance, backups and maintainability.
From what I have read, we should be able to build clusters on vSphere that work with all of the following:
• iSCSI
• vMotion
• HA
• DRS
• Maybe Snapshots
• Maybe Round Robin MPIO
• Shared storage may be slightly challenging for SQL based clusters. I have read mixed reviews of this on iSCSI/NFS storage.
Alternatives and unclear options:
• Windows 2008 supports an MNS cluster (Majority Node Set with a File Share Witness). In this solution, you do not need to use shared storage between clustered nodes. This should get around the FC requirement, but I cannot find any documentation that says whether MNS with a File Share Witness is supported in specific VMware environments.
• For Exchange CCR, with an MNS cluster, there should be no need for any shared storage between nodes, whether for quorum or data.
o But I have read differing opinions on the effect of vMotion and iSCSI-based storage in this type of environment. As best I understand it, VMware can temporarily pause any SAN-based virtual machine when there is congestion on the SAN, until the SAN catches up. From what I have read, during these short pauses a non-FC clustered node may appear to have gone offline, which may trigger a node failover. The same may be true for vMotion, which would explain why it is not officially supported: it may inadvertently make a node appear to have gone offline.
o Now from other reading there are people who have numerous clusters running with vMotion and no problems. But vMotion is definitely not supported.
• Additionally, iSCSI-based storage is supported by Microsoft in a clustered environment, provided you are using Microsoft's software iSCSI initiator in the OS. The Microsoft iSCSI initiator is also supported in VMware with HA, DRS and vMotion, so this might be another solution.
• In the SQL cluster world, shared storage is required, unlike Exchange CCR, which does not require shared storage. This may be an instance where we need to use the Microsoft iSCSI initiator for the shared storage if we want to be close to a supported configuration.
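Since the MNS with File Share Witness option comes up repeatedly above, here is a minimal sketch of the vote counting behind it. It assumes one vote per node plus one vote for the witness, with the cluster staying up only while more than half of all votes are reachable. The function name and parameters are made up for illustration.

```python
def has_quorum(total_nodes, nodes_online, witness_online):
    """Majority check for a node majority cluster with a File Share Witness.

    Assumes each node has one vote and the witness has one vote.
    """
    total_votes = total_nodes + 1
    votes_present = nodes_online + (1 if witness_online else 0)
    # Quorum requires strictly more than half of all votes.
    return votes_present > total_votes // 2


# Two-node CCR cluster: one node down, witness still reachable.
print(has_quorum(total_nodes=2, nodes_online=1, witness_online=True))

# Two-node CCR cluster: one node down and the witness unreachable too.
print(has_quorum(total_nodes=2, nodes_online=1, witness_online=False))
```

For a two-node cluster this shows why the witness matters so much: with one node down, the surviving node only keeps quorum while it can still reach the File Share Witness.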
With all of that being stated, I think the most supported environment we can get is the following:
• For Exchange CCR: Windows 2008, MNS Cluster, no shared storage, no vMotion, HA, no DRS, VMs might be able to live on iSCSI, no snapshots.
• For SQL: Windows 2008, MNS Cluster, shared storage using MS iSCSI Initiator, no vMotion, HA, no DRS, VMs might be able to live on iSCSI, no snapshots.
References:
Setup for Failover Clustering and Microsoft Cluster Service (vSphere 4.0 Update 1) – VMware publication
http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_mscs.pdf
Microsoft FAQ on iSCSI support for clusters with Windows 2003
http://www.microsoft.com/windowsserver2003/technologies/storage/iscsi/iscsicluster.mspx
A thread on how to set up a VM cluster that can be vMotioned (unsupported by VMware)
http://communities.vmware.com/message/625404#625404
Discussion threads and blogs:
The pvSCSI driver appears to be incompatible with the Windows 2008 Failover Cluster validation tool.
http://communities.vmware.com/message/1490569#1490569
Reference to shared storage being supported only on FC or via software iSCSI initiator
http://communities.vmware.com/message/1285706#1285706
Issues with memory over-commit, SWAP stored on SAN and possible multi-pathing. Also a discussion of the downsides of the VMware requirement to put the VMs on local storage.
http://communities.vmware.com/message/638956#638956
Also discussed in this thread, speculation as to why VMware does not support SAN based VM clusters.
http://communities.vmware.com/message/639452#639452
My personal thoughts on this thread: if you are using a File Share Witness, this should alleviate the SAN connectivity issues, thus introducing more stability into a SAN-based VM cluster.
Additionally, this thread reinforces that you should not put a Quorum drive on an unsupported storage system. Software iSCSI “should work” for a quorum drive, as should using a File Share Witness.
Blog about why vMotion and DRS do not work with a VM clustered node, but HA does work:
http://www.rtfm-ed.co.uk/2007/05/04/vmmscs-clusteringvmware-ha/
There is another thread that discusses changing the shared storage controller to virtual (unsupported by VMware) in order to overcome the vMotion and DRS VM configuration issues.
Blog on how the File Share Witness works and why it is a bad idea to locate the File Share Witness on a Hub Transport server.
To sum this up, if we are using a File Share Witness in a VM cluster, the File Share Witness needs to be very reliable and needs to be online throughout any Exchange patching maintenance. We most likely would want to update the policy for the File Share Witness to be more aggressive than the 1 hour default setting.
This is an MSKB article on placing volume mount points on clustered shared disks.
http://support.microsoft.com/kb/947021
A thread on a partially working MNS cluster on VMware with some pros and cons.
http://communities.vmware.com/thread/73508
Some good articles on configuring and managing the File Share Witness in an Exchange CCR cluster, including how to move the FSW and how to manage and fail over the CCR.
http://technet.microsoft.com/en-us/library/bb676490(EXCHG.80).aspx
Interesting blog on Exchange 2010, VMware and DAGs
http://kennethvanditmarsch.wordpress.com/2009/11/20/vmotion-and-exchange-2010/
Message was edited by: kghammond2009