Hi all,
I have an issue that's been bothering me and I'm hoping someone here can point me in the right direction. I'm currently waiting for VMware support to get back to me, but thought I would hop on here and see if people have some ideas.
I'm seeing a lot of storage path failures in general, but when I try to fail over MSCS resources from one node to another (which either fails or takes a very long time), I see the same type of path failures, just far more of them. Below is a snippet from the kernel logs (all of these point to the RDM LUNs for the MSCS disk resources):
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.296 cpu5:4358)NMP: nmp_CompleteCommandForPath: Command 0x0 (0x410001028a40) to NMP device "naa.60060160f192280058a7f865710de011" failed on physical path "vmhba1:C0:T1:L96" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.296 cpu5:4358)ScsiDeviceIO: 747: Command 0x0 to device "naa.60060160f192280058a7f865710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.299 cpu5:4332)NMP: nmp_CompleteCommandForPath: Command 0x0 (0x4100010954c0) to NMP device "naa.60060160f192280082c5674f710de011" failed on physical path "vmhba1:C0:T1:L90" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.299 cpu5:4332)ScsiDeviceIO: 747: Command 0x0 to device "naa.60060160f192280082c5674f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.300 cpu5:4332)NMP: nmp_CompleteCommandForPath: Command 0x0 (0x410001169e00) to NMP device "naa.60060160f192280083c5674f710de011" failed on physical path "vmhba1:C0:T1:L91" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.300 cpu5:4332)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160f192280083c5674f710de011" state in doubt; requested fast path state update...
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.300 cpu5:4332)ScsiDeviceIO: 747: Command 0x0 to device "naa.60060160f192280083c5674f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.303 cpu5:4970)NMP: nmp_CompleteCommandForPath: Command 0x0 (0x4100011a07c0) to NMP device "naa.60060160f1922800f64c4756710de011" failed on physical path "vmhba1:C0:T1:L92" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.303 cpu5:4970)ScsiDeviceIO: 747: Command 0x0 to device "naa.60060160f1922800f64c4756710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.306 cpu5:4358)NMP: nmp_CompleteCommandForPath: Command 0x0 (0x410001166000) to NMP device "naa.60060160f1922800f74c4756710de011" failed on physical path "vmhba1:C0:T1:L93" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:14 <ESX HOST NAME> vmkernel: 45:19:33:02.306 cpu5:4358)ScsiDeviceIO: 747: Command 0x0 to device "naa.60060160f1922800f74c4756710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.197 cpu5:4431)NMP: nmp_CompleteCommandForPath: Command 0x3c (0x4100010a8480) to NMP device "naa.60060160f192280082c5674f710de011" failed on physical path "vmhba1:C0:T1:L90" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.197 cpu5:4431)ScsiDeviceIO: 747: Command 0x3c to device "naa.60060160f192280082c5674f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.198 cpu5:64257)NMP: nmp_CompleteCommandForPath: Command 0x3c (0x4100010923c0) to NMP device "naa.60060160f192280082c5674f710de011" failed on physical path "vmhba1:C0:T1:L90" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.198 cpu5:64257)ScsiDeviceIO: 747: Command 0x3c to device "naa.60060160f192280082c5674f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.264 cpu5:29452)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x410001124280) to NMP device "naa.60060160f192280058a7f865710de011" failed on physical path "vmhba1:C0:T1:L96" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.264 cpu5:29452)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f192280058a7f865710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.277 cpu5:4359)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x410001092dc0) to NMP device "naa.60060160f192280082c5674f710de011" failed on physical path "vmhba1:C0:T1:L90" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.277 cpu5:4359)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f192280082c5674f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.289 cpu5:5882)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x4100011a24c0) to NMP device "naa.60060160f192280083c5674f710de011" failed on physical path "vmhba1:C0:T1:L91" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.289 cpu5:5882)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f192280083c5674f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.294 cpu5:14017)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x410001272100) to NMP device "naa.60060160f1922800f64c4756710de011" failed on physical path "vmhba1:C0:T1:L92" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.294 cpu5:14017)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f1922800f64c4756710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.298 cpu5:29453)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x410001120040) to NMP device "naa.60060160f1922800f74c4756710de011" failed on physical path "vmhba1:C0:T1:L93" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.298 cpu5:29453)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f1922800f74c4756710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.314 cpu2:64251)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x410001271200) to NMP device "naa.60060160f1922800f84c4756710de011" failed on physical path "vmhba2:C0:T1:L94" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.314 cpu2:64251)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f1922800f84c4756710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.347 cpu2:71084)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x410001120840) to NMP device "naa.60060160f1922800ab79d55f710de011" failed on physical path "vmhba2:C0:T1:L95" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jan 21 11:27:44 <ESX HOST NAME> vmkernel: 45:19:33:32.347 cpu2:71084)ScsiDeviceIO: 747: Command 0x1a to device "naa.60060160f1922800ab79d55f710de011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
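In case it helps anyone reading the snippet above, here is a rough Python sketch for tallying these entries by command and sense code. The lookup tables only cover the values that appear in this snippet, using the standard SCSI opcode and sense code names as I understand them. The function names `decode_line` and `summarize` are my own and not part of any VMware tool.

```python
import re
from collections import Counter

# Match only the "ScsiDeviceIO" lines so each failed command is counted once
# (the paired "NMP: nmp_CompleteCommandForPath" line reports the same failure).
LINE_RE = re.compile(
    r'ScsiDeviceIO: \d+: Command (0x[0-9a-f]+) to device "([^"]+)" failed '
    r'H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) '
    r'Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)'
)

# Partial tables: only the codes seen in the log snippet above.
OPCODES = {0x00: "TEST UNIT READY", 0x1A: "MODE SENSE(6)", 0x3C: "READ BUFFER"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x3F, 0x0E): "REPORTED LUNS DATA HAS CHANGED",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}


def decode_line(line):
    """Return a dict describing one ScsiDeviceIO failure, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    key, asc, ascq = (int(m.group(i), 16) for i in (6, 7, 8))
    return {
        "device": m.group(2),
        "command": OPCODES.get(opcode, hex(opcode)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }


def summarize(lines):
    """Count failures grouped by (command, sense key, additional sense)."""
    counts = Counter()
    for line in lines:
        d = decode_line(line)
        if d is not None:
            counts[(d["command"], d["sense_key"], d["additional"])] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the snippet above, used as sample input.
    sample = [
        'vmkernel: 45:19:33:02.296 cpu5:4358)ScsiDeviceIO: 747: Command 0x0 to device '
        '"naa.60060160f192280058a7f865710de011" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x6 0x29 0x0.',
        'vmkernel: 45:19:33:32.264 cpu5:29452)ScsiDeviceIO: 747: Command 0x1a to device '
        '"naa.60060160f192280058a7f865710de011" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x5 0x24 0x0.',
    ]
    for (cmd, key, extra), n in summarize(sample).most_common():
        print(f"{n:5d}  {cmd}: {key} / {extra}")
```

To run it against a full log, pass an open file object (for example a saved copy of the vmkernel log) to `summarize` instead of the sample list.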
This is a newish cluster running ESX 4 Update 1. I'm sure I have an issue at the storage end: when these hosts were built, RecoverPoint and a SANTap device were implemented for all the paths to go through, and this has caused us issues in the past with dropped storage. Has anyone got experience using these devices with ESX? I've heard VMware doesn't recommend running ESX hosts with RecoverPoint; is that right?
I might add that when both nodes of the MSCS cluster are on the same host, it works: there are no errors and failovers are flawless. I only come across this issue when the nodes are on different hosts. But as mentioned before, since the day these were built I've been seeing the above path failures throughout the kernel logs for the LUNs that hold the datastores for the virtual machines. These path failures are on every host too (the hosts are split across different enclosures, so the only thing they have in common is the SAN paths). The above snippet is from initiating a failover of an MSCS node, where I see path failures for every RDM in the resource.
I should also add that the virtual machines have been set up to the letter for MSCS, so it's not an issue on that end. As mentioned before, I get these path failures for all LUNs on all hosts, but normally they are scattered through the kernel logs, whereas during an MSCS resource group failover I get a big block of them.
Any ideas? Is this an issue on the storage end? If so, is it RecoverPoint?
Hope I explained that OK... it comes across clearly in my head, hahaha.
Cheers