OK, I have a flaky iSCSI storage path/device. That's not really the part that bothers me; rather, it's that when I lose the path/device I lose the entire ESX host and all of the VMs running on it!
What's worse, although I cannot connect with the vSphere client, I can still connect via SSH or iLO (the COS). Although HA is configured, it does not recognize this as a host failure, so no VMs are restarted. I have chosen not to enable Virtual Machine Monitoring for other reasons, although I understand it might resolve the situation for the VMs themselves.
I am mostly concerned that losing a storage path/device seems to disable the entire affected ESX host. Here's the vmkernel log from the affected host.
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)NMP: nmp_PathDetermineFailure: SCSI cmd RESERVE failed on path vmhba32:C0:T2:L0, reservation state on device naa.5000144f25649955 is unknown.
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.5000144f25649955" state in doubt; requested fast path state update...
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)ScsiDeviceIO: 1672: Command 0x16 to device "naa.5000144f25649955" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu3:4113)WARNING: FS3: 7096: Reservation error: IO was aborted
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4107)FS3: 7412: Waiting for timed-out heartbeat [HB state abcdef02 offset 4161536 gen 599 stamp 922837575795 uuid 4d698103-23122782-17dd-001a4bd16da2 jrnl <FB 153421> drv 8.46]
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu1:4115)FS3: 7044: Starting HB reclaim for [HB state abcdef02 offset 4161536 gen 599 stamp 922837575795 uuid 4d698103-23122782-17dd-001a4bd16da2 jrnl <FB 153421> drv 8.46]
Mar 9 09:20:56 control13 vmkernel: 10:17:42:01.157 cpu2:4322)FS3: 7412: Waiting for timed-out heartbeat [HB state abcdef02 offset 4161536 gen 599 stamp 922837575795 uuid 4d698103-23122782-17dd-001a4bd16da2 jrnl <FB 153421> drv 8.46]
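In case it helps anyone looking for the same pattern in their own logs, here is a minimal Python sketch that pulls the failed SCSI command and the H/D/P status codes out of ScsiDeviceIO lines like the one above. This is not a VMware tool; the names `parse_scsi_failure` and `count_failures_by_device` are hypothetical helpers of my own, and the regular expression assumes the line format is exactly what appears in this log.

```python
import re

# Assumption: ScsiDeviceIO failure lines always look like the one in the log above.
SCSI_FAIL_RE = re.compile(
    r'Command (?P<cmd>0x[0-9a-fA-F]+) to device "(?P<device>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<dev>0x[0-9a-fA-F]+) P:(?P<plugin>0x[0-9a-fA-F]+)'
)

# SCSI opcode 0x16 is RESERVE(6), matching the "SCSI cmd RESERVE failed" line in the log.
OPCODE_NAMES = {0x16: "RESERVE(6)"}


def parse_scsi_failure(line):
    """Return a dict describing a failed SCSI command, or None if the line does not match."""
    m = SCSI_FAIL_RE.search(line)
    if m is None:
        return None
    cmd = int(m.group("cmd"), 16)
    return {
        "device": m.group("device"),
        "cmd": cmd,
        "cmd_name": OPCODE_NAMES.get(cmd, "unknown"),
        "host": int(m.group("host"), 16),      # H: host (initiator/HBA) status
        "dev": int(m.group("dev"), 16),        # D: device status
        "plugin": int(m.group("plugin"), 16),  # P: plugin status
    }


def count_failures_by_device(lines):
    """Count matching failure lines per device identifier."""
    counts = {}
    for line in lines:
        info = parse_scsi_failure(line)
        if info is not None:
            counts[info["device"]] = counts.get(info["device"], 0) + 1
    return counts
```

Feeding it the lines of the vmkernel log (for example via `count_failures_by_device(open(path))`, where `path` is wherever your log lives) gives a quick per-device tally of failed commands.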
-The Invisible Admin-