We run ESX 4 on an HP C7000 enclosure with EVA storage in our environment.
Three LUNs became inaccessible due to dead paths. As a result, a virtual machine reset got stuck at 95%, the LUNs themselves and the VMs hosted on them had problems, and five ESX hosts in a cluster were affected and disconnected from the vCenter Server. When I tried to browse the datastore, it was extremely slow and showed no data. I took the following steps to resolve the issue.
1. Tried restarting the VM. (No result.)
2. Tried to reset the VM, which got stuck at 95%. (Logically, the VMX file was not accessible to the ESX host and hostd panicked.)
3. Used the vSphere CLI to kill the VM process. (No VM instance was running because VMX registration had failed.)
4. Tried to rescan for new datastores on the ESX host to clean up the dead paths. (This failed too, because the three LUNs with dead paths were actively refusing connections, as the VMkernel logs in step 5 confirmed.)
5. The VMkernel logs indicated that the three LUNs were unresponsive and that none of the ESX hosts could connect to them.
6. Un-presented the LUNs from all hosts, with success.
7. Rescanned for datastores on all hosts to clean up the dead paths, with success.
8. Re-presented the datastores to all hosts.
9. Rescanned for datastores on all hosts, with success, to discover the newly presented LUNs.
10. Tried to power on the VM with no luck, since the ESX hosts had lost communication with the VM while the storage was unavailable.
11. Restarted the management and vpxa services.
12. Used the VMware CLI to unregister the virtual machines from ESX and register them again.
13. Used the VMware CLI to power on the VM, with success.
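For anyone who wants to script the checks and the later steps (rescan, service restart, unregister/register, power on), here is a minimal Python sketch. It is not the exact procedure I ran. It assumes the ESX 4 service console commands `esxcfg-rescan`, `service mgmt-vmware restart`, `service vmware-vpxa restart` and `vmware-cmd`, and it assumes that `esxcfg-mpath -l` prints a `Runtime Name:` line and a `State:` line for each path. The sample text, the adapter name and the VMX path are made up for illustration. The script only builds the command list and does not execute anything.

```python
# Sketch only: assumes ESX 4 classic service console tooling and the
# "Runtime Name:" / "State:" layout of `esxcfg-mpath -l` output.

def dead_paths(mpath_output):
    """Return runtime names of paths whose State line reads 'dead'."""
    dead = []
    runtime = None
    for raw in mpath_output.splitlines():
        line = raw.strip()
        if line.startswith("Runtime Name:"):
            # Keep everything after the first colon, e.g. vmhba1:C0:T0:L3
            runtime = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and runtime is not None:
            if line.split(":", 1)[1].strip().lower() == "dead":
                dead.append(runtime)
            runtime = None
    return dead


def recovery_commands(vmx_path, adapters):
    """Build the command sequence for steps 9 and 11-13 (dry run, nothing is executed)."""
    cmds = [["esxcfg-rescan", adapter] for adapter in adapters]  # rescan each HBA
    cmds += [
        ["service", "mgmt-vmware", "restart"],        # restart hostd
        ["service", "vmware-vpxa", "restart"],        # restart the vCenter agent
        ["vmware-cmd", "-s", "unregister", vmx_path],
        ["vmware-cmd", "-s", "register", vmx_path],
        ["vmware-cmd", vmx_path, "start"],            # power on the VM
    ]
    return cmds


# Hypothetical sample in the assumed output layout.
SAMPLE = """\
fc.example-path-1
   Runtime Name: vmhba1:C0:T0:L3
   State: dead
fc.example-path-2
   Runtime Name: vmhba2:C0:T1:L3
   State: active
"""

if __name__ == "__main__":
    print("Dead paths:", dead_paths(SAMPLE))
    for cmd in recovery_commands("/vmfs/volumes/datastore1/vm1/vm1.vmx", ["vmhba1"]):
        print(" ".join(cmd))
```

Keeping it as a dry run means the command list can be reviewed before anything is run against a host that is already in trouble.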
My questions are: What are the possible ways and workarounds to get rid of dead paths?
What exactly caused them?