vSphere version 4.0.0.261974
vCenter version 4.0.0.258672
I have been installing physical memory upgrades on our vSphere servers but have run into a problem with one particular vSphere host. Before placing the host in maintenance mode I migrate the VMs manually, but the first two VMs got stuck at 10%. I restarted the VMware VirtualCenter Server service, but shortly afterwards all the VMs on that vSphere node automatically attempted to migrate to the other vSphere nodes. The same VM again became stuck at 10%, and I was able to cancel the other VM migrations by right-clicking on them and selecting Cancel. I suspect the problem with that VM may be a connected CD-ROM drive or serial port preventing the migration, which I should have checked beforehand.
After allowing plenty of time for the migrations to complete, I restarted the VMware VirtualCenter Server service again. The next time I started the vSphere Client, the vSphere node had a greyed-out status and was showing as disconnected. I tried to connect it, but the 'Connect' task stayed stuck in progress.
I can connect to the vSphere node via SSH, so connectivity is not the issue. The node is still running 9 VMs in a live environment, so I cannot simply restart the server.
After a little research I followed the advice found here (http://vmwise.com/2011/03/21/when-a-reconnect-is-not-enough/) of connecting to the vSphere node via SSH and doing the following:
- Stop the vCenter agent process -> service vmware-vpxa stop
- Stop the hostd process -> service mgmt-vmware stop
- Delete the user account vCenter uses to communicate with the host -> userdel vpxuser
- Find all of the vCenter and HA packages -> rpm -qa | grep -iE 'vpx|aam'
- Remove the RPMs found in the previous step -> rpm -e <package name>
- Start the hostd process -> service mgmt-vmware start
- Go back into vCenter and reconnect the host.
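For anyone wanting to repeat this, the console steps above can be collected into one small shell script for the ESX service console. This is a hypothetical sketch, not something from the linked article: the `run` and `reset_vpxa` function names and the `DRY_RUN` switch are my own assumptions. By default it only prints each command, so the package list can be reviewed before anything is removed. The final reconnect still has to be done from vCenter.

```shell
#!/bin/sh
# Hypothetical sketch of the manual steps above (not from the linked article).
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a single command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_vpxa() {
    run service vmware-vpxa stop      # stop the vCenter agent
    run service mgmt-vmware stop      # stop hostd
    run userdel vpxuser               # remove the account vCenter uses
    # Read-only query for vCenter (vpx) and HA (aam) packages.
    pkgs=$(rpm -qa 2>/dev/null | grep -iE 'vpx|aam')
    for p in $pkgs; do
        run rpm -e "$p"               # remove each package found
    done
    run service mgmt-vmware start     # start hostd again
}
```

Source the file and call `reset_vpxa` to preview the commands. Only run `DRY_RUN=0 reset_vpxa` once the printed list looks right, since it stops hostd and deletes `vpxuser` on a host that is still running production VMs.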
After this, the connect process got a little further but complained of a bad username and password. When I enter the correct credentials, it immediately returns a security alert window stating that it is 'Unable to verify the authenticity of the specified host.....do you wish to proceed with connecting anyway?'. If I select 'Yes', nothing happens for a few minutes and I eventually receive a 'Request timed out' error.
Not quite sure what to do next. Has anyone else experienced this previously and is there anything else that I can try? Any help much appreciated.