I have 70 ESX hosts in vCenter on several different continents, running ESX 3.0.1 through 4.1 with patches. I recently edited the vpxd.cfg file and added:
<heartbeat>
  <notRespondingTimeout>60</notRespondingTimeout>
</heartbeat>
I thought it would eliminate the false-positive disconnect alerts, which also go out to many people, some of whom get very worried when they see an alert like this. How can I stop my hosts from causing this sort of quick disconnect/reconnect? The change I made to vCenter doesn't seem to be working. I'm seeing alarms overnight that show a disconnect and a reconnect within the very same minute. It happens to at least 3 hosts per day. All of these false alarms are causing my department to ignore potentially valid alerts. After I made the change to the cfg file, I restarted the entire server, so I know it's in effect.
Here's the entire file. Let me know if there is a problem or what I can do. Thanks!
<config>
  <level id="VmCheck">
    <logLevel>info</logLevel>
    <logName>VmCheck</logName>
  </level>
  <level id="CpuFeatures">
    <logLevel>info</logLevel>
    <logName>CpuFeatures</logName>
  </level>
  <log>
    <maxFileNum>10</maxFileNum>
    <level>info</level>
    <memoryLevel>verbose</memoryLevel>
    <compressOnRoll>true</compressOnRoll>
  </log>
  <alert>
    <log>
      <enabled>true</enabled>
    </log>
  </alert>
  <vmacore>
    <threadPool>
      <TaskMax>90</TaskMax>
    </threadPool>
    <ssl>
      <useCompression>true</useCompression>
    </ssl>
  </vmacore>
  <vpxd>
    <das>
      <serializeadds>true</serializeadds>
      <slotCpuMinMHz>256</slotCpuMinMHz>
      <slotMemMinMB>0</slotMemMinMB>
    </das>
    <filterOverheadLimitIssues>true</filterOverheadLimitIssues>
    <heartbeat>
      <notRespondingTimeout>60</notRespondingTimeout>
    </heartbeat>
  </vpxd>
</config>
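
In case it helps anyone reproduce this, here is a rough sketch of how I would double-check that the file is well-formed and that the setting really sits where I put it, under config/vpxd/heartbeat. That path is only what I used above; I am assuming, not asserting, that it is where vCenter looks. The function name heartbeat_timeout and the SAMPLE string are made up for illustration, and the script only parses the XML; it does not talk to vCenter.

```python
import xml.etree.ElementTree as ET


def heartbeat_timeout(cfg_text):
    """Return notRespondingTimeout as an int if it is found under
    config/vpxd/heartbeat, otherwise None.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML.
    """
    root = ET.fromstring(cfg_text)
    if root.tag != "config":
        return None
    # Path is relative to the <config> root element.
    node = root.find("vpxd/heartbeat/notRespondingTimeout")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


# Hypothetical trimmed-down copy of the file above, for illustration only.
SAMPLE = """<config>
  <vpxd>
    <heartbeat>
      <notRespondingTimeout>60</notRespondingTimeout>
    </heartbeat>
  </vpxd>
</config>"""

print(heartbeat_timeout(SAMPLE))
```

To run it against the real file, read vpxd.cfg into a string and pass that in instead of SAMPLE. A result of None would mean the element is missing or nested somewhere other than config/vpxd/heartbeat.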