Channel: VMware Communities : Popular Discussions - VMware ESX 4

Cluster problems


Dear fellow admins.

 

I have issues with my vSphere cluster.

 

First, a brief explanation of my setup.

 

4 Fujitsu Siemens RX servers, each with 3 NICs.

IP network for ESX servers and virtual servers: 192.168.10.x

IP networks for iSCSI: 192.168.11.x and 192.168.12.x

 

Each server has:

2 NICs for iSCSI: networks 192.168.11.x and 192.168.12.x

1 NIC for everything else: network 192.168.10.x

 

One 48-port 1000 Mbit switch (ZyXEL layer 3 switch). I am not using VLANs at all; all traffic goes through this one switch.

 

1 IBM DS3300 iSCSI storage system with dual controllers.

Controller setup:

Controller A:

Port 1: 192.168.11.6

Port 2: 192.168.12.6

 

Controller B:

Port 1: 192.168.11.7

Port 2: 192.168.12.7
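As a quick sanity check of the addressing above, here is a minimal Python sketch that groups the four controller ports by subnet and shows which controllers are reachable on each iSCSI network. The /24 prefix length is an assumption (only 192.168.11.x and 192.168.12.x are given), and the names PORTS and controllers_per_subnet are made up for illustration.

```python
import ipaddress
from collections import defaultdict

# Controller ports as listed in the setup above: (controller, port) -> address.
PORTS = {
    ("A", 1): "192.168.11.6",
    ("A", 2): "192.168.12.6",
    ("B", 1): "192.168.11.7",
    ("B", 2): "192.168.12.7",
}

# Assumption: both iSCSI networks use a /24 mask.
PREFIX = 24


def controllers_per_subnet(ports, prefix=PREFIX):
    """Map each subnet to the set of controllers that have a port in it."""
    result = defaultdict(set)
    for (controller, _port), addr in ports.items():
        # strict=False lets us derive the network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        result[str(net)].add(controller)
    return dict(result)


layout = controllers_per_subnet(PORTS)
for subnet, controllers in sorted(layout.items()):
    print(subnet, "->", ", ".join(sorted(controllers)))
```

With the addresses listed, each of the two iSCSI subnets ends up with one port on controller A and one on controller B, so a host NIC on either subnet can in principle reach both controllers.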

 

iSCSI is configured with 2 active paths (active/active I/O) to each LUN, plus 2 inactive paths.

 

I have no internal DNS.

I have 1 VMware Infrastructure server, which controls the cluster. IP: 192.168.10.37

Snippet from the hosts file on the ESX servers:

 

192.168.10.20   ESX1.Hosting ESX1

192.168.10.21   ESX2.Hosting ESX2

192.168.10.22   ESX3.Hosting ESX3

192.168.10.23   ESX4.Hosting ESX4

192.168.10.37   VSS.Hosting VSS
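Since there is no DNS, name resolution depends entirely on these entries being identical and consistent on every host. A minimal Python sketch for checking a hosts snippet like the one above for addresses outside the management network, duplicate addresses, and names that map to two addresses. The /24 mask and the helper names parse_hosts and check_hosts are assumptions for illustration, not part of any VMware tool.

```python
import ipaddress

HOSTS_SNIPPET = """\
192.168.10.20   ESX1.Hosting ESX1
192.168.10.21   ESX2.Hosting ESX2
192.168.10.22   ESX3.Hosting ESX3
192.168.10.23   ESX4.Hosting ESX4
192.168.10.37   VSS.Hosting VSS
"""

# Assumption: the management network is a /24 (the post only says 192.168.10.x).
MGMT_NET = ipaddress.ip_network("192.168.10.0/24")


def parse_hosts(text):
    """Return a list of (ip, fqdn, aliases) tuples from hosts-file text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or address without a name
        entries.append((ipaddress.ip_address(fields[0]), fields[1], fields[2:]))
    return entries


def check_hosts(entries, network=MGMT_NET):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ip_owner = {}
    name_owner = {}
    for ip, fqdn, aliases in entries:
        if ip not in network:
            problems.append(f"{fqdn}: {ip} is outside {network}")
        if ip in ip_owner:
            problems.append(f"{ip} is listed for both {ip_owner[ip]} and {fqdn}")
        ip_owner.setdefault(ip, fqdn)
        for name in [fqdn, *aliases]:
            key = name.lower()  # host names are case-insensitive
            if key in name_owner and name_owner[key] != ip:
                problems.append(f"name {name} maps to both {name_owner[key]} and {ip}")
            name_owner.setdefault(key, ip)
    return problems


entries = parse_hosts(HOSTS_SNIPPET)
problems = check_hosts(entries)
for message in problems or ["no obvious problems found"]:
    print(message)
```

Running the same check against the hosts file copied from each ESX server and from the management server would show whether any of them has drifted from the others.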

 

 

All right, I have attached a picture of the network setup on one server (it is the same on all of them).

 

Now my problems:

Problem 1:

 

Some of the ESX hosts occasionally get disconnected from the cluster and then reconnect afterwards.

From the event log:

Host 192.168.10.21 in datacenter Datacenter is not responding.

Then the server gets disconnected, and then:

 

Alarm 'Virtual machine cpu usage' on SERVER changed from Green to Gray

info

12-05-2010 10:19:44

 

Alarm 'Host connection failure' on 192.168.10.21 triggered an action

info

12-05-2010 10:19:44

Alarm 'Host connection failure' on entity 192.168.10.21 send SNMP trap

info

12-05-2010 10:19:44

 

Normally the host then reconnects by itself.

 

 

 

Problem 2:

Sometimes the virtual servers lose their connection to the VMware network.

Today one virtual server even got shut down and started again. Before that happened, I noticed this under Tasks and Events for the backup server:

 

192.168.10.21 is disconnected (.21 is the ESX server where the backup server is located)

And then:

Host is connected

info

12-05-2010 06:40:27

 

info

12-05-2010 05:58:19

This occurred 8 times within 1½ hours, and then the backup server shut down.

 

Problem 3:

One of the ESX servers shows this in the event log:

 

Alarm 'Cannot connect to storage' on entity 192.168.10.20 send SNMP trap

info

12-05-2010 01:43:13

Alarm 'Cannot connect to storage' on 192.168.10.20 changed from Gray to Gray

info

12-05-2010 01:43:13

Alarm 'Cannot connect to storage' on 192.168.10.20 changed from Gray to Gray

info

12-05-2010 01:43:13

 

Lost connectivity to storage device naa.600a0b80005aedbb00000ac54b17dfff. Path vmhba33:C7:T0:L3 is down. Affected datastores: "IBM LUN3".

error

12-05-2010 01:41:03

 

Lost connectivity to storage device naa.600a0b80005aedbb00000f7e4b656403. Path vmhba33:C7:T0:L5 is down. Affected datastores: "IBM LUN5".

error

12-05-2010 01:41:03

 

Lost access to volume 4b2a36fb-986d63ce-b6ea-000ae48a8ba7 (IBM LUN2) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

info

12-05-2010 01:41:04

Successfully restored access to volume 4b2a36fb-986d63ce-b6ea-000ae48a8ba7 (IBM LUN2) following connectivity issues.

info

12-05-2010 01:42:03
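To put a number on how long a volume was unreachable, the "Lost access" and "Successfully restored" events can be paired up per volume. A minimal Python sketch, assuming the timestamps are day-month-year and with the two IBM LUN2 events above typed in by hand. The EVENTS list and the outage_durations function are hypothetical names for illustration, not a vCenter API.

```python
from datetime import datetime

# Assumption: the event log prints dates as day-month-year.
TS_FORMAT = "%d-%m-%Y %H:%M:%S"

# (timestamp, kind, volume), copied by hand from the events above.
EVENTS = [
    ("12-05-2010 01:41:04", "lost", "IBM LUN2"),
    ("12-05-2010 01:42:03", "restored", "IBM LUN2"),
]


def outage_durations(events):
    """Pair each 'lost' event with the next 'restored' event for the same volume.

    Returns a list of (volume, seconds) tuples in chronological order.
    """
    parsed = sorted(
        (datetime.strptime(stamp, TS_FORMAT), kind, volume)
        for stamp, kind, volume in events
    )
    open_outages = {}
    durations = []
    for when, kind, volume in parsed:
        if kind == "lost":
            # Keep the earliest unmatched 'lost' time for this volume.
            open_outages.setdefault(volume, when)
        elif kind == "restored" and volume in open_outages:
            start = open_outages.pop(volume)
            durations.append((volume, (when - start).total_seconds()))
    return durations


durations = outage_durations(EVENTS)
for volume, seconds in durations:
    print(f"{volume}: {seconds:.0f} s")
```

For the two events above this gives 59 seconds for IBM LUN2. Feeding in the full event list would show whether the outages are all of similar length and whether they cluster at particular times.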

 

 

 

And so on.

 

 

 

Please help me, guys. I have read a lot of documentation on this before posting here, but frankly, I no longer know what to do. These are serious problems.

 

Maybe installing another NIC in each server and dedicating it to VM traffic, so that service console traffic is separated, would be a good idea?

