WIN2K3 Cluster question
G D
2011-11-22 19:55:19 UTC
Hi there;

Two-node WIN2K3 cluster:

We have 5 groups, and try to keep them in an active/passive setup. I'm
now seeing one group that is constantly failing over to the "passive"
node. The failover and failback are set to the defaults, just like all
other groups. Anyone know why this might be occurring with just this
one group? (I still find cluster logs difficult to read.)

Also, when this happens, our helpdesk staff "lose" visibility of this
network name and freak out, even though the share is still working
properly for the users.

I thought that to "see" ANY shared resource in the cluster you could
use the cluster name. Can someone please clear this up?

Thanks.

G
Roc B
2012-12-12 15:31:03 UTC
Hi G,

The cluster network name is not needed to access a share. In fact, you should instruct your users to access shares through the network name of the group where the share is created, because that is the network name that will always be available after a failover: it belongs to the same group as the share, so it is moved at the same time the share is moved.

I can give you some hints for troubleshooting with cluster.log; I hope you find them useful.

First of all, you can filter the log by piping it to the "find" command, or to "findstr" if you want to match several strings. For example, from cmd.exe:

type c:\windows\cluster\cluster.log | find " ERR " > ERR.txt

This generates a file (ERR.txt) that contains only the error lines.

You can substitute " ERR " with " WARN ", " INFO ", or whatever you want, to avoid an excessive amount of information.
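If cmd.exe is not at hand, the same "keep only the lines that contain one of these strings" filtering can be sketched in Python. This is a minimal sketch, not an official tool: it mirrors the case-sensitive, literal matching of `find`, and passing several patterns behaves like `findstr` with multiple literal search strings. The function names `filter_log` and `filter_file` are hypothetical, chosen for illustration.

```python
from typing import Iterable, List


def filter_log(lines: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the lines that contain at least one of the given substrings.

    Matching is literal and case-sensitive, like `find " ERR "`.
    Several patterns act like findstr with several literal search strings.
    """
    pats = list(patterns)
    return [line for line in lines if any(p in line for p in pats)]


def filter_file(path: str, patterns: Iterable[str]) -> List[str]:
    """Hypothetical helper: read a log file and filter it with filter_log."""
    # errors="replace" keeps going if the log contains bytes that are not UTF-8
    with open(path, encoding="utf-8", errors="replace") as f:
        return filter_log(f, patterns)
```

For example, `filter_file(r"c:\windows\cluster\cluster.log", [" ERR ", " WARN "])` would return both error and warning lines in their original order. Keeping the spaces around the level keyword, as in the `find` example, avoids matching those letters inside longer words.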

Once you have identified some of these errors, you can look for their origin. Remember that cluster.log is written by several components or sources, for example [Qfs], [RES], and so on; you can recognize them because they appear between square brackets "[ ]". You can use the same method to list the messages from a particular source:

type c:\windows\cluster\cluster.log | find /I "[RES]" > ResourcesMessages.txt
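To see which sources are producing the most messages before choosing one to filter on, the bracketed tag can be pulled out of each line. The sketch below assumes that the source is the first bracketed word on a line (for example "[FM]" or "[Qfs]"); real cluster.log lines may not always follow that layout, so treat the regular expression as an assumption. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Assumption: the component tag is the first [WORD] token on the line.
_SOURCE = re.compile(r"\[([A-Za-z0-9]+)\]")


def source_of(line: str) -> Optional[str]:
    """Return the bracketed source tag of a log line, or None if absent."""
    m = _SOURCE.search(line)
    return m.group(1) if m else None


def lines_from_source(lines: Iterable[str], source: str) -> List[str]:
    """Keep lines whose tag matches `source`, ignoring case (like find /I)."""
    wanted = source.lower()
    result = []
    for line in lines:
        tag = source_of(line)
        if tag is not None and tag.lower() == wanted:
            result.append(line)
    return result


def count_sources(lines: Iterable[str]) -> Counter:
    """Count how many lines each source tag produced."""
    return Counter(tag for tag in map(source_of, lines) if tag is not None)
```

Running `count_sources` over the whole log first gives a quick overview of which component is noisiest; `lines_from_source` then narrows the view the same way the `find /I "[RES]"` command does.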

You can then figure out where the real problem is. Remember that the times in cluster.log are written in UTC, so if you are searching for a specific time frame, you need to account for the gap between your local time (GMT+3 or whatever) and UTC, for example to match entries with the Event Viewer.
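The UTC offset is easy to get wrong when scanning by eye, so here is a small sketch that parses a log timestamp as UTC and compares it against a local time window. It assumes a line layout of `PID.TID::YYYY/MM/DD-HH:MM:SS.mmm LEVEL ...` and a fixed offset given in hours (no daylight-saving handling); both are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Assumed timestamp layout after "PID.TID::", e.g. "2011/11/22-19:55:19.523".
_TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def parse_utc(line: str) -> Optional[datetime]:
    """Return the line's timestamp as an aware UTC datetime, or None."""
    _, sep, rest = line.partition("::")
    parts = rest.split()
    if not sep or not parts:
        return None
    try:
        return datetime.strptime(parts[0], _TS_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        return None


def to_local(utc_dt: datetime, offset_hours: float) -> datetime:
    """Shift a UTC datetime to a fixed local offset such as GMT+3."""
    return utc_dt.astimezone(timezone(timedelta(hours=offset_hours)))


def lines_between(lines: Iterable[str], start: datetime, end: datetime) -> List[str]:
    """Keep lines whose UTC timestamp falls in [start, end].

    start and end must be timezone-aware (e.g. local GMT+3 times);
    Python compares aware datetimes correctly across offsets.
    """
    result = []
    for line in lines:
        ts = parse_utc(line)
        if ts is not None and start <= ts <= end:
            result.append(line)
    return result
```

With this, you can express the window in the local time you see in Event Viewer (for instance 22:55 to 22:56 at GMT+3) and still match log lines stamped 19:55 UTC.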

Hope you find this information handy!!!

Good Luck!
