Event IDs 1122 and 1123


Jamie

2005-10-12 10:43:02 UTC

Our client has a 4-node cluster running on W2003 Ent. Events 1122 and 1123
appear in the event log every few seconds. Everything appears to work fine,
but we are worried about these errors.
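
To put a number on "every few seconds", the two events can be paired up and the gaps between them measured. The following is a minimal sketch, not a tested diagnostic tool. It assumes that 1123 is the "lost communication" event and 1122 the "(re)established communication" event (check this against the event text on your own nodes), and that the System log has already been exported into (timestamp, event ID) tuples. The function name and the sample data are hypothetical, and the sketch ignores which node and network each event refers to.

```python
from datetime import datetime, timedelta

# Assumed meaning of the two IDs; verify against the event descriptions.
LOST = 1123           # node lost communication on a network
REESTABLISHED = 1122  # node (re)established communication


def outage_durations(events):
    """Pair each LOST event with the next REESTABLISHED event.

    `events` is an iterable of (timestamp, event_id) tuples sorted by time.
    Returns a list of timedelta values, one per completed lost/regained pair.
    """
    durations = []
    lost_at = None
    for when, event_id in events:
        if event_id == LOST and lost_at is None:
            lost_at = when
        elif event_id == REESTABLISHED and lost_at is not None:
            durations.append(when - lost_at)
            lost_at = None
    return durations


# Hypothetical sample data, not taken from the real event log.
sample = [
    (datetime(2005, 10, 12, 10, 0, 0), 1123),
    (datetime(2005, 10, 12, 10, 0, 2), 1122),
    (datetime(2005, 10, 12, 10, 0, 7), 1123),
    (datetime(2005, 10, 12, 10, 0, 8), 1122),
]
gaps = outage_durations(sample)
```

Consistently short gaps would suggest brief heartbeat interruptions rather than a link that stays down, which fits the observation that everything otherwise appears to work.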
An engineer went to site to troubleshoot, and his notes are as follows:

Simon and I went in on Friday to dissolve teaming on the cluster NICs
as recommended by HP. This had no effect on the errors being generated by
the cluster, so we decided to test a few scenarios to see if we could
pinpoint the error. After trying various configurations, we discovered that
we could eliminate the errors on the cluster by moving all of the NICs (LAN,
Heartbeat and Backup LAN) off the Jewson production network and onto an
isolated switch. We reconfigured teaming on the LAN NICs and we still had no
errors while the cluster was on a switch of its own; this is how we left it.

We found during testing on the production network that the errors being
generated by the cluster could be stopped by restarting the cluster service
on the first node that was booted. After this, no matter which nodes were
taken online or offline, no errors were generated. The moment that all nodes
were physically shut down and brought back up, though, the errors would
return until the cluster service was restarted as described. This leads us to
believe that the issue may well be related to DNS or Active Directory. It
would appear that Cluster on Windows 2003 Server attempts to write
information into the AD when the first node boots up. If this fails
completely (dnsapi errors written to the event viewer), as is the case on the
isolated switch, then the cluster works perfectly. In the production
environment (no dnsapi errors logged), it would appear that the process is
partially working, which is then causing the errors to be generated.
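
The workaround described above (restarting the cluster service on the first node that was booted) can be sketched as a small script run locally on that node. This is only an illustration under stated assumptions: it uses the Cluster service's short name, clussvc, with the standard net stop / net start commands, and the function names are made up for this example. It defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess

SERVICE = "clussvc"  # short name of the Cluster service


def restart_commands(service=SERVICE):
    """Return the stop and start commands for the given service."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_cluster_service(dry_run=True):
    """Stop, then start, the Cluster service on the local node.

    With dry_run=True the commands are only printed. Set dry_run=False on
    the first-booted node (from an administrative prompt) to run them.
    """
    commands = restart_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


planned = restart_cluster_service()
```

Note that this only automates the workaround; it does not address why the errors return after a full shutdown of all nodes.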
As you know, Jewson are owned by Saint Gobain. We have had experience of
Saint Gobain networks before and they are usually considerably locked down.
This may be an issue when the cluster is looking for and attempting to
update domain controllers.
The next step is really to pass this information on to Microsoft to see if
they can tell you what the first node in the cluster attempts to do with
AD/DNS and why the errors would not be generated on the isolated switch.
Until we resolve the errors on the cluster, we do not recommend installing
SQL.

Hope someone can help!
Regards

John Toner [MVP]

2005-10-12 13:38:36 UTC


These errors are usually caused by NIC drivers or faulty network components
(switch, NIC, etc.). Make sure that all NICs are updated with the
latest/greatest drivers, especially if you're using Broadcom NICs.

Additional advice can be found in the following KB article:

http://support.microsoft.com/kb/892422

Regards,
John


MarkFox

2005-10-13 17:55:13 UTC


We had the same issue on our Win2k3 2-node clusters after we added a third
node to the cluster. Per Microsoft, when you add a third node to a cluster,
multicast is enabled as a "feature" but is not needed. Our MS tech
recommended running the following from a command prompt. If you run
C:\> cluster yourclustername /priv
you will see that MulticastCluster is enabled. You can disable it by running
C:\> cluster yourclustername /priv MulticastClusterDisabled=1:DWORD
This only needs to be run once per cluster name.
Note that all nodes in the cluster will need to be rebooted. This resolved
the 1122/1123 issue on one of our clusters but not on another.
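
For anyone who has to repeat this on several clusters, the steps above can be sketched as a script. This is a minimal sketch rather than a supported tool: it simply wraps the two cluster commands quoted in this post, the function names are hypothetical, and "yourclustername" is a placeholder. It defaults to a dry run that only prints the commands; the node reboots still have to be done separately.

```python
import subprocess


def multicast_commands(cluster_name):
    """Build the 'show private properties' and 'disable multicast' commands."""
    show = ["cluster", cluster_name, "/priv"]
    disable = ["cluster", cluster_name, "/priv",
               "MulticastClusterDisabled=1:DWORD"]
    return show, disable


def disable_multicast(cluster_name, dry_run=True):
    """Show the properties, disable multicast, then show them again.

    With dry_run=True the commands are printed instead of executed.
    """
    show, disable = multicast_commands(cluster_name)
    steps = [show, disable, show]
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return steps


steps = disable_multicast("yourclustername")
```

Listing the private properties before and after makes it easy to confirm that the value actually changed before scheduling the reboots.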
Hope that helps.
--
Mark
