Unexpected Cluster Switch (due to error 5 from cluster log)

Discussion:

Matthias

2008-05-16 11:07:00 UTC

Hello all,
yesterday one of our cluster systems performed an unexpected cluster switch (failover).

System information:

HW: ProLiant DL585 G1 / 2x AMD Opteron 2.2 GHz / 16 GB RAM
OS: Microsoft Windows Server 2003 Enterprise x64 Edition
OS Version: 5.2.3790 Service Pack 2 Build 3790

HP ProLiant Support Pack 7.90

Attached to a SAN via FC

Main Software: SAP CRM 5.0 SP15 / on MS SQL Server 2005
Support Software: Data Protector / McAfee (Enterprise 8.0.0 Patch 15) / SNARE 3.0.0

MSCS configuration:

User LAN (teaming)
Server LAN (no team)
Private LAN (crossover)

Cluster group / MSDTC group / SAP group / SQL group

___________________________________________________________-

The cluster log:

0000098c.00000a64::2008/05/15-15:16:43.912 INFO [DM] DmpGetSnapShotCb:
DmpGetDatabase returned 0x00000000
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsGetTempFileName
Q:\MSCS\, chkpt, 8011 => Q:\MSCS\chk1F4B.tmp, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] DmpGetSnapshotCb:
Checkpoint file name=Q:\MSCS\chk1F4B.tmp Seq#=8011
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsMoveFileEx
Q:\MSCS\chk619F.tmp=>Q:\MSCS\chk1F4B.tmp
0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] DmpGetSnapShotCb:
Failed to move the temp file to checkpoint file,
TempFileName=Q:\MSCS\chk619F.tmp, ChkPtFileName=Q:\MSCS\chk1F4B.tmp,
Error=0x00000005
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsDeleteFile
Q:\MSCS\chk619F.tmp, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] LogCheckPoint: Callback
failed to return a checkpoint
0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] LogpReset:: Callback
failed to return a checkpoint, error=5
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogClose : Entry
LogFile=0x02ad7df0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogFlush :
pLog=0x02ad7df0 writing the 1024 bytes for active page at offset 0x00000400
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] WriteFile 99c (....)
1024, status 0 (0=>0)
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsFlushBuffers 99c,
status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsCloseHandle 99c,
status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogClose : Exit
returning success
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsDeleteFile
Q:\MSCS\tqu619E.tmp, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogpReset exit,
returning 0x00000005
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogReset exit,
returning 0x00000005
0000098c.00000a64::2008/05/15-15:16:43.912 ERR [DM]DmpCheckpointTimerCb -
Failed to reset log, error=5
0000098c.00000a64::2008/05/15-15:16:44.005 ERR Cluster service suffered an
unexpected fatal error at line 2324 of source module
d:\nt\base\cluster\service\dm\dmlog.c. The error code was 5.
00000f58.00000f5c::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1,
Shutdown = 0.
00000f58.00000f5c::2008/05/15-15:16:45.004 ERR [RM] Active Resource =
00000000
00000f58.00000f5c::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000f58.00000f5c::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown
notification.
00000f38.00000f3c::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1,
Shutdown = 0.
00000f38.00000f3c::2008/05/15-15:16:45.004 ERR [RM] Active Resource =
00000000
00000f38.00000f3c::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000f38.00000f3c::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown
notification.
00000f18.00000f1c::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1,
Shutdown = 0.
00000f18.00000f1c::2008/05/15-15:16:45.004 ERR [RM] Active Resource =
00000000
00000f18.00000f1c::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000f18.00000f1c::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown
notification.
00000b70.00000b74::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1,
Shutdown = 0.
00000b70.00000b74::2008/05/15-15:16:45.004 ERR [RM] Active Resource =
00000000
00000b70.00000b74::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000b70.00000b74::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown
notification.
00000b70.00000b74::2008/05/15-15:16:45.004 INFO SAP Resource <SAP CPR 00
Instance>: ResourceControl request.
00000f18.00000f34::2008/05/15-15:16:45.019 INFO [RM] NotifyChanges shutting
down.
00000f38.00000f54::2008/05/15-15:16:45.019 INFO [RM] NotifyChanges shutting
down.
00000f58.00000f74::2008/05/15-15:16:45.019 INFO [RM] NotifyChanges shutting
down.
00000b70.00000f08::2008/05/15-15:16:45.035 INFO [RM] NotifyChanges shutting
down.
00000b70.00000f10::2008/05/15-15:16:45.050 INFO Physical Disk <Disk H:>:
[DiskArb] CompletionRoutine, status 0.
00000b70.00000f10::2008/05/15-15:16:45.050 INFO Physical Disk <Disk H:>:

There are also errors in the event log:

Event Type: Error
Event Source: ClusSvc
Event Category: Log Mgr
Event ID: 1016
Date: 15.05.2008
Time: 17:16:43
User: N/A
Computer: NODE1
Description:
Cluster service failed to obtain a checkpoint from the server cluster
database for log file Q:\MSCS\tqu619E.tmp.

Next:

Event Type: Error
Event Source: ClusSvc
Event Category: Database Mgr
Event ID: 1000
Date: 15.05.2008
Time: 17:16:43
User: N/A
Computer: NODE1
Description:
Cluster service suffered an unexpected fatal error at line 2324 of source
module d:\nt\base\cluster\service\dm\dmlog.c. The error code was 5.

A lot of:

Event Type: Warning
Event Source: Ftdisk
Event Category: Disk
Event ID: 57
Date: 15.05.2008
Time: 17:16:45
User: N/A
Computer: NODE1
Description:
The system failed to flush data to the transaction log. Corruption may occur.

And:

Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7031
Date: 15.05.2008
Time: 17:16:45
User: N/A
Computer: NODE1
Description:
The Cluster Service service terminated unexpectedly. It has done this 1
time(s). The following corrective action will be taken in 60000
milliseconds: Restart the service.

I found the KB article http://support.microsoft.com/kb/321531/en-us, but I cannot
believe that our virus scanner is the cause, because we EXCLUDE all recommended
drives and files (e.g. quorum drive, database drives, database log drives,
SQL executables, pagefile, C:\Windows\Cluster, ..\NTDS, ..ntfsr, ..SYSVOL,
*.chk, *.ebd, *.ldf, *.log, *.mdf, *.ndf, *.stm) from read and write scans.

Does anyone have an idea?

br, Matthias
____________________________________________
Matthias Schweifer - Austria
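
One detail worth checking in the exclusion list above: the files named in the cluster log (chk619F.tmp, chk1F4B.tmp, tqu619E.tmp) all end in .tmp. The following is a minimal sketch, assuming plain wildcard matching on the file name; how the virus scanner really evaluates its exclusions is not modeled here, and the helper name `covered_by_extension` is hypothetical.

```python
from fnmatch import fnmatchcase

# Extension patterns quoted in the post, copied as written there.
EXTENSION_EXCLUSIONS = ["*.chk", "*.ebd", "*.ldf", "*.log", "*.mdf", "*.ndf", "*.stm"]

# File names taken from the cluster log excerpt in the post.
QUORUM_FILES = ["chk619F.tmp", "chk1F4B.tmp", "tqu619E.tmp"]


def covered_by_extension(name, patterns=EXTENSION_EXCLUSIONS):
    """Return True if any wildcard pattern matches the file name, ignoring case."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


# Files that no extension pattern matches.
uncovered = [name for name in QUORUM_FILES if not covered_by_extension(name)]
print(uncovered)
```

If this simplified matching is close to what the scanner does, the temporary checkpoint files are protected only by the exclusion of the quorum drive itself, not by the extension patterns.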

Jeff Hughes [MSFT]

2008-05-16 12:27:29 UTC

Error 5 is "access denied", and it occurred when we were checkpointing the
cluster registry to the quorum drive. Check and make sure the cluster
service account has both the 'Back up files and directories' and 'Restore
files and directories' user rights. Also, make sure your antivirus is NOT
scanning the quorum. If it was scanning a quorum file at the time of a
checkpoint, that may explain the error 5.
--
Jeff Hughes, MCSE
Support Escalation Engineer
Microsoft Enterprise Platforms Support (Server Core/Cluster)
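
To locate the failing step in a longer cluster log, the WARN and ERR lines that carry an error code can be filtered out. This is a minimal sketch, assuming the line layout seen in the excerpt above (`pid.tid::timestamp LEVEL message`) and that wrapped lines have been rejoined first; the function name `failures`, the two regular expressions, and the small code table are illustrative, not part of any cluster tooling.

```python
import re

# Layout assumed from the excerpt in this thread:
# <pid>.<tid>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+(?P<msg>.*)$"
)

# Matches forms such as "error=5", "Error=0x00000005", "error code was 5".
CODE_RE = re.compile(
    r"(?:error(?: code was)?|returning)[ =]+(0x[0-9a-fA-F]+|\d+)", re.IGNORECASE
)

# Only the two Win32 codes that appear in this thread.
WIN32_NAMES = {2: "ERROR_FILE_NOT_FOUND", 5: "ERROR_ACCESS_DENIED"}


def failures(lines):
    """Yield (timestamp, level, code, message) for WARN/ERR lines with an error code."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") == "INFO":
            continue
        code = CODE_RE.search(match.group("msg"))
        if code is None:
            continue
        token = code.group(1)
        value = int(token, 16) if token.lower().startswith("0x") else int(token)
        yield match.group("ts"), match.group("level"), value, match.group("msg")


# Lines from the post above, with the newsgroup line wrapping removed.
SAMPLE = [
    r"0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] LogpReset:: Callback failed to return a checkpoint, error=5",
    r"0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogReset exit, returning 0x00000005",
    r"0000098c.00000a64::2008/05/15-15:16:44.005 ERR Cluster service suffered an unexpected fatal error at line 2324 of source module d:\nt\base\cluster\service\dm\dmlog.c. The error code was 5.",
]

for ts, level, value, msg in failures(SAMPLE):
    print(ts, level, value, WIN32_NAMES.get(value, "unknown"))
```

The earliest WARN line with the code is usually the most informative one; in the log above that is the failed move of the temp file to the checkpoint file on Q:\MSCS.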


praveen

2011-04-21 02:27:18 UTC

Hi Jeff,

It would be very helpful if you could provide a solution for an issue I am facing with the same error 5.

I am seeing this error in a majority node set cluster that runs Exchange 2007.

Cluster service could not write to a file (C:\DOCUME~1\XXX~1\LOCALS~1\Temp\CLS1348.tmp).

From the cluster log:
00000de8.00002fa0::2011/03/17-02:45:19.673 WARN [CP] CppCheckpoint failed to get registry database SYSTEM\CurrentControlSet\Services\MSExchangeIS\ahexclex1 to file C:\DOCUME~1\XXXAHC~1\LOCALS~1\Temp\CLS2D86.tmp error 5

00000de8.00002fa0::2011/03/17-02:45:19.673 WARN [CP] CppRegNotifyThread CppNotifyCheckpoint due to timer failed, reset the timer.

So basically error 5 indicates an "access denied" issue. We have a majority node set, and I have excluded C:\DOCUME~1\XXXAHC~1\LOCALS~1\Temp and c:\Windows\Cluster from antivirus scanning, but the error still persists.

Kindly help me understand the possible cause of error 5 in this case.


Matthias

2008-05-16 12:37:01 UTC

I am not the backup administrator in our company, but as further information:
a full file system backup was running on both nodes (with HP
Data Protector), and the physical quorum disk was backed up as well.
Start: 17:15

Is that a possible reason for the error 5?
Should we exclude the quorum disk from the backup set?
(Is a system state backup sufficient?)

br, matthias

Jeff Hughes [MSFT]

2008-05-20 14:35:45 UTC

Yes, if the quorum files were being backed up at the time, that may very well
be why you got an error 5. You do not need to back up the quorum, and it
should be excluded from your scheduled backups. There's nothing there you'd
ever need to recover, since all the quorum is used for is maintaining a copy
of the cluster database and any checkpointed registry keys, and you can
always recreate those files if needed.
--
Jeff Hughes, MCSE
Support Escalation Engineer
Microsoft Enterprise Platforms Support (Server Core/Cluster)
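
The timing can be cross-checked with a little arithmetic. The cluster log records time in GMT, which would explain why it shows 15:16:43 while the event log shows 17:16:43 local time. The sketch below assumes the nodes ran on Central European Summer Time (UTC+2) and that the reported backup start of 17:15 refers to the same day, 15.05.2008; neither is stated explicitly in the thread.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time on the nodes was UTC+2 (CEST) in May 2008,
# as suggested by the two-hour gap between cluster log and event log.
LOCAL = timezone(timedelta(hours=2))


def cluster_log_to_local(stamp):
    """Convert a cluster log timestamp (recorded in GMT) to local time."""
    utc = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f").replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL)


# First failed checkpoint line in the cluster log.
failure = cluster_log_to_local("2008/05/15-15:16:43.912")

# Assumption: "17:15" is the backup start on the same day, in local time.
backup_start = datetime(2008, 5, 15, 17, 15, tzinfo=LOCAL)

gap = failure - backup_start
print(failure.strftime("%H:%M:%S"), gap)
```

Under those assumptions the checkpoint failed less than two minutes after the backup started, which is consistent with the backup holding a quorum file open at that moment.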

unknown

2008-06-09 16:51:08 UTC

Hello,
I got nearly the same messages as described above,
but my error code is 2.

Event Type: Error
Event Source: ClusSvc
Event Category: Database Mgr
Event ID: 1000
Date: 06.06.2008
Time: 14:34:44
User: N/A
Computer: SVREHDWHCLN1
Description:
Cluster service suffered an unexpected fatal error at line 2236 of source module d:\nt\base\cluster\service\dm\dmlog.c. The error code was 2.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Then I got several of these messages:

The system failed to flush data to the transaction log. Corruption may occur.

After that, only this message appeared:

Cluster service is requesting a bus reset for device \Device\ClusDisk0.

The Cluster service did not start any more:

Server specific error code 5086

The cluster failed over properly and is running on the other node,
but the first node died.

Any ideas?
I do not want to evict the node or set up the machine from scratch.

Config:

FSC Blade BX630
Win2k3 64 bit
SQL 2005 SP2

IBM SVC SAN, FC connected

Thanks for your help

John Toner [MVP]

2008-06-13 12:33:47 UTC

Not enough info here to figure out the problem, but it looks like you might
have lost connectivity to your quorum disk.

Regards,
John

Visit my blog: http://msmvps.com/blogs/jtoner
