Hi,
We have a machine reporting CPU exceptions (a list of recent exceptions is
attached at the end of the message). What do these exceptions mean?
As far as I can tell, they have caused the machine to crash at least once.
After the crash, Compaq replaced the motherboard unit (which contains everything
except the main memory and the PCI cards), but the CPU exceptions persist.
Is the main memory faulty?
Type of machine: Digital Personal WorkStation 600au
OS: Digital UNIX 4.0D PK3
Firmware revision: 7.0-10
Memory: 4 x 256 MB (1 GB total)
Disks: 3-channel SWXCR RAID controller + internal SCSI bus
Scott
--------
Near the time of the crash the exceptions were happening more often
(perhaps ten or twenty that day), but now they occur once every few
days.
dia reports the exceptions like this (the values in the Entry body field
sometimes change):
******************************** ENTRY    1 ******************************** 
Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number             2. 
Timestamp of occurrence              16-SEP-1999 05:17:26   
Host name                            bluejay 
System type register      x0000001E  Systype 30. (Miata) 
Number of CPUs (mpnum)    x00000001 
CPU logging event (mperr) x00000000 
Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      100. CPU Machine Check Errors 
CPU Minor class                   3. Processor Correctable Error (630) 
Entry Body Size:          x00000068 
Entry body: 
          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order 
 0000:    00000038  00000018  80000000  00000068   *h...........8...* 
 0010:    FFFFFF00  33C8CF4F  00000000  00000086   *........O..3....* 
 0020:    FFFFFFF0  C5FFFFFF  00000000  00001A00   *................* 
 0030:    00000000  00000000  00000001  00000000   *................* 
 0040:    00000000  00000000  00000000  00000000   *................* 
 0050:    00000000  00000000  00000000  00000000   *................* 
 0060:                        5E3C7E25  00000000   *        ....%~<^* 
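To compare entries, it may help to regroup the entry body into 64-bit
quadwords. The Python below is only a rough sketch of that regrouping. It
assumes the layout dia prints above, where each row holds four 32-bit
longwords with bytes 15..12 in the leftmost column and bytes 03..00 in the
rightmost, and it does not try to interpret what any field means. The
function names parse_dump, quadwords and changed_offsets are made up for
this example.

```python
import re

# Matches one dia dump row, e.g.
#  " 0010:    FFFFFF00  33C8CF4F  00000000  00000086   *........O..3....*"
# Group 1 is the row offset, group 2 holds one to four hex longwords.
ROW = re.compile(r"^\s*([0-9A-Fa-f]{4}):((?:\s+[0-9A-Fa-f]{8}){1,4})\s+\*")


def parse_dump(text):
    """Rebuild the entry body as bytes from a dia hex dump.

    Assumes dia's 'Byte Order' header: the leftmost column holds bytes
    15..12 of the row and the rightmost holds bytes 03..00. Short rows
    (like the last one) only fill the rightmost columns.
    """
    body = bytearray()
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers and any other non-dump lines
        offset = int(m.group(1), 16)
        words = m.group(2).split()
        # Columns run from high bytes to low bytes, so reverse them
        # to get the longword at the lowest address first.
        for i, word in enumerate(reversed(words)):
            start = offset + 4 * i
            if len(body) < start + 4:
                body.extend(bytes(start + 4 - len(body)))
            body[start:start + 4] = int(word, 16).to_bytes(4, "little")
    return bytes(body)


def quadwords(body):
    """Map each 8-byte-aligned offset to its little-endian 64-bit value."""
    return {
        off: int.from_bytes(body[off:off + 8], "little")
        for off in range(0, len(body) - len(body) % 8, 8)
    }


def changed_offsets(dump_a, dump_b):
    """List the quadword offsets whose values differ between two dumps."""
    qa = quadwords(parse_dump(dump_a))
    qb = quadwords(parse_dump(dump_b))
    return sorted(off for off in qa.keys() | qb.keys()
                  if qa.get(off) != qb.get(off))


if __name__ == "__main__":
    entry_1 = """\
 0000:    00000038  00000018  80000000  00000068   *h...........8...*
 0010:    FFFFFF00  33C8CF4F  00000000  00000086   *........O..3....*
 0020:    FFFFFFF0  C5FFFFFF  00000000  00001A00   *................*
 0030:    00000000  00000000  00000001  00000000   *................*
 0040:    00000000  00000000  00000000  00000000   *................*
 0050:    00000000  00000000  00000000  00000000   *................*
 0060:                        5E3C7E25  00000000   *        ....%~<^*
"""
    for off, value in quadwords(parse_dump(entry_1)).items():
        print(f"{off:04X}: {value:016X}")
```

Given two pasted entries, changed_offsets lists which quadwords differ,
which might make it easier to see whether the same fields vary each time.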
At the time of the crash:
******************************** ENTRY   27 ******************************** 
Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            14. 
Timestamp of occurrence              02-SEP-1999 18:33:33   
Host name                            bluejay 
System type register      x0000001E  Systype 30. (Miata) 
Number of CPUs (mpnum)    x00000001 
CPU logging event (mperr) x00000000 
Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      302. ASCII Panic Message Type 
SWI Minor class                   9. ASCII Message 
SWI Minor sub class               1. Panic 
ASCII Message                        panic (cpu 0): Processor Machine Check 
                                       
******************************** ENTRY   28 ******************************** 
Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            13. 
Timestamp of occurrence              02-SEP-1999 18:33:33   
Host name                            bluejay 
System type register      x0000001E  Systype 30. (Miata) 
Number of CPUs (mpnum)    x00000001 
CPU logging event (mperr) x00000000 
Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      100. CPU Machine Check Errors 
CPU Minor class                   1. Processor Uncorrectable Error (670) 
Entry Body Size:          x00000208 
Entry body: 
          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order 
 0000:    000001A0  00000118  00000000  000002C0   *................* 
 0010:    00000000  00000000  00000000  00000098   *................* 
 0020:    00000000  00000000  00000000  00000000   *................* 
 0030:    00000000  00000000  00000000  00000000   *................* 
 0040:    00000000  00000000  00000000  00000000   *................* 
 0050:    FFFFFFFF  A85C4000  00000000  00000000   *.........@\.....* 
 0060:    FFFFFC00  003FA9D0  00000000  000002B8   *..........?.....* 
 0070:    00000000  00000400  00000000  00005200   *.R..............* 
 0080:    00000000  00000000  FFFFFFFF  A85C7838   *8x\.............* 
 0090:    1F1E1615  14020100  FFFFFC00  003FA2F0   *..?.............* 
 00A0:    FFFFFC00  003F9818  FFFFFC00  003FA710   *..?.......?.....* 
 00B0:    FFFFFC00  003FA940  FFFFFC00  003FA570   *p.?.....@.?.....* 
 00C0:    00000000  00F00270  FFFFFFFF  FFF8DA00   *........p.......* 
 00D0:    00000098  06700009  00000000  00F0380C   *.8........p.....* 
 00E0:    00000000  11FFD980  00000000  00000000   *................* 
 00F0:    00000000  39018000  FFFFFFFF  A85C75D0   *.u\........9....* 
 0100:    FFFFFC00  00561FE0  FFFFFC00  003FA970   *p.?.......V.....* 
 0110:    FFFFFC00  003F9818  00000000  05C3BA38   *8.........?.....* 
 0120:    00000000  00000000  00000000  00000000   *................* 
 0130:    00000000  00000000  00000000  00018000   *................* 
 0140:    00000000  00000000  00000041  62020000   *...bA...........* 
 0150:    FFFFFFFF  FF8000A0  00000000  00000000   *................* 
 0160:    FFFFFF00  0001D04F  00000000  00014890   *.H......O.......* 
 0170:    FFFFFF80  2D8D6FFF  00000000  00000000   *.........o.-....* 
 0180:    00000000  00000C00  FFFFFF00  1961227F   *."a.............* 
 0190:    FFFFFF00  1961227F  FFFFFFF9  45FFFFFF   *...E....."a.....* 
 01A0:    00000000  00400000  00000000  00000000   *..........@.....* 
 01B0:    00000000  00000000  00000000  00000000   *................* 
 01C0:    00000000  020C0000  00000000  00000B93   *................* 
 01D0:    00000000  58910000  00000000  0001D540   *@..........X....* 
 01E0:    00000000  00008240  00000000  02010002   *........@.......* 
 01F0:    00000000  00008240  00000000  00000000   *........@.......* 
 0200:                        5E3C7E25  00000000   *        ....%~<^* 
uerf reports much less information, typically something like:
********************************* ENTRY     1. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  100.     CPU EXCEPTION 
SEQUENCE NUMBER                  2.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Thu Sep 16 05:17:26 1999
OCCURRED ON SYSTEM                      bluejay 
SYSTEM ID                 x0007001E
SYSTYPE                   x00000000
----- UNIT INFORMATION -----
UNIT CLASS                              CPU 
Received on Thu Sep 16 1999 - 02:44:34 NZST