We are running a PC-164-based Alpha with DU4.0B as the operating system and
have been seeing a great many memory error messages pop up in the dxconsole.
Other than these messages, the machine seems to be running fine and in fact
is still quite speedy (relatively speaking) computationally. I analyzed the
error messages using the DEC_EVENT dia -R ... command and found many
entries such as:
===========================================================================
DECevent V2.3
******************************** ENTRY    1 ********************************
Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number           351.
Timestamp of occurrence              16-JUL-1999 08:54:50
Host name                            forge
System type register      x0000001A  EB164 or AlphaPC164
Number of CPUs (mpnum)    x00000001
CPU logging event (mperr) x00000000
Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                      100. CPU Machine Check Errors
CPU Minor class                   3. Bcache error (630 entry)
Entry Body Size:          x00000068
Entry body:
           15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order
0000:    00000038  00000018  80000000  00000060   *`...........8...*
0010:    FFFFFF00  04F8C45F  00000000  00000086   *........_.......*
0020:    FFFFFFF0  C5FFFFFF  00000000  00009400   *................*
0030:    00000000  00000000  00000001  00000000   *................*
0040:    00000000  00000000  00000000  00000000   *................*
0050:    00000000  00000000  00000000  00000000   *................*
0060:                        5E3C7E25  00000000   *        ....%~<^*
===========================================================================
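Since there are many such entries, a quick tally by event type would show
whether they are all the same Bcache error or a mix. Below is a minimal
sketch in Python, assuming the report has been saved as plain text and that
every entry carries "Entry type" and "CPU Minor class" lines laid out as in
the entry above. The function name tally_events and the sample string are
illustrative only and are not part of DECevent.

```python
import re
from collections import Counter

# Matches header lines such as
#   "CPU Minor class                   3. Bcache error (630 entry)"
# The two field names are taken from the sample entry; other reports may differ.
FIELD_RE = re.compile(r"^(Entry type|CPU Minor class)\s+(\d+)\.\s+(.*\S)\s*$")


def tally_events(report_text):
    """Count how often each (field, code, description) triple appears."""
    counts = Counter()
    for line in report_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            field, code, description = match.groups()
            counts[(field, int(code), description)] += 1
    return counts


# Header lines of two entries in the same layout as above (sample data only).
sample = """\
Entry type                      100. CPU Machine Check Errors
CPU Minor class                   3. Bcache error (630 entry)
Entry type                      100. CPU Machine Check Errors
CPU Minor class                   3. Bcache error (630 entry)
"""

for (field, code, description), n in sorted(tally_events(sample).items()):
    print(f"{n:5d}  {field}: {code}. {description}")
```

In practice the text would be read from the saved report rather than from
the sample string. A tally dominated by a single minor class would at least
show that the errors are consistent rather than scattered.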
On the face of it, this seems to this novice sysop to be a hardware error,
which for us is quite bad because I doubt that this motherboard is even
available anymore, and a replacement is bound to be expensive. When I talked
with one of my Unix friends, she suggested that it might simply be a kernel
error passing itself off as a hardware error. Although this seems unlikely,
does anyone have any information or suggestions on how to go about
verifying that it actually is a hardware error? Any suggestions are greatly
appreciated.
Sincerely,
Bruce
bantolovich_at_specialmetals.com
Received on Fri Jul 16 1999 - 15:17:08 NZST