I got only one response, from Joe Fletcher, telling me to check the
seating of the CPU and memory and to consider replacing those parts
if reseating doesn't help. Certainly good advice, and I am actually
already running tests with different memory modules in different
memory slots.
So basically this seems to be a hardware problem, and I will
have to locate the broken parts and replace them.
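
To judge whether a swapped module makes a difference, the machine check
entries can be tallied per day from the formatted error log text. The
following is only a minimal Python sketch. It assumes the text layout of
the entries quoted below (the "Timestamp of occurrence" and "Entry type"
lines), and the function name machine_checks_per_day is hypothetical,
not part of any Digital UNIX tool.

```python
import re
from collections import Counter

# Patterns follow the layout of the formatted entries quoted in this post.
TIMESTAMP_RE = re.compile(r"Timestamp of occurrence\s+(\d{2}-[A-Z]{3}-\d{4})\s")
TYPE_RE = re.compile(r"Entry type\s+(\d+)\.")


def machine_checks_per_day(text):
    """Count entries of type 100 (Machine Check Error) per date string."""
    counts = Counter()
    date = None
    for line in text.splitlines():
        m = TIMESTAMP_RE.search(line)
        if m:
            # Each entry lists its timestamp before its entry type.
            date = m.group(1)
            continue
        m = TYPE_RE.search(line)
        if m and date is not None:
            if m.group(1) == "100":
                counts[date] += 1
            date = None  # entry handled, wait for the next timestamp
    return counts
```

Feeding it the saved text of the formatted log returns a Counter keyed by
date, which can then be compared across memory configurations.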
        Dirk
Original message:
> 
> Yesterday a PWS600au I manage panicked and went to the SRM prompt.
> I had been getting what I thought were memory errors from that machine
> for a while now and was in the process of figuring out which
> memory modules to replace. But it never panicked before, and now
> I am no longer sure whether these really are memory errors or
> something worse. I attach the corresponding output from the
> binary errorlog and /var/adm/messages. I would appreciate
> any help deciphering them.
> 
> BTW, I got a few hundred binary errorlog entries like 1387
> within the last few weeks, but the machine never panicked.
> They usually happened under high load which made me suspect
> the memory.
> 
> Thanks
> 
>     Dirk Hufnagel
> 
> 
> **** V3.3  ********************* ENTRY 1387 ********************************
> 
> 
> Logging OS                        2. Digital UNIX
> System Architecture               2. Alpha
> Event sequence number           317.
> Timestamp of occurrence              25-FEB-2002 17:09:39
> Host name                            hostna
> 
> System type register      x0000001E  Systype 30. (Miata)
> Number of CPUs (mpnum)    x00000001
> CPU logging event (mperr) x00000000
> 
> Event validity                    1. O/S claims event is valid
> Event severity                    1. Severe Priority
> Entry type                      100. Machine Check Error - (major class)
>                                   1.    - (minor class)
> 
> 
> 
> ========================
> Raw Event Data Dump
> ========================
> 
> Entry# (record in file)        1387.
> 
> Entry Body Size:          x00000240
> Entry body:
> 
>           15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order
>  0000:    3C7AB623  00060101  0007001E  013D0240   *@.=.........#.z<*
>  0010:    00000006  00000000  00003266  6C616C63   *hostna..........*
>  0020:    00000000  1A010064  00000000  00000001   *........d.......*
>  0030:    00000000  000002C0  00000000  00000000   *................*
>  0040:    00000000  0000020F  000001A0  00000118   *................*
>  0050:    00000000  00000000  00000000  00000000   *................*
>  0060:    00000000  00000000  00000000  00000000   *................*
>  0070:    00000000  00000000  00000000  00000000   *................*
>  0080:    00000000  00000000  00000000  00000000   *................*
>  0090:    00000000  00000000  00000000  F38A427F   *.B..............*
>  00A0:    00000000  00005200  FFFFFC00  004C8A50   *P.L......R......*
>  00B0:    00000000  00000000  00000000  00000257   *W...............*
>  00C0:    FFFFFC00  004C8310  00000001  00000016   *..........L.....*
>  00D0:    FFFFFC00  004C8790  1F1E1615  14020100   *..........L.....*
>  00E0:    FFFFFC00  004C8600  FFFFFC00  004CC1D0   *..L.......L.....*
>  00F0:    FFFFFFFF  FFF8C800  FFFFFC00  004C89C0   *..L.............*
>  0100:    00000000  00F0380C  00000000  00F00270   *p........8......*
>  0110:    00000000  00000000  0000020F  06600001   *..`.............*
>  0120:    FFFFFFFF  A3E6FA38  00000001  1FFFF090   *........8.......*
>  0130:    FFFFFC00  004C89F0  00000000  0B804000   *.@........L.....*
>  0140:    00000000  0D53FA38  FFFFFC00  006A7570   *puj.....8.S.....*
>  0150:    00000000  00000000  FFFFFC00  004CC1D0   *..L.............*
>  0160:    00000000  00018000  00000000  00000000   *................*
>  0170:    00000041  62020000  00000000  80000000   *...........bA...*
>  0180:    00000000  00000000  00000000  00000000   *................*
>  0190:    00000000  000140D0  00000001  423E4C1C   *.L>B.....@......*
>  01A0:    00000000  00000000  FFFFFF00  0001CD4F   *O...............*
>  01B0:    FFFFFFFF  F8F7FEFF  FFFFFFFF  F7FFEFFF   *................*
>  01C0:    FFFFFFF0  05FFFFFF  00000000  00009F9F   *................*
>  01D0:    00000000  00000000  FFFFFF00  1CAB795F   *_y..............*
>  01E0:    FFFFFFFF  80000080  00000000  00000000   *................*
>  01F0:    00000000  00000B93  00000000  00000010   *................*
>  0200:    00000000  0EE28FC0  00000000  0000F3F3   *................*
>  0210:    00000000  07060000  00000000  58000000   *...X............*
>  0220:    00000000  00000000  00000000  0000E002   *................*
>  0230:    003C7E25  00000000  FFFFFFFF  80140000   *............%~<^*
> 
> 
> 
> **** V3.3  ********************* ENTRY 1388 ********************************
> 
> 
> Logging OS                        2. Digital UNIX
> System Architecture               2. Alpha
> Event sequence number           318.
> Timestamp of occurrence              25-FEB-2002 17:09:39
> Host name                            clalf2
> 
> System type register      x0000001E  Systype 30. (Miata)
> Number of CPUs (mpnum)    x00000001
> CPU logging event (mperr) x00000000
> 
> Event validity                    1. O/S claims event is valid
> Event severity                    1. Severe Priority
> Entry type                      302. ASCII Panic Message Type
>                                  -1.    - (minor class)
> 
> SWI Minor class                   9. ASCII Message
> SWI Minor sub class               1. Panic
> 
> ASCII Message                        panic (cpu 0): System Uncorrectable
>                                      Machine Check
> 
> 
> 
> 
> Feb 25 17:22:05 hostna vmunix: Machine Check SYSTEM Fatal Abort
> Feb 25 17:22:05 hostna vmunix: Machine Check Code = 20f
> Feb 25 17:22:05 hostna vmunix: PCI master abort error
> Feb 25 17:22:05 hostna vmunix:     pal temp[0-1]        = 00000000f38a427f 0000000000000000
> Feb 25 17:22:05 hostna vmunix:     pal temp[2-3]        = fffffc00004c8a50 0000000000005200
> Feb 25 17:22:05 hostna vmunix:     pal temp[4-5]        = 0000000000000257 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     pal temp[6-7]        = 0000000100000016 fffffc00004c8310
> Feb 25 17:22:06 hostna vmunix:     pal temp[8-9]        = 1f1e161514020100 fffffc00004c8790
> Feb 25 17:22:06 hostna vmunix:     pal temp[10-11]        = fffffc00004cc1d0 fffffc00004c8600
> Feb 25 17:22:06 hostna vmunix:     pal temp[12-13]        = fffffc00004c89c0 fffffffffff8c800
> Feb 25 17:22:06 hostna vmunix:     pal temp[14-15]        = 0000000000f00270 0000000000f0380c
> Feb 25 17:22:06 hostna vmunix:     pal temp[16-17]        = 0000020f06600001 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     pal temp[18-19]        = 000000011ffff090 ffffffffa3e6fa38
> Feb 25 17:22:06 hostna vmunix:     pal temp[20-21]        = 000000000b804000 fffffc00004c89f0
> Feb 25 17:22:06 hostna vmunix:     pal temp[22-23]        = fffffc00006a7570 000000000d53fa38
> Feb 25 17:22:06 hostna vmunix:     shadow[0-1]        = 0000000000000000 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     shadow[2-3]        = 0000000000000000 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     shadow[4-5]        = 0000000000000000 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     shadow[6-7]        = 0000000000000000 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     Address of excepting instruction    = fffffc00004cc1d0
> Feb 25 17:22:06 hostna vmunix:     Summary of arithmetic traps    = 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     Exception mask            = 0000000000000000
> Feb 25 17:22:06 hostna vmunix:     Base address for PALcode    = 0000000000018000
> Feb 25 17:22:06 hostna vmunix:     Interrupt Status Reg        = 0000000080000000
> Feb 25 17:22:07 hostna vmunix:     CURRENT SETUP OF EV5 IBOX    = 0000004162020000
> Feb 25 17:22:07 hostna vmunix:     I-CACHE Reg Tag parity error    = 0000000000000000
> Feb 25 17:22:07 hostna vmunix:     D-CACHE error Reg        = 0000000000000000
> Feb 25 17:22:07 hostna vmunix:     Effective VA        = 00000001423e4c1c
> Feb 25 17:22:07 hostna vmunix:     reason for D-stream    = 00000000000140d0
> Feb 25 17:22:07 hostna vmunix:     EV5 Secondary Cache address    = ffffff000001cd4f
> Feb 25 17:22:07 hostna vmunix:     EV5 Secondary Cache TAG/Data parity    = 0000000000000000
> Feb 25 17:22:07 hostna vmunix:     EV5 BC_TAG_ADDR        = fffffffff7ffefff
> Feb 25 17:22:07 hostna vmunix:     EV5 EI_STAT_ADDR Phys addr of Xfer    = fffffffff8f7feff
> Feb 25 17:22:07 hostna vmunix:     Fill Syndrome        = 0000000000009f9f
> Feb 25 17:22:07 hostna vmunix:     EI_STAT reg        = fffffff005ffffff
> Feb 25 17:22:07 hostna vmunix:     LD_LOCK            = ffffff001cab795f
> Feb 25 17:22:07 hostna vmunix:     PYXIS_DMA_DATA        = 0000000000000000
> Feb 25 17:22:07 hostna vmunix:     CIA/PYXIS ERR            = ffffffff80000080
> Feb 25 17:22:07 hostna vmunix:         PCI BUS Master state machine generated Master Abort
> Feb 25 17:22:07 hostna vmunix:     CIA/PYXIS ERR STAT        = 0000000000000010
> Feb 25 17:22:07 hostna vmunix:     CIA/PYXIS ERR MASK        = 0000000000000b93
> Feb 25 17:22:08 hostna vmunix:     CIA/PYXIS ECC_SYN        = 000000000000f3f3
> Feb 25 17:22:08 hostna vmunix:     CIA/PYXIS MEM ERR0        = 000000000ee28fc0
> Feb 25 17:22:08 hostna vmunix:     CIA/PYXIS MEM ERR1        = 0000000058000000
> Feb 25 17:22:08 hostna vmunix:     CIA/PYXIS PCI ERR0        = 0000000007060000
> Feb 25 17:22:08 hostna vmunix:     CIA/PYXIS PCI ERR1        = 000000000000e002
> Feb 25 17:22:08 hostna vmunix:     ISA bridge NMI status & control    = 0000000000000000
> Feb 25 17:22:08 hostna vmunix:     CIA/PYXIS PCI ERR2        = ffffffff80140000
> Feb 25 17:22:08 hostna vmunix: panic (cpu 0): System Uncorrectable Machine Check
> 
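
A note on reading the raw event dump: each row seems to list its four
32-bit words from byte 15 down to byte 0 (the "Byte Order" header), while
the text column on the right runs from byte 0 upward. The short Python
sketch below reverses one row accordingly. That reading of the layout is
an assumption drawn from the dump itself, and the function name
dump_row_to_ascii is hypothetical.

```python
def dump_row_to_ascii(words):
    """Render one dump row (four hex words, byte 15 first) as text, byte 0 first."""
    # Concatenate the words as printed, then reverse the byte sequence.
    raw = bytes.fromhex("".join(words))[::-1]
    # Show printable ASCII as-is and everything else as a dot.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Row 0000 of entry 1387 as printed in the dump.
row = ["3C7AB623", "00060101", "0007001E", "013D0240"]
print(dump_row_to_ascii(row))  # prints @.=.........#.z<
```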
Received on Thu Feb 28 2002 - 13:21:41 NZDT