Hi folks,
I administer a Compaq AlphaServer ES40 with four 666 MHz EV67 CPUs and
8 GB of RAM, with the following memory organisation:
    0       2048Mb     0000000000000000    4-Way
    1       2048Mb     0000000080000000    4-Way
    2       2048Mb     0000000100000000    4-Way
    3       2048Mb     0000000180000000    4-Way
After about a year of operation, the following messages appeared in the
syslog a few weeks ago:
Jul 22 16:12:15 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.
Jul 22 16:13:27 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 1. Reporting suspended.
Jul 22 16:14:18 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 3. Reporting suspended.
A few hours later the machine dropped to the console prompt.
During the memory test the following messages appeared:
EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR:         00000000A8FC5BC0
C_SYNDROME_1:   0000000000000057
C_SYNDROME_0:   0000000000000000
EV6 Correctable Dcache ECC Error on CPU 0
EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR:         00000000A8FD2BC0
C_SYNDROME_1:   0000000000000057
C_SYNDROME_0:   0000000000000000
At first I thought it was a defective DRAM module located in bank 1,
because both C_ADDR values fall between the base addresses of banks 1
and 2. But after removing that bank the error still occurs.
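
A minimal sketch of that address reasoning, assuming C_ADDR is a plain
physical address and each array covers one contiguous 2048 MB range
starting at its listed base address. It ignores the "4-Way" interleave,
which may spread consecutive addresses over several modules, so the
result is only a first guess and cannot name a single DIMM or chip. The
names ARRAYS and array_for_address are made up for this illustration.

```python
# Array layout copied from the memory listing in this post:
# (array number, base address, size in bytes).
ARRAY_SIZE = 2048 * 1024 * 1024  # 2048 MB per array

ARRAYS = [
    (0, 0x0000000000000000, ARRAY_SIZE),
    (1, 0x0000000080000000, ARRAY_SIZE),
    (2, 0x0000000100000000, ARRAY_SIZE),
    (3, 0x0000000180000000, ARRAY_SIZE),
]


def array_for_address(addr):
    """Return the array whose address range contains addr, or None.

    Assumption: ranges are contiguous and not interleaved across arrays.
    """
    for number, base, size in ARRAYS:
        if base <= addr < base + size:
            return number
    return None


# The two C_ADDR values reported by the console memory test.
for c_addr in (0x00000000A8FC5BC0, 0x00000000A8FD2BC0):
    print(f"C_ADDR {c_addr:016X} -> array {array_for_address(c_addr)}")
```

Under these assumptions both reported addresses land in array 1, which
matches the guess above.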
So my question is: is this a memory problem or a CPU problem? And if it
is a memory problem, how can I determine which DRAM chip is defective?
I haven't found any suitable documentation.
Many thanks & Bye,
Christian
-- 
         v          
      ..d8b..       Dipl.inform. Christian Becker 
  ..:::d888b:::..
 :::::d88888b:::::  Institut fuer Angewandte Mathematik & Numerik, LS3
:::::d8888888b::::: Universitaet Dortmund 
::::d888888888b:::: Vogelpothsweg 87, 44227 Dortmund, Germany 
 ::{8888P"::"V8,::  Voicemail: +49 231 755 5934 FAX: +49 231 755 5933 
  :D8P":::::::VD:   mailto:Christian.Becker_at_mathematik.uni-dortmund.de  
  dP  ```````   Y
Received on Mon Aug 05 2002 - 14:39:23 NZST