One of our Alpha 1000s (DU4.0d, patchkit 3) crashed yesterday morning 
with no trace of an error, nothing in any logs, no crash data, etc.  
Using uerf, we did see a controller error from last week.  We're not 
certain if it is related or even exactly what it means.  Looking back 
through the dia output, we've found a few instances of these errors 
since July of last year.
Compaq support had us install DECevent to get more detailed information;
the output from it for the latest error is included below.  Would someone
please take pity on me and explain what it means?  It looks (to the
uninformed, that being me :-) like an error with the controller itself, 
since it doesn't appear to mention a drive on the controller. 
Compaq thinks that the controller is going bad, although they're now
reviewing the dia output to be sure.  If this is enough information to
tell, does that appear to be the case?  Also, to what does the "Needs to
be Restarted" flag refer?
Thank you for your time and assistance.
Dawn Lovell
dawn.lovell_at_centurytel.com
--- dia output ---
Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number            12.
Timestamp of occurrence              11-MAY-1999 01:46:50
Host name                            vs2
System type register      x00000011  AlphaServer 1000
Number of CPUs (mpnum)    x00000001
CPU logging event (mperr) x00000000
Event validity                    1. O/S claims event is valid
Event severity                    3. High Priority
Entry type                      198. SWXCR RAID Controller Event
------ Device Data ------
Class                           x00  RAID Disk
Subsystem                       x20  SWXCR Mport/RAID Controller
Number of Packets                 5.
------ Packet Type ------       258. Module Name String
Routine Name                         xcr_cmd_timeout
------ Packet Type ------       256. Generic String
                                     Controller has stopped responding
------ Packet Type ------       260. Hardware Error String
Error Type                           Hard Error Detected
------ Packet Type ------       256. Generic String
                                     Controller Softc at time of error
------ Packet Type ------       512. SWXCR Softc(XCR_SOFTC)
   Packet Revision                2.
Controller Number         x00000000
Controller Version        x00000000
Flags                     x00000002  Needs to be Restarted.
Normal Commands Active           60.
Special Commands Active           4.
Command Slots Active              0.
Commands on Pending List          0.
Command Slots Available          61.
2560. Bytes Cmd Que Data             ** Not Printed **
--- End of dia output ---
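For anyone who wants to search a longer dia report for similar entries, here is a minimal Python sketch of pulling the labelled fields out of one event. It assumes that each label is separated from its value by two or more spaces, as in the listing above. The names `parse_fields` and `flag_bits` are hypothetical helpers, not part of DECevent, and the sketch only extracts the raw Flags value; it does not attempt to interpret what the bits mean.

```python
import re

# Excerpt copied from the dia report above; a real report would be read from a file.
SAMPLE = """\
Entry type                      198. SWXCR RAID Controller Event
------ Packet Type ------       258. Module Name String
Routine Name                         xcr_cmd_timeout
Flags                     x00000002  Needs to be Restarted.
Normal Commands Active           60.
"""

def parse_fields(report):
    """Return a dict mapping each field label to its raw value string."""
    fields = {}
    for line in report.splitlines():
        line = line.strip()
        # Skip blank lines and the '------ ... ------' separator lines.
        if not line or line.startswith("---"):
            continue
        # Assumption: label and value are separated by at least two spaces.
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields

def flag_bits(fields):
    """Return the Flags field as an integer (the value is printed as x<hex>)."""
    raw = fields["Flags"].split()[0]
    return int(raw.lstrip("x"), 16)

if __name__ == "__main__":
    fields = parse_fields(SAMPLE)
    print(fields["Routine Name"], hex(flag_bits(fields)))
```

Splitting a full report into individual events and filtering on `Routine Name` would be one way to count how often the timeout has occurred since last July.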
Received on Wed May 19 1999 - 15:46:13 NZST