One of my machines here is an AlphaStation 500/333, which is giving some
unusual problems with a Micropolis fast/wide SCSI disk. I wonder if
anyone here might have any ideas as to a solution (other than "buy
approved Digital disks"!).
The SCSI bus has these devices attached (taken from uerf for
convenience):
rz0 at scsi0 target 0 lun 0 (LID=0) 
_(DEC     RZ28D    (C) DEC 0008) (Wide16) 
rz2 at scsi0 target 2 lun 0 (LID=1) 
_(MICROP  3391WS           x43h) (Wide16) 
tz3 at scsi0 target 3 lun 0 (LID=2) 
_(ARCHIVE Python 28849-XXX 4.CM) 
changer at scsi0 target 3 lun 1 
_(LID=3) (ARCHIVE Python 28849-XXX 4.CM) 
rz4 at scsi0 target 4 lun 0 (LID=4) 
_(DEC     RRD45   (C) DEC  1645) 
I am running Digital UNIX V4.0B (Rev. 564) with firmware version 6.4-3.
Most of the relevant-looking patches from the duv40bas00005-19970926 kit
are installed.
The tape device is external and terminated, but the same behaviour is
seen with it absent. The RZ28 is the system disk.
What happens is that the machine runs normally for some amount of time
(3 hours... 5 days...), then I get a kernel panic and crash. The error
log contains many of these events:
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  199.     CAM SCSI 
SEQUENCE NUMBER                 34.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Thu Nov 20 17:09:51 1997
OCCURRED ON SYSTEM                      mnhepw 
SYSTEM ID                 x0005000F
SYSTYPE                   x00000000
----- UNIT INFORMATION -----
CLASS                         x0000     DISK 
SUBSYSTEM                     x0000     DISK 
BUS #                         x0000
                              x0010     LUN x0
                                        TARGET x2
And the crash dump says (hope I'm picking out the important part here):
Hard Error Detected
MICROP  3391WS          ^X3391WS
Active CCB at time of error
Command timed out
cam_logger: CAM_ERROR packet
cam_logger: bus 0 target 2 lun 0 
cdisk_complete
Retries Exhausted
Hard Error Detected
MICROP  3391WS          ^X3391WS
Active CCB at time of error
Command timed out
AdvFS I/O error:
    Volume: /dev/rz2g
    Tag: 0xfffffff7.0000
    Page: 450
    Block: 7614528
    Block count: 32
    Type of operation: Write
    Error: 5
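For a sense of scale, here is a rough sketch converting the failing block
range above into bytes. The 512-byte block size is my assumption (the log
doesn't state the unit), and block_range_to_bytes is just an illustrative
helper name.

```python
# Assumption: the "Block" and "Block count" fields are in 512-byte units.
# The log itself does not state the unit.
BLOCK_SIZE = 512


def block_range_to_bytes(block, count, block_size=BLOCK_SIZE):
    """Return (byte_offset, byte_length) for a run of blocks."""
    return block * block_size, count * block_size


# Values taken from the AdvFS I/O error above.
offset, length = block_range_to_bytes(7614528, 32)
print(f"offset = {offset} bytes ({offset / 2**30:.2f} GiB)")
print(f"length = {length} bytes ({length // 1024} KiB)")
```

Under that assumption the failed write is a 16 KiB transfer a little over
3.6 GiB into the partition.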
OK, this clearly points at the Micropolis disk as the culprit, but I
don't suspect a hardware fault in the disk, as the same behaviour was
also seen with a different Micropolis disk (3243WS; a 4G disk instead of
9G). In both cases, after the crash, "show devices" at the console
doesn't list the Micropolis disk, and it doesn't return until a
power-cycle.
With a (non-wide) Seagate disk dropped in instead of the Micropolis,
everything works OK.
I think that probably tells the whole story - but is there likely to be
any solution, or is this just a bad hardware mismatch?
Thanks very much for any insight/hints/answers,
Graham Allan
University of Minnesota
Received on Fri Nov 21 1997 - 01:41:52 NZDT