Hi,
We have an AlphaServer 2100 4/275 with a KZPSC RAID controller, running
Digital UNIX V3.2F (Rev. 69.73).
An external RAID 5 set of five RZ28 disks is accessed through the KZPSC
controller. This array holds two AdvFS filesets and the Oracle database,
which is accessed through a raw device.
This installation ran properly under OSF/1 V3.0B for about two years.
Four or five months ago it began crashing randomly, with the crashes
related to AdvFS. Our first step was to migrate to DU 3.2F.
Two weeks ago, 3.2F also crashed with AdvFS-related problems. An excerpt
from "messages" follows:
Oct 12 00:10:26 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3104
Oct 12 00:10:27 alpha21 vmunix:         vd 1  blk 745840  blkCnt 128
Oct 12 00:10:27 alpha21 vmunix:         write error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3552
Oct 12 00:10:28 alpha21 vmunix:         vd 1  blk 1131616  blkCnt 128
Oct 12 00:10:28 alpha21 vmunix:         read error = 5
Oct 12 00:10:28 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001  tag 0x00003dd5.8001u  page 2967
Oct 12 00:10:29 alpha21 vmunix:         vd 1  blk 2536128  blkCnt 128
Oct 12 00:10:29 alpha21 vmunix:         write error = 5
Oct 12 00:10:29 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.fffffffe.0000  tag 0xfffffff7.0000u  page 337
Oct 12 00:10:29 alpha21 vmunix:         vd 1  blk 6496  blkCnt 80
Oct 12 00:10:29 alpha21 vmunix:         write error = 5
Oct 12 00:10:29 alpha21 vmunix:
Oct 12 00:10:29 alpha21 vmunix: bs_osf_complete: metadata write failed
Oct 12 00:10:30 alpha21 vmunix: AdvFS Domain Panic; Domain local_domain Id 0x2df0fa52.000d0bc0
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001  tag 0x00003dd5.8001u  page 2983
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 2536384  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001  tag 0x00003dd5.8001u  page 2975
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 2536256  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001  tag 0x00003dd5.8001u  page 2959
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 2536000  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3112
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 745968  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3544
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 1131488  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         read error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa52.000d0bc0.1.8001  tag 0x00003dd5.8001u  page 2992
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 681424  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         read error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3120
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 746096  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3128
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 746224  blkCnt 128
Oct 12 00:10:30 alpha21 vmunix:         write error = 5
Oct 12 00:10:30 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000824.800eu  page 3136
Oct 12 00:10:30 alpha21 vmunix:         vd 1  blk 746352  blkCnt 128
Oct 12 00:10:31 alpha21 vmunix:         write error = 5
Oct 12 00:10:31 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.fffffffe.0000  tag 0xfffffff7.0000u  page 129
Oct 12 00:10:31 alpha21 vmunix:         vd 1  blk 2432  blkCnt 16
Oct 12 00:10:31 alpha21 vmunix:         write error = 5
Oct 12 00:10:31 alpha21 vmunix:
Oct 12 00:10:31 alpha21 vmunix: bs_osf_complete: metadata write failed
Oct 12 00:10:31 alpha21 vmunix: AdvFS Domain Panic; Domain home_domain Id 0x2df0fa3f.0003d000
Oct 12 00:13:07 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000001.8001u  page 426
Oct 12 00:13:07 alpha21 vmunix:         vd 1  blk 1595936  blkCnt 96
Oct 12 00:13:07 alpha21 vmunix:         read error = 5
Oct 12 00:13:07 alpha21 vmunix: advfs I/O error: setId 0x2df0fa3f.0003d000.1.8001  tag 0x00000006.8001u  page 0
Oct 12 00:13:08 alpha21 vmunix:         vd 1  blk 8720  blkCnt 16
Oct 12 00:13:08 alpha21 vmunix:         read error = 5
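For anyone who wants to see how the errors in an excerpt like this are spread across the two domains, here is a minimal Python sketch that tallies read and write errors per domain. It is only an illustration: the function name `tally_advfs_errors` is hypothetical, and treating the first two dotted fields of setId as the domain Id is an assumption based on the "Domain Panic ... Id" lines in the log.

```python
import re
from collections import Counter

# Matches one AdvFS I/O error report. Using \s+ between fields lets the
# pattern span line breaks, so it works whether or not the setId line
# was wrapped by the mailer.
ERROR_RE = re.compile(
    r"advfs I/O error: setId\s+(?P<setid>\S+)\s+tag\s+(?P<tag>\S+)"
    r"\s+page\s+(?P<page>\d+)"
    r".*?vd\s+(?P<vd>\d+)\s+blk\s+(?P<blk>\d+)\s+blkCnt\s+(?P<cnt>\d+)"
    r".*?(?P<op>read|write) error = (?P<errno>\d+)",
    re.DOTALL,
)


def tally_advfs_errors(log_text):
    """Count (domain id, operation) pairs over all AdvFS I/O error reports."""
    counts = Counter()
    for m in ERROR_RE.finditer(log_text):
        # Assumption: the first two dotted fields of setId form the domain Id,
        # matching the Id printed on the "AdvFS Domain Panic" lines.
        domain = ".".join(m.group("setid").split(".")[:2])
        counts[(domain, m.group("op"))] += 1
    return counts
```

Feeding the saved "messages" text to `tally_advfs_errors` returns a `Counter` keyed by (domain Id, "read" or "write"), which makes it easy to check whether both domains on the array are failing in the same way.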
Additionally, "uerf" reports the following problems at the time of
the crash:
                                                  uerf version 4.2-011 (122)
********************************* ENTRY     1. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             OPERATIONAL EVENT 
OS EVENT TYPE                  300.     SYSTEM STARTUP 
SEQUENCE NUMBER                  0.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Wed Oct 23 14:37:07 1996
OCCURRED ON SYSTEM                      alpha21 
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100 
SYSTYPE                   x00000000
MESSAGE                                 PCXAL keyboard, language English
                                         _(American) 
                                         
                                        Alpha boot: available memory from
                                         _0x11dc000 to 0x1fffe000
                                        Digital UNIX V3.2F (Rev. 69.73); Thu
                                         _Oct 10 19:18:19 GMT-0500 1996
                                        physical memory = 512.00 megabytes.
                                        available memory = 494.23 megabytes.
                                        using 1958 buffers containing 15.29
                                         _megabytes of memory
                                        Master cpu at slot 0. 
                                        Firmware revision: 4.5 
                                        PALcode: OSF version 1.45 
                                        ibus0 at nexus 
                                        AlphaServer 2100 4/275 
                                        cpu 0 EV-45 4mb b-cache 
                                        cpu 1 EV-45 4mb b-cache 
                                        gpc0 at ibus0 
                                        pci0 at ibus0 slot 0 
                                        tu0: DECchip 21040-AA: Revision: 2.3
                                        tu0 at pci0 slot 0
                                        tu0: DEC TULIP Ethernet Interface,
                                         _hardware address: 08-00-2B-E2-6A-42
                                        tu0: console mode: selecting UTP
                                         _(10BaseT) port: no link
                                        psiop0 at pci0 slot 1
                                        Loading SIOP: script 1001f00, reg
                                         _81222000, data 100de20
                                        scsi0 at psiop0 slot 0
                                        rz0 at scsi0 bus 0 target 0 lun 0 (DEC
                                         _    RZ28     (C) DEC 442D)
                                        rz3 at scsi0 bus 0 target 3 lun 0 (DEC
                                         _    RZ28     (C) DEC D41C)
                                        rz6 at scsi0 bus 0 target 6 lun 0 (DEC
                                         _    RRD43   (C) DEC  1084)
                                        tz5 at scsi0 bus 0 target 5 lun 0 (DEC
                                         _    TLZ6      (C)DEC 0491)
                                        eisa0 at pci0 
                                        ace0 at eisa0 
                                        ace1 at eisa0 
                                        lp0 at eisa0 
                                        fdi0 at eisa0 
                                        fd0 at fdi0 unit 0 
                                        dns0 at eisa0 
                                        dns0: Digital WAN Device Driver 
                                         _Interface 
                                        dns1: Digital WAN Device Driver 
                                         _Interface 
                                        dns1 at eisa0 
                                        dns2: Digital WAN Device Driver 
                                         _Interface 
                                        dns3: Digital WAN Device Driver 
                                         _Interface 
                                        vga0 at eisa0 
                                         1024x768 (QVision ) 
                                        fta0 DEC CRE DEFEA FDDI Module, 
                                         _Hardware Revision 2 
                                        fta0 at eisa0 
                                        fta0: DMA Available. 
                                        fta0: DEC CRE DEFEA (PDQ) FDDI 
                                         _Interface, Hardware address: 
                                         _08-00-2B-B7-27-FE 
                                        fta0: Firmware rev: 2.46 
                                        Initializing xcr0.  Please wait.
                                        Initializing xcr0.  Please wait.
                                        Initializing xcr0.  Please wait.
                                        Initializing xcr0.  Please wait.
                                        Initializing xcr0.  Please wait.
                                        xcr0 at eisa0 
                                        re0 at xcr0 unit 0 (unit status =
                                         _ONLINE, raid level = 5)
                                        pza0 at pci0 slot 7
                                        pza0 firmware version: DEC  P01 A10
                                         _
                                        scsi1 at pza0 slot 0
                                        pza1 at pci0 slot 8
                                        pza1 firmware version: DEC  P01 A10
                                         _
                                        scsi2 at pza1 slot 0 
                                        lvm0: configured. 
                                        lvm1: configured. 
                                        dli: configured 
                                        SuperLAT. Copyright 1993 Meridian
                                         _Technology Corp. All rights
                                         _reserved.
                                        x25_access: configured
                                        wandd_base: configured
                                        wandd_lapb: configured
                                        wan_utilities: configured
                                        ctf_base: configured
                                        Node ID is 08-00-2b-b7-27-fe (from
                                         _device fta0)
                                        dna_netman: configured 
                                        dna_dli: configured 
********************************* ENTRY     2. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  198.     ASTRO CONTROLLER 
SEQUENCE NUMBER                  3.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Wed Oct 23 14:25:53 1996
OCCURRED ON SYSTEM                      alpha21 
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100 
SYSTYPE                   x00000000
PROCESSOR COUNT                  2.
PROCESSOR WHO LOGGED      x00000000
----- UNIT INFORMATION -----
CLASS                         x0000     DISK 
SUBSYSTEM                     x0000     DISK 
BUS #                         x0000
----- CAM STRING -----
ROUTINE NAME                            xcr_e_restart 
----- CAM STRING -----
                                        Can't restart Controller 
----- CAM STRING -----
ERROR TYPE                              Hard Error Detected 
********************************* ENTRY     3. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  198.     ASTRO CONTROLLER 
SEQUENCE NUMBER                  2.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Wed Oct 23 14:25:47 1996
OCCURRED ON SYSTEM                      alpha21 
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100 
SYSTYPE                   x00000000
PROCESSOR COUNT                  2.
PROCESSOR WHO LOGGED      x00000000
----- UNIT INFORMATION -----
CLASS                         x0000     DISK 
SUBSYSTEM                     x0000     DISK 
BUS #                         x0000
----- CAM STRING -----
ROUTINE NAME                            xcr_cmd_timeout 
----- CAM STRING -----
                                        Controller has stopped responding
----- CAM STRING -----
ERROR TYPE                              Hard Error Detected
----- CAM STRING -----
                                        Controller Softc at time of error
----- ENT_XCR_SOFTC -----
*SC_BUS_NAME          xFFFFFC00006A20E0
SC_CNTRL_NUM          x0000000000000000
SC_CNTRL_TYPE         x006A2AC000000000
*SC_CTRL              xFFFFFC00006A2AC0
SC_IOHANDLE           x000003A000008000
SC_FLAGS                  x00000002
SC_REG_OFF                x00000C90
SC_MAX_ACT                x0000003C
SC_SPEC_ACT               x00000004
SC_CMDS_ACT               x00000003
*SC_ACT_FLINK         xFFFFFC001FE556B8
*SC_ACT_BLINK         xFFFFFC001FE55A50
SC_CMDS_PENDING           x00000000
*SC_PEND_FLINK        xFFFFFC001FE55050
*SC_PEND_BLINK        xFFFFFC001FE55050
*SC_FREE_FLINK        xFFFFFC001FE559B0
*SC_FREE_BLINK        xFFFFFC001FE55848
SC_FREE_CMD_SLOTS         x0000003D
********************************* ENTRY     4. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  198.     ASTRO CONTROLLER 
SEQUENCE NUMBER                  1.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Wed Oct 23 14:25:26 1996
OCCURRED ON SYSTEM                      alpha21 
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100 
SYSTYPE                   x00000000
PROCESSOR COUNT                  2.
PROCESSOR WHO LOGGED      x00000000
----- UNIT INFORMATION -----
CLASS                         x0000     DISK 
SUBSYSTEM                     x0000     DISK 
BUS #                         x0000
----- CAM STRING -----
ROUTINE NAME                            xcrintr 
----- CAM STRING -----
                                        No interrupt bit set 
----- CAM STRING -----
ERROR TYPE                              Hard Error Detected 
----- CAM STRING -----
                                        Controller Softc at time of error
----- ENT_XCR_SOFTC -----
*SC_BUS_NAME          xFFFFFC00006A20E0
SC_CNTRL_NUM          x0000000000000000
SC_CNTRL_TYPE         x006A2AC000000000
*SC_CTRL              xFFFFFC00006A2AC0
SC_IOHANDLE           x000003A000008000
SC_FLAGS                  x00000000
SC_REG_OFF                x00000C90
SC_MAX_ACT                x0000003C
SC_SPEC_ACT               x00000004
SC_CMDS_ACT               x00000001
*SC_ACT_FLINK         xFFFFFC001FE556B8
*SC_ACT_BLINK         xFFFFFC001FE556B8
SC_CMDS_PENDING           x00000000
*SC_PEND_FLINK        xFFFFFC001FE55050
*SC_PEND_BLINK        xFFFFFC001FE55050
*SC_FREE_FLINK        xFFFFFC001FE55938
*SC_FREE_BLINK        xFFFFFC001FE55848
SC_FREE_CMD_SLOTS         x0000003F
I know this looks like an obvious hardware problem, but everything has
already been replaced: controller, cables, connectors, etc.
If anyone knows of a similar problem and how to solve it, please
let me know. Of course, I'll summarize.
Regards
JAN
Received on Wed Oct 23 1996 - 21:43:13 NZDT