HP OpenVMS Systems: Ask the Wizard
The Question is:

I have a client running OpenVMS 7.1 on a CI cluster with AXPs and VAXes
and HSJ40s and HSJ50s. They have implemented phase II volume shadowing.
We found this statement in the "DIGITAL StorageWorks HSJ30/40 Array
Controller Operating Software, HSOF Version 3.2, Release Notes":

  Fast Shadow Member Eviction

  An MSCP flag is provided to enable rapid shadow member eviction when a
  device error is detected. OpenVMS can set this flag based on SYSGEN
  parameters in the field SHADOW_SYS_DISK. The MSCP flag is called
  MD.SER. When set, and an I/O encounters a device error, the I/O is
  returned as failed without further error recovery. OpenVMS can then
  evict a shadowset member, as appropriate.

But when we look up the SYSGEN parameter SHADOW_SYS_DISK in the "Volume
Shadowing for OpenVMS" manual, the only values described are 0 and 1:
"All values other than the value of 1 are reserved for use by Digital."

So what was the author of the HSJ30/40 manual describing? Is Fast Shadow
Member Eviction available in OpenVMS? And if so, what value should
SHADOW_SYS_DISK be set to?

The Answer is:
Certain customer applications have critical response time requirements.
While performing READ I/O operations to a multiple member shadow set,
these customer applications prefer minimal error recovery be done for
"certain" I/O operations to certain volumes.
The application assumes that if one member of a multiple member shadow
set develops problems, the READ I/O operation can be satisfied from
another "good" member. Most volume shadowing customers rely on exactly
this behaviour: the READ of one member comes back to the SHDRIVER with
an error, the driver then reads another member of the set successfully,
and then writes that data back to the "bad" member, making it good
again. (This is a much-simplified description of SHDRIVER error
recovery processing; a conceptual sketch follows.)
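As a rough illustration, here is that read-repair flow in C. This is a
conceptual sketch, not the actual SHDRIVER source; member_read() and
member_write() are hypothetical stand-ins for the per-member I/O a real
driver would perform.

    /* Conceptual sketch of the read-repair flow described above. */
    #include <stdio.h>
    #include <stdbool.h>

    #define NMEMBERS  2
    #define BLOCKSIZE 512

    /* Trivial simulation: member 0 has a "bad" block at LBN 100. */
    static bool member_read(int member, unsigned lbn, char *buf)
    {
        if (member == 0 && lbn == 100)
            return false;                    /* simulated device error */
        snprintf(buf, BLOCKSIZE, "data for LBN %u", lbn);
        return true;
    }

    static bool member_write(int member, unsigned lbn, const char *buf)
    {
        (void)member; (void)lbn; (void)buf;  /* pretend the write succeeds */
        return true;
    }

    /* Read one LBN from the shadow set; if a member fails, satisfy the
     * READ from another member and write the good data back to the
     * failed member, making it good again. */
    static bool shadow_read(unsigned lbn, char *buf)
    {
        for (int m = 0; m < NMEMBERS; m++) {
            if (member_read(m, lbn, buf)) {
                for (int bad = 0; bad < m; bad++)
                    (void) member_write(bad, lbn, buf);  /* repair member */
                return true;
            }
        }
        return false;   /* all members bad: error goes to the application */
    }

    int main(void)
    {
        char buf[BLOCKSIZE];
        if (shadow_read(100, buf))
            printf("READ satisfied and member repaired: %s\n", buf);
        return 0;
    }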
To accommodate the customer requirement for critical response time, HSOF
was modified to honor the use of the QIO function modifier, IO$M_INHRETRY.
Customer applications would then have to modify all pertinent READ I/O
operations to include that modifier. To facilitate this for customer use
(without requiring application modification), the SHDRIVER was modified
to treat that modifier as set on every READ I/O it receives whenever
SHADOW_SYS_DISK bit 15 (i.e., %X00008000) is set -- the retry is then
inhibited for every virtual unit, for every READ I/O operation.
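For applications that do modify their own I/O, the QIO usage looks
roughly like the following sketch. The device name DSA0: is a
placeholder for a shadow set virtual unit and error handling is
abbreviated; the headers and the IO$M_INHRETRY modifier come from the
standard OpenVMS C (starlet) definitions.

    /* Minimal sketch of a READ issued with retries inhibited. */
    #include <starlet.h>
    #include <iodef.h>
    #include <ssdef.h>
    #include <descrip.h>
    #include <stdio.h>

    int main(void)
    {
        $DESCRIPTOR(dev, "DSA0:");     /* placeholder virtual unit name */
        unsigned short chan;
        unsigned short iosb[4];        /* I/O status block */
        char buf[512];
        unsigned int status;

        status = sys$assign(&dev, &chan, 0, 0);
        if (!(status & 1))
            return status;

        /* READ virtual block 1; IO$M_INHRETRY asks the layers below to
         * skip the usual retries, so a device error comes back as a
         * failed I/O rather than being recovered. */
        status = sys$qiow(0, chan, IO$_READVBLK | IO$M_INHRETRY,
                          iosb, 0, 0,
                          buf, sizeof buf,  /* p1: buffer, p2: byte count */
                          1,                /* p3: starting virtual block */
                          0, 0, 0);
        if (status & 1)
            status = iosb[0];          /* final I/O status is in the IOSB */

        printf("READ completed with status %u\n", status);
        return status;
    }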
Consider taking this one step further, where the application does not want
the SHDRIVER to do ANY error recovery. Once the SHDRIVER receives the I/O
back from the controller, the application wants that member to be expelled
immediately and the READ I/O sent to another member. That was
accommodated as well, with the use of another SHADOW_SYS_DISK flag,
bit 13 (i.e., %X00002000).
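As a worked example of combining these bits: the macro names below are
invented for this sketch, but the bit values (1, %X2000, %X8000) are the
ones described in this answer.

    /* Combining the SHADOW_SYS_DISK bits discussed above. */
    #include <stdio.h>

    #define SHAD_SYSTEM_DISK   0x00000001u  /* documented: shadowed system disk */
    #define SHAD_FAST_EVICT    0x00002000u  /* bit 13: no SHDRIVER error recovery */
    #define SHAD_INHIBIT_RETRY 0x00008000u  /* bit 15: IO$M_INHRETRY on all READs */

    int main(void)
    {
        unsigned int shadow_sys_disk =
            SHAD_SYSTEM_DISK | SHAD_FAST_EVICT | SHAD_INHIBIT_RETRY;
        /* Prints %X0000A001; in SYSGEN: SET SHADOW_SYS_DISK %XA001 */
        printf("SHADOW_SYS_DISK = %%X%08X\n", shadow_sys_disk);
        return 0;
    }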
It should be noted that if a single member shadow set contains a "bad"
LBN -- "bad" being here defined as an LBN that returns SS$_PARITY or
SS$_FORCEDERROR for a READ I/O operation -- then once a new member is
added to that set, the "bad" LBN(s) will be replicated to the new
member. Any READ I/O operation to such a "bad" LBN -- with
SHADOW_SYS_DISK bit 13 cleared -- will NOT expel members. The SHDRIVER
will read all of the members, find them all bad, and will return an
error to the application, as it cannot "repair" the shadowset. (With
bit 13 set, the repair and recovery logic is disabled.)
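An application enabling these bits should therefore be prepared to see
those condition codes directly. The check_read_status() helper below is
hypothetical, but SS$_PARITY and SS$_FORCEDERROR are the standard
condition codes from SSDEF.

    /* Sketch of an application-level check on the final READ status. */
    #include <ssdef.h>
    #include <stdio.h>

    /* iosb[0] holds the final status word of a completed sys$qiow READ. */
    static void check_read_status(unsigned short iosb_status)
    {
        switch (iosb_status) {
        case SS$_PARITY:
            printf("parity error: this LBN is bad on every member\n");
            break;
        case SS$_FORCEDERROR:
            printf("forced error: the block carries the forced-error flag\n");
            break;
        default:
            if (!(iosb_status & 1))
                printf("READ failed with status %u\n", iosb_status);
            break;
        }
    }

    int main(void)
    {
        check_read_status(SS$_FORCEDERROR);  /* forced-error path */
        return 0;
    }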
Enabling this capability in the system parameters indicates that the
application is willing to tolerate the occasional "dismemberment" of
a shadowset, and that the application is also willing to tolerate
any errors (e.g., parity errors) that might be returned. With
SHADOW_SYS_DISK bit 15 enabled, the controller will not perform its
usual error recovery -- with the bit disabled, the recovery can succeed
transparently, and there might well be no error returned to the
application.
With a single-member shadowset -- either as originally configured, or
as a result of "dismemberment" -- there is no error recovery, and all
errors are returned to the application.
The following kits (or later) are required:
ALPSHAD04_071 or ALPSHAD07_062
VAXSHAD04_071 or VAXSHAD07_062
The following kits (or later) are recommended:
ALPMOUN04_071 or ALPMOUN03_062
VAXMOUN03_071 or VAXMOUN02_062