HP OpenVMS Systems

ask the wizard

Fast Shadow Member Eviction? (SHADOW_SYS_DISK)


The Question is:

 
I have a client running OpenVMS 7.1 on a CI cluster with AXPs and VAXes and
HSJ40s and HSJ50s. They have implemented phase II volume shadowing.
 
We found this statement in the "DIGITAL StorageWorks, HSJ30/40 Array Controller
Operating Software, HSOF Version 3.2, Release Notes":
 
Fast Shadow Member Eviction
An MSCP flag is provided to enable rapid shadow member eviction when a
device error is detected. OpenVMS can set this flag based on SYSGEN
parameters in the field SHADOW_SYS_DISK. The MSCP flag is called MD.SER.
When set, and an I/O encounters a device error, the I/O is returned as
failed without further error recovery. OpenVMS can then evict a shadowset
member, as appropriate.
 
But when we look up the SYSGEN parameter SHADOW_SYS_DISK in the "Volume
Shadowing for OpenVMS" manual, the only values described are 0 and 1. "All
values other than the value of 1 are reserved for use by Digital."
 
So what was the author of the HSJ30/40 manual describing? Is Fast Shadow
Member Eviction available in OpenVMS? If so, what value should
SHADOW_SYS_DISK be set to?
 


The Answer is :

 
  Certain customer applications have critical response time requirements.
  While performing READ I/O operations to a multiple member shadow set,
  these customer applications prefer minimal error recovery be done for
  "certain" I/O operations to certain volumes.
 
  The application assumes that if one member of a multiple member shadow
  set develops problems, the READ I/O operation can be satisfied from
  another "good" member.  Most volume shadowing customers rely on just
  this behaviour.  The READ of one member comes back to the SHDRIVER with
  an error; the driver then reads another member of the set successfully
  and writes that data to the "bad" member, making it good again.  (This
  is a very simplified explanation of SHDRIVER error recovery processing.)
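 
  The read-and-repair sequence just described can be pictured with a small
  toy model in Python.  This is only a sketch: the function name, the
  dictionary representation of a member, and the use of None to mark an
  unreadable block are all assumptions made for illustration, and the
  real SHDRIVER logic is considerably more involved.

```python
# Toy model only. Each member is a dict mapping LBN -> data, and None marks
# a block that returns an error on READ. This is not SHDRIVER's real logic.

def shadow_read(members, lbn):
    """Return data for lbn from the first readable member, repairing the others."""
    failed = []
    for index, member in enumerate(members):
        data = member.get(lbn)
        if data is not None:
            # Write the good data back to every member that failed earlier.
            for bad_index in failed:
                members[bad_index][lbn] = data
            return data
        failed.append(index)
    # Every member returned an error, so there is nothing to repair from.
    raise OSError(f"LBN {lbn} unreadable on all members")


members = [{100: None}, {100: b"payload"}]
result = shadow_read(members, 100)
```

  After the call, the first member holds the same data as the second for
  LBN 100, which is the "making it good again" step.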
 
  To accommodate the customer requirement for critical response time, HSOF
  was modified to honor the use of the QIO function modifier, IO$M_INHRETRY.
  Customer application(s) would then have to modify all pertinent READ I/O
  operations to include that modifier.  To facilitate this for customer use
  (without requiring application modification), the SHDRIVER was modified
  to assume that bit is set on all READ I/Os that it receives if
  SHADOW_SYS_DISK parameter bit 15 (ie: %X00008000) is set.  The retry is
  then inhibited for every virtual unit, for every READ I/O operation.
 
  Consider taking this one step further, where the application does not want
  the SHDRIVER to do ANY error recovery.  Once the SHDRIVER receives the I/O
  back from the controller, the application wants that member to be expelled
  immediately and to have the READ I/O sent to another member.  That was
  accommodated also, with the use of another SHADOW_SYS_DISK flag, bit 13
  (ie: %X00002000).
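 
  For anyone working out the actual parameter value, the arithmetic for the
  two bits can be sketched in Python as follows.  The constant and function
  names are hypothetical; only the bit positions and hexadecimal values come
  from the text above.  Combining the bits with the documented value of 1 is
  shown purely as an arithmetic illustration, not as a recommended setting.

```python
# Hypothetical names; only the bit positions come from the answer above.
INHIBIT_READ_RETRY = 1 << 15      # bit 15, %X00008000
EXPEL_ON_READ_ERROR = 1 << 13     # bit 13, %X00002000
DOCUMENTED_VALUE = 1              # the only non-zero value the manual describes


def vms_hex(value):
    """Format an integer in the %X notation used in the answer."""
    return f"%X{value:08X}"


def decode(value):
    """List which of the two bits discussed here are set in a parameter value."""
    names = []
    if value & INHIBIT_READ_RETRY:
        names.append("bit 15: inhibit retry on READ I/O")
    if value & EXPEL_ON_READ_ERROR:
        names.append("bit 13: expel member on READ error")
    return names


# Illustration only: OR the two bits together with the documented value.
combined = DOCUMENTED_VALUE | INHIBIT_READ_RETRY | EXPEL_ON_READ_ERROR
print(vms_hex(combined), combined, decode(combined))
```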
 
  It should be noted that if a single member shadow set contains a "bad"
  LBN -- "bad" being here defined as an LBN that returns SS$_PARITY or
  SS$_FORCEDERROR for a READ I/O operation -- then once a new member is
  added to that set, the "bad" LBN will be replicated to the new member.
  Any READ I/O operation to that "bad" LBN -- with SHADOW_SYS_DISK bit 13
  cleared -- will NOT expel members.  The SHDRIVER will read all of the
  members, find them all bad, and will return an error to the application
  as it cannot "repair" the shadowset.  (With bit 13 set, the repair and
  recovery logic is disabled.)
 
  Enabling this capability in the system parameters indicates that the
  application is willing to tolerate the occasional "dismemberment" of
  a shadowset, and that the application is also willing to tolerate
  any errors (eg: parity errors) that might be returned.  With
  SHADOW_SYS_DISK bit 15 enabled, the controller will not perform any
  particular error recovery -- with the bit disabled, there might well be
  no error returned to the application.
 
  With a single-member shadowset -- either as originally configured, or
  as a result of "dismemberment" -- there is no error recovery, and all
  errors are returned to the application.
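 
  The bit 13 behaviour, including the single-member case, can be pictured
  with another toy model in Python.  As before, this is only a sketch under
  stated assumptions: the function name and the dictionary representation of
  a member are hypothetical, and the model simply follows the description
  above (expel a failing member and retry on another, but return the error
  once only one member remains).

```python
# Toy model only. Each member is a dict mapping LBN -> data, and None marks
# a block that returns an error on READ. This is not SHDRIVER's real logic.

def read_with_expulsion(members, lbn):
    """Return (data, surviving_members); raise if the last member fails."""
    if not members:
        raise ValueError("a shadow set needs at least one member")
    survivors = list(members)
    while True:
        data = survivors[0].get(lbn)
        if data is not None:
            return data, survivors
        if len(survivors) == 1:
            # Single-member set: no recovery, the error goes to the caller.
            raise OSError(f"LBN {lbn} unreadable on the last member")
        # No repair attempt: drop the failing member and try the next one.
        survivors.pop(0)


data, remaining = read_with_expulsion([{7: None}, {7: b"ok"}], 7)
```

  In this example the set is reduced to one member by a single failed READ,
  which is the "dismemberment" the application must be willing to tolerate.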
 
  The following kits (or later) are required:
    ALPSHAD04_071 or ALPSHAD07_062
    VAXSHAD04_071 or VAXSHAD07_062
 
  The following kits (or later) are recommended:
    ALPMOUN04_071 or ALPMOUN03_062
    VAXMOUN03_071 or VAXMOUN02_062
 

answer written or last revised on ( 14-JUL-1999 )
