DU Managers,
     
     I got at least 17 messages asking me to share the wealth. Since the 
     messages keep coming, I decided to ignore the bandwidth and send 
     this long summary from the individuals who provided useful help and 
     scripts to monitor the HSZ40.
     
     Thanks and again sorry for the bandwidth...
     
     Ronny
     The Walt Disney Company
     Disney Studios, Burbank California.
     --
     
     Tell the boss what you really think of him...and the truth shall set you 
     free. --Railway Clerk
     ------------------------------------------------------------------------
     
     ***** Bob.Capps_at_pscmail.ps.net wrote ******
     
     Ronny,
     
     A simple crontab with captured output is really all you need.
     
     00,30 * * * * /usr/bin/hszterm -f /dev/rrza24a "show this_controller full" >/tmp/hszterm.out
     
     According to the manpage, if you pass a command string to hszterm, 
     that is all that it executes.  The above crontab entry gave me the 
     following:
     
     # cat /tmp/hszterm.out
     
     
     Copyright Digital Equipment Corporation 1993, 1995. All rights reserved.
     HSZ40 Firmware version V25Z-1, Hardware version  B02
     
     Last fail code: 018800A0
     
     Press " ?" at any time for help.
     
     
     HSZ>
     Controller:
     HSZ40 ZG54002333 Firmware V25Z-1, Hardware  B02
     Configured for dual-redundancy with ZG54302696
     In dual-redundant configuration
     SCSI address 7
     Time: NOT SET
     Host port:
     SCSI target(s) (0, 2, 4, 5), Preferred target(s) (0, 2, 4, 5) 
     Cache:
     32 megabyte write cache, version 2
     Cache is GOOD
     Battery is GOOD
     No unflushed data in cache
     CACHE_FLUSH_TIMER = 65535 (seconds)
     CACHE_POLICY = A
     Licensing information:
     RAID (RAID Option) is ENABLED, license key is VALID
     WBCA (Writeback Cache Option) is ENABLED, license key is VALID
     MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
     Extended information:
     Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
     Operation control: 00000004  Security state code: 6566
     HSZ>
     
     
     #
     
     Just stick this in some form of notification script with filters to 
     get the info you want:
     
     # Count "Cache is"/"Battery is" status lines that do not say GOOD.
     RESCD=`grep -E 'Cache is|Battery is' /tmp/hszterm.out | grep -v 'GOOD' | wc -l | sed 's/ //g'`
     if [ "$RESCD" != "0" ]; then
     # Notify sysadmin
     # ...
     :
     fi
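
A minimal sketch of the same filter wrapped in a function, so it can be tried against a saved copy of the output before it goes into cron. The name hsz_bad_status and its file argument are illustrative and not part of hszterm; it assumes the status lines read "Cache is ..." and "Battery is ..." as in the capture above.

```shell
#!/bin/sh
# hsz_bad_status FILE
# Print every "Cache is ..." / "Battery is ..." line that does not say GOOD.
# Returns 0 when at least one such line was found, 1 otherwise.
# (Hypothetical helper; the line formats are assumed from the sample capture.)
hsz_bad_status() {
    grep -E 'Cache is|Battery is' "$1" | grep -v 'GOOD'
}

# Example use from cron (mail command and recipient are placeholders):
#   if hsz_bad_status /tmp/hszterm.out > /tmp/hszterm.bad; then
#       mailx -s "HSZ40 cache/battery problem" root < /tmp/hszterm.bad
#   fi
```

Matching on "Cache is" rather than plain "Cache" keeps the "Cache:" heading and the WBCA licence line from being counted as failures.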
     
     Bob
     
     Perot Systems
     bob.capps_at_ps.net
     
     p.s.  As you can see, my policy is set to 'A' but after reading your 
     notice at the bottom of your message, I think that I want it set to 
     'B'.  Thanks for the tip!
     
     **********************************************************************
     Date: Monday, 24 March 1997 7:25am ET
     To: Sendout
     From: Stephen.Strobel_at_STC001
     Subject: hsz40 question!
     In-Reply-To: The letter of Friday, 21 March 1997 6:54pm ET
     
     
     Ronny,
     
     Most of the battery problems are "not necessarily" battery problems, 
     though they might be.  HSOF versions prior to 2.7-2 caused batteries 
     to be reported bad when in reality they were OK.  2.7-2 (extra 
     patches) or 3.0-3 fixes this problem.
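
Since the firmware level matters when judging a battery alarm, it may help to include it in any alert. A small sketch, assuming the "Firmware Vxxx," layout seen in the "show this_controller full" capture earlier in this summary. The helper name hsz_firmware is hypothetical, and how a string such as V25Z-1 maps to an HSOF release number is not stated in this thread.

```shell
#!/bin/sh
# hsz_firmware FILE
# Print the firmware string from the "Controller:" section of a saved
# "show this_controller full" capture.
# (Hypothetical helper; assumes a line of the form
#  "HSZ40 <serial> Firmware V25Z-1, Hardware ..." as in the sample output.)
hsz_firmware() {
    sed -n 's/.*Firmware \(V[^,]*\),.*/\1/p' "$1" | head -n 1
}

# Example: echo "firmware: $(hsz_firmware /tmp/hszterm.out)"
```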
     
     I have another question for you.  I'm pushing DEC very, very hard to 
     get them to move the "Dual Pathing" issue up on the development 
     plans.  Because this is a business-critical system, I assume that 
     you have dual-redundant controllers.  Dual Pathing would provide two 
     SCSI buses to each controller pair.  If a controller, cable, KZPSA, 
     DWLPA or hose went bad, the devices on the controller would fail over 
     to the other controller and the devices at the OS level would also 
     fail over.  I view this as a must for business-critical systems.  If 
     you feel the same way, I encourage you to contact your DEC sales rep 
     and let him know.  I'm pushing this all the way to Palmer.
     
     Now, to answer your questions, yes you can script.  Here is one I am 
     using:
     
     #!/bin/ksh
     #
     sysname=$(hostname)
     print "host is" $sysname > /usr/users/root/hsz40/hsz40_$(date +"%h-%d-%y").txt
     #
     # Run "show failedset" against every device listed in hsz40_list.
     function hsz40_check
     {
     while read -r HSZ
     do
     hszterm -f $HSZ "show failedset"
     done < /usr/users/root/hsz40/hsz40_list
     }
     hsz40_check >> /usr/users/root/hsz40/hsz40_$(date +"%h-%d-%y").txt
     #while read -r line
     #do case  $(date +"%a") in
     #      Mon|Tue|Wed|Thu|Fri) cat /usr/users/root/hsz40/hsz40_$(date \
     #+"%h-%d-%y").txt |  mailx -s "$(hostname)_$(date +"%h-%d-%y")_hsz40" \
     #$line%stc001_at_nodea.steel
     
     I'm currently not using the mail feature.  I produce a summary report 
     where I grep for errors and such.
     
     Hope this helps.  Call if you wish.
     
     Steve Strobel
     616 248-7497
     **********************************************************************
     
     Jeff.Beck_at_orcas.iasl.ca.boeing.com wrote **************
     
     >     Q: Is there a way to run hszterm (non-interactive/via crontab entry)
     >     and dump the output of "SHOW THIS_CONTROLLER FULL".
     >     It would be interesting to hear how other managers resolve this 
     >     without using PolyCenter Console Manager type of software. 
     
     Ronny, here's the cron script I use, which is along the lines of what 
     you want to do, except I get paged via Console Manager.   Jeff
     
     
     #!/bin/ksh 
     #################################################### 
     #                                                  # 
     #         Boeing ASL NFS File Server               # 
     #                                                  # 
     #  Name: raid_check                                # 
     #                                                  # 
     #  This script polls the HSZ40 Failedsets and      # 
     #  alarms to the syslog if a disk fails.           # 
     #  The message is picked up by the Console         # 
     #  Manager and a Sys-Administrator is notified.    # 
     #                                                  # 
     #  Created: 23-Apr-1996  Ben Johnson               # 
     #                                                  #
     
     LUMP=`hszterm -b2 -t5 -l0 "show failed" | grep DISK | cut -c 44-54` 
     FDRV=`expr substr "$LUMP" 3 7`
     # echo "$FDRV"
     if [ `expr "$FDRV" : "DISK"` != 0 ] ; then
     logger -p 2 "RAID_check: HSZ #1 SCSI #2  ${FDRV%' '} has FAILED"
     # echo "`hostname -s`: RAID_check: HSZ #1 SCSI #2  ${FDRV%' '} has FAILED"
     fi
     
     LUMP=`hszterm -b3 -t5 -l0 "show failed" | grep DISK | cut -c 44-54` 
     FDRV=`expr substr "$LUMP" 3 7`
     # echo "$FDRV"
     if [ `expr "$FDRV" : "DISK"` != 0 ] ; then
     logger -p 2 "RAID_check: HSZ #1 SCSI #3  ${FDRV%' '} has FAILED"
     # echo "`hostname -s`: RAID_check: HSZ #1 SCSI #3  ${FDRV%' '} has FAILED"
     fi
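
The two near-identical blocks above could be folded into one function per bus. A rough sketch, under the assumption that failed members show up in the "show failed" output as words of the form DISKnnn (which is what the grep and cut above imply). The function names failed_disks and check_bus are made up for illustration. The hszterm flags and the logger priority are copied unchanged from the script above.

```shell
#!/bin/sh
# failed_disks: read "show failedset" output on stdin and print the name of
# each failed device, one per line. Assumes each failed member appears as a
# whitespace-separated word of the form DISKnnn (an assumption, not a
# documented output format).
failed_disks() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^DISK[0-9]+$/) print $i }'
}

# check_bus BUS: hypothetical wrapper that logs one syslog message per
# failed disk on the given bus.
check_bus() {
    hszterm -b"$1" -t5 -l0 "show failed" | failed_disks |
    while read -r disk; do
        logger -p 2 "RAID_check: HSZ #1 SCSI #$1  $disk has FAILED"
    done
}

# Example: for bus in 2 3; do check_bus "$bus"; done
```

Matching on the word rather than on fixed columns means the check does not depend on the exact column layout, and it reports every failed disk instead of only the first.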
     
     **********************************************************************
     
     >     Q: Is there a way to run hszterm (non-interactive/via crontab entry)
     >     and dump the output of "SHOW THIS_CONTROLLER FULL".
     
     Yes, with hszterm. You'll need to install:
     SWACLI11A       installed       HSZ40 Array Controller Utility (Alpha)
     
     >     It would be interesting to hear how other managers resolve this 
     >     without using PolyCenter Console Manager type of software.
     
     We use polycenter console manager to retain console logs of our 7 
     hsz's and our three primary systems... it's been invaluable for 
     troubleshooting.
     
     Besides that, we have a nightly script that polls the hsz's for 
     changes and problems.  I'll attach the script.  It and some other 
     tools can be obtained via anonymous ftp from 
     raven.alaska.edu:/pub/sois/UA_DUtools.tar.Z (the script may invoke a 
     ua* program for massaging data).
     
     Battery problems are effectively resolved (allegedly) by using the 
     newer ones... I have more information buried someplace if you need 
     it... off the top of my head: use the EDI ones only (scrap the 
     Hyundai). Also, if you run dual-redundant (v3.0 only, v2.7 doesn't cut 
     it) you are protected... it will (allegedly) fail over on low battery 
     in v3.0.
     kurt
     
     #!/bin/ksh
     #Copyright (c)  1996-1997  by   University of Alaska Computer Network 
     #
     #950120         hszterm.ksh     gather hsz configuration, report changes
     #
     #970119 sxkac   change alert address to sdsys (alias to systems folks) 
     #960922 sxkac   poll consoles separately; show raidset full
     #960730 sxkac   1r on spike and 3n on nugget
     #960511 sxkac   modified reporting for hsz v2.7; deleted older history 
     ###############################################################################
     #       ALERT="sxkac "
     ALERT="sdsys "                  # mail addresses for reporting 
     sanity="java"                   # sanity node for configuration copies 
     if (test -z "$UA_Profile") then         # has our profile executed?
     . ./.profile                    # nope, do it now (must be an rsh) 
     fi
     cd      $HOME/config                    # stick it in our config directory
     hostname=$(uname -n)
     hostname=${hostname%%.*}
     mv      $hostname/hsz_*.*       old
     ER_LOG="$hostname/hsz_term.errors"
     ###############################################################################
     function check                                  # check if sts ok
     {
     echo "
     Check:  $1"
     eval    $1                                      # execute command
     sts=$?                                          # capture status
     
     if ((0 == $sts))        
     then    
     return; 
     fi      
     # command ok? return...
     
     echo    "Error($sts): $1"
     echo    "Error($sts): $1"                       >> $ER_LOG
     return 
     }
     #------------------------------------------------------------------------------
     function err_chk                                # check if sts ok
     {
     echo    "Error($sts): $1"
     echo    "Error($sts): $1"                       >> $ER_LOG
     return 
     }
     #------------------------------------------------------------------------------
     function get_hsz                                # get hsz information
     {
     sudo    hszterm -f /dev/${1} \
     "show devices full"             >  $hostname/hsz_${2}.devi
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show devices";       fi
     
     sudo    hszterm -f /dev/${1} \
     "show units   full"             >  $hostname/hsz_${2}.unit
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show units  ";       fi
     
     sudo    hszterm -f /dev/${1} \
     "show raid   full"              >  $hostname/hsz_${2}.raid
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show raid ";         fi
     
     sudo    hszterm -f /dev/${1} \
     "show mirror full"              >> $hostname/hsz_${2}.raid
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show mirror";        fi
     
     sudo    hszterm -f /dev/${1} \
     "show stripe full"              >> $hostname/hsz_${2}.raid
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show stripe";        fi
     
     sudo    hszterm -f /dev/${1} \
     "show this   full"              >  $hostname/hsz_${2}.this
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show this "; fi
     
     if [ -z "$3" ]; then
     touch                              $hostname/hsz_${2}.othr
     else
     sudo    hszterm -f /dev/${3} \
     "show this   full"              >  $hostname/hsz_${2}.othr
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) show other"; fi
     fi
     
     sudo    hszterm -f /dev/${1} \
     "run fmu" "show last most"      >  $hostname/hsz_${2}.errs
     sts=$?
     if ((0 != $sts)) then   err_chk "hsz(${2}) run fmu ...";        fi
     }
     #=================================
     function diffchk
     {
     # check two files for differences, based on: diffchk $(ls file.*) 
     #       exit 0  identical
     #       exit 1  differences (if $OUT write $OUT.out; if $OLD mv file) 
     #
     if ((2 != $#)) then
     echo    "Error($#): Incorrect argument count: $1 $2 $3 ..."
     return  2
     fi
     if (test ! -z "$UAKDF") then                            # UAKDF requested?
     DIFF="uakdf     $UAKDF  $2      $1"
     else
     DIFF="diff              $2      $1"
     fi
     $DIFF
     sts=$?
     if ((0 == $sts)) then
     echo    "Identical, $2 deleted and $1 retained."
     rm      $2
     return 0
     fi
     if (test ! -z "$OUT")   then    $DIFF > $OUT.out        ; fi
     if (test ! -z "$OLD")   then    mv      $1      $OLD    ; fi
     return 1
     }
     ###############################################################################
     #                                                            hszterm.ksh
     
     
     case "$hostname" in                     # so where are we?
     
     glacier )
     get_hsz rrz60c  1f      rrz58c  # SW-1  Front   HSZ40
     get_hsz rrz28c  2f      rrz26c  # SW-2  Front   HSZ40
     
     check " rsh     spike   job/hszterm.ksh"
     check " rcp -p  spike:config/spike/hsz_*        $hostname"
     
     check " rsh     nugget  job/hszterm.ksh"
     check " rcp -p  nugget:config/nugget/hsz_*      $hostname"
     
     check " rcp -p  $hostname/hsz_*         ${sanity}:config/$hostname" 
     ;;
     spike )
     get_hsz rrz17c  1r      rrzd20c # SW-1  Rear    HSZ40
     
     if [  -r $ER_LOG ];   then    exit 1 
     else    exit 0
     fi
     ;;
     nugget )
     get_hsz rrz17c  3n              # SW-3  n/a     HSZ40
     
     if [  -r $ER_LOG ];   then    exit 1 
     else    exit 0
     fi
     ;;
     * )
     echo    "$hostname is not configured in this procedure."        \
     >> $ER_LOG
     ;;
     esac
     #------------------------------------------------------------------------------
     if [[ -r $ER_LOG ]];    then
     echo    "
     Sending mail to: $ALERT "                       >> $ER_LOG
     
     cat     $ER_LOG \
     |       mailx -s        "$hostname hszterm failed"      $ALERT
     exit    0                       # always exit successfully
     fi
     #------------------------------------------------------------------------------
     cd      $HOME/config/$hostname          # change to our config directory
     
     stamp=hsz_$(date +%y%m%d)
     if [ -e $hostname/*$stamp ]; then       # we've already run once today...
     stamp=hsz_$(date +%y%m%d%H%M%S)
     fi
     
     rm -f   ../out/*hsz*.out        ../out/*hsz*.msg 
     OLD=$HOME/config/old
     unset   UAKDF
     touch   ../out/$stamp.msg
     
     echo    "
     ______________________________________________________________________________
     Report  hsz show device / unit
     "
     grep    " disk "        hsz_*.devi      >       x.0
     uakce   -m75,84,22                              x.0 -o  x.1
     grep    "  D"           hsz_*.unit      >       x.0
     uakce   -m58,67,22                              x.0 -o  x.2
     sort    -k1.5,1.32  -o  hsz.$stamp              x.1     x.2
     rm                                              x.*
     OUT=../out/hsz_sum_$stamp
     UAKDF="-c5,32,18 -v"
     diffchk $(ls hsz.*)
     if ((1 == $?))  then
     echo    "
     HSZ     changes:
     ===     ======="                >> ../out/$stamp.msg
     cat     $OUT.out        >> ../out/$stamp.msg
     fi
     echo    "
     ______________________________________________________________________________
     Report  hsz fmu show last
     "
     grep -ve 'HSZ>
     for help.
     Copyright'              hsz_*.errs  >   hszerr.$stamp
     unset   UAKDF
     unset   OUT
     
     diffchk $(ls hszerr.*)
     if ((1 == $?))  then
     echo    "
     HSZ     errors:
     ===     ======"                 >> ../out/$stamp.msg
     cat     $(ls hszerr.*)  >> ../out/$stamp.msg
     touch                      ../out/hszerr_$stamp.out
     fi
     echo    "
     ______________________________________________________________________________
     Report  hsz show this & show other
     "
     grep -ve 'Time:
     flushed data in cache'          hsz_*.this hsz_*.othr > hszthis.$stamp
     
     unset   UAKDF
     OUT=../out/hszthis_$stamp
     
     diffchk $(ls hszthis.*)
     if ((1 == $?))  then
     echo    "
     HSZ     show_this:
     ===     ========="              >> ../out/$stamp.msg 
     cat     $OUT.out        >> ../out/$stamp.msg
     fi
     
     echo    "
     ______________________________________________________________________________
     "
     ls      ../out/*hsz*.out
     if ((0 == $?))
     then
     echo    "Sending mail to: $ALERT"
     cat     ../out/$stamp.msg  \
     |       mailx -s "$hostname hsz40 config changes" $ALERT 
     cat     ../out/$stamp.msg
     else
     echo    "There were NO configuration changes found." 
     fi
     ###############################################################################
     exit    0                       # always exit successfully
     
     __________________________________________________________________ 
     Kurt Carlson, University of Alaska, (907)474-6266 sxkac_at_alaska.edu
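
For sites without the ua* helpers, the core idea of the script above (compare today's capture with the previous one and report only real changes) can be approximated with plain diff. This is a simplified sketch, not the author's method: strip_volatile and config_changed are hypothetical names, mktemp is assumed to be available, and the two filtered patterns are the ones the script above removes from "show this" output. Note that the return convention is the reverse of diffchk: here 0 means "changed".

```shell
#!/bin/sh
# strip_volatile FILE: drop lines that change on every poll (the same two
# patterns the script above filters out of "show this" output).
strip_volatile() {
    grep -v -e 'Time:' -e 'flushed data in cache' "$1"
}

# config_changed OLD NEW
# Prints the differences and returns 0 when the filtered captures differ;
# prints nothing and returns 1 when they match.
config_changed() {
    old_tmp=$(mktemp) || return 2
    new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 2; }
    strip_volatile "$1" > "$old_tmp"
    strip_volatile "$2" > "$new_tmp"
    if diff "$old_tmp" "$new_tmp"; then
        status=1        # identical after filtering
    else
        status=0        # differences were printed by diff
    fi
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}

# Example (file names and mail recipient are placeholders):
#   if config_changed old/hsz_1f.this new/hsz_1f.this > /tmp/hsz.diff; then
#       mailx -s "hsz40 config changes" root < /tmp/hsz.diff
#   fi
```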
     
Received on Tue Mar 25 1997 - 23:46:11 NZST