HP OpenVMS Systems

ask the wizard

Random process exits?


The Question is:

 
batch jobs periodically 'stop'. no error messages in logs, no console messages.
 (and yessir, i've poked around the FAQ's for a while)
i have an Alphaserver 4100, VMS 7.1-2, 2.5 G memory, disk-shadowing in use.  on
 a very unpredictable schedule, batch-jobs running DEC-Basic .EXE images
 against Prolog-3 indexed files simply.... stop. no indications of problems are
 detectable. upon re-start, the jobs will run to completion normally, and will
run to completion without modification for many more cycles. the system has
 been checked for disk errors, IO contention, process quotas, and
a number of other 'obvious' problems. no luck. this problem has persisted for
 the better part of a two year period. ANY assistance will be heartily
 appreciated. Additional info: the software is 'off the shelf' thirdparty stuff
 that runs quite normally at
other installations;
the support-engineers from this thirdparty provider have tried a number of
 fixes - no good.
i have moved files from one-disk to another (attempt to reduce head contention)
 - no good.
i have implemented a schedule of 'file rebuilds'
using ANALYZE / RMS and CONVERT/FDL on all files
that are involved - no good.
"hopefully awaiting a blow from the magic stick"
thanks in advance.

The Answer is :

 
  The fact that the application software runs at one site has relatively
  little bearing on whether or not the application will run at another
  (and different) site -- site-specific latent application problems
  and site-specific coding dependencies are surprisingly common within
  application code.
 
  For some of the typical programming bugs that can lead to unpredictable
  behaviour, please see topics (1661) and (2681).
 
  As a suggestion, establish a signal handler within the application
  images, and code the handler to report details of any errors.
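
  Where the application source is not available (as with third-party
  images), the batch procedure itself can at least record how each image
  exited.  The following is a minimal DCL sketch, not part of the original
  application; the image name APP_DIR:NIGHTLY_JOB.EXE is hypothetical.  It
  is no substitute for a handler inside the image, as it only captures the
  final exit status.

  ```
  $ SET NOON                          ! keep going after a failing image
  $ RUN APP_DIR:NIGHTLY_JOB.EXE       ! hypothetical image name
  $ EXIT_STATUS = $STATUS             ! save the status before it is overwritten
  $ WRITE SYS$OUTPUT F$TIME(), " image exit status: ", EXIT_STATUS
  $ WRITE SYS$OUTPUT F$MESSAGE(EXIT_STATUS)
  ```

  If the status lines never appear in the batch log, the whole process
  was probably deleted rather than the image exiting on its own, which
  points toward $delprc (STOP/ID) or a process-level failure.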
 
  Compare the PQL_D* and PQL_M* system parameter settings (the default
  and minimum process quotas) against those at a site where the
  application runs normally.
 
  Check the default mailbox quota parameters (DEFMBXBUFQUO and
  DEFMBXMXMSG).
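
  As a sketch of how the process quota and mailbox defaults can be
  displayed for comparison between sites, the SYSGEN utility can list
  the active values.  The same commands would be run on each system and
  the output compared by hand.

  ```
  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> USE ACTIVE
  SYSGEN> SHOW/PQL
  SYSGEN> SHOW DEFMBXBUFQUO
  SYSGEN> SHOW DEFMBXMXMSG
  SYSGEN> EXIT
  ```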
 
  Check the disk fragmentation levels.
 
  Check the OpenVMS system error log for any RMS bugchecks.
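
  One possible way to look for bugcheck entries, assuming the
  ANALYZE/ERROR_LOG utility is the error log tool available on this
  OpenVMS version (some configurations use DECevent instead), is
  sketched here.  The /SINCE date is only an example value.

  ```
  $ ANALYZE/ERROR_LOG /INCLUDE=BUGCHECKS /SINCE=1-JUL-2001 -
        SYS$ERRORLOG:ERRLOG.SYS
  ```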
 
  Ensure you have all current mandatory ECOs for OpenVMS applied.
 
  Check the auditing logs for any unexpected use of WORLD privilege,
  and for unexpected use of the $forcex or $delprc system services.
  The $delprc call is used by the DCL command STOP/ID.  (You may well
  have to enable these audits.)
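
  A sketch of enabling and then reviewing these audits follows.  It
  assumes the default audit journal location and an account holding the
  SECURITY privilege; the existing audit settings should be reviewed
  with SHOW AUDIT before anything is changed.

  ```
  $ SHOW AUDIT
  $ SET AUDIT /AUDIT -
        /ENABLE=(PRIVILEGE=(SUCCESS:WORLD,FAILURE:WORLD),PROCESS=(DELPRC,FORCEX))
  $ ! ... after the next unexplained job stop ...
  $ ANALYZE/AUDIT /SINCE=YESTERDAY /EVENT_TYPE=(PRIVILEGE,PROCESS) -
        SYS$MANAGER:SECURITY.AUDIT$JOURNAL
  ```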
 
  If there is privileged-mode code involved, consider setting the
  parameter BUGCHECKFATAL to cause non-fatal system bugchecks to
  be elevated to fatal OpenVMS system bugchecks -- rather than
  simply having the process terminate, a non-fatal bugcheck will
  then cause the OpenVMS system to crash (and to write a dumpfile).
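
  As a sketch, the parameter can be examined and set with SYSGEN.  This
  assumes that a crash of the whole system is acceptable at this site
  and that the dump file is large enough to be useful.  To have the
  setting survive AUTOGEN, the usual approach is to also add
  BUGCHECKFATAL = 1 to SYS$SYSTEM:MODPARAMS.DAT.

  ```
  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> USE ACTIVE
  SYSGEN> SHOW BUGCHECKFATAL
  SYSGEN> SET BUGCHECKFATAL 1
  SYSGEN> WRITE ACTIVE
  SYSGEN> EXIT
  ```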
 

  
     
     answer written or last revised on ( 16-JUL-2001 )