HP OpenVMS Systems

ask the wizard

Diagnosing application (or system) loop?


The Question is:

 
This is a rather complex scenario, involving an old, donated VAX
6000-440 running VMS V6.0 in which I have a casual involvement (.edu, no
maintenance contract, so this problem, tracked over many months, cannot
be taken to the CSC).  Mass storage is provided by HSC70-served disks,
and a VAXstation 4000/90 completes the small cluster.  I do not have
physical access to this system, which makes crashing it for dump
information during problem periods difficult.
 
My application, written in C and accessed by multiple concurrent clients,
uses user-mode AST-driven RMS to create, populate, and delete files, and
ACP QIOs to read and modify file revision dates/times.
 
Every now and then the system goes ($ MON MODES) 70% kernel mode, 15-20%
interrupt, with the application process showing 100% of one CPU (presumably
being billed for the 70% kernel time, as this is basically the only CPU
activity on the system).  The file cache ($ MON FILE) shows abnormal Dir
FCB and Dir Data attempt rates (140) with no actual disk activity
($ MON DISK), lots (300) of lock ENQs/DEQs ($ MON LOCK), correspondingly
abnormal amounts of DLOCK ENQs/DEQs and Dir Function Out activity, and
lots (140) of buffered I/O.
 
A $ SHOW PROC/CONT on the application shows considerable periods spent at
priority 16, and an attempt at tracking the PC shows much time spent around
8D0xxxxx and much less time at 7Fxxxxxx; I cannot find anywhere it might be
in my user code (0xxxxxxx).
 
 
I have applied a number of relevant V6.0 patches from the public FTP site
(shadowing, C RTL, F11, LIBR, etc.) without mitigating the problem.  Faster
CPUs added mid-year seem to have exacerbated it.  I have also increased the
SYSGEN ACP_xxx cache sizes, and have just disabled the virtual block cache
experimentally (impact still to be determined), but am running out of ideas.
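For reference, ACP_xxx cache sizing is done through SYSGEN.  The sequence
below is a rough sketch only; the parameter values are purely illustrative,
the specific parameters raised on this system were not listed (ACP_DIRCACHE
and ACP_HDRCACHE are assumed here as examples), and most ACP_xxx parameters
take effect only after a reboot:

```
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET ACP_DIRCACHE 3000      ! illustrative value
SYSGEN> SET ACP_HDRCACHE 3000      ! illustrative value
SYSGEN> WRITE CURRENT
SYSGEN> EXIT
$ ! most ACP_xxx parameters require a reboot to take effect
```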
 
Can the Wizard suggest anything likely?  Thanks (and sorry for the novella).
 


The Answer is:

 
  Beyond providing pointers to the previous discussions of correctly
  synchronizing applications and memory accesses (topics 1661 and 2681),
  there is far too little detail included here to even begin to diagnose
  the cause of this problem.  (In particular, migrations to faster CPUs,
  migrations between VAX and Alpha systems, and moves from uniprocessor
  to SMP configurations do tend to expose latent synchronization flaws
  in application code.)
 
  Various system service entry points are located in the 7Fxxxxxx range.
  For details on these entry points, please see the OpenVMS VAX system map
  file (SYS$SYSTEM:SYS.MAP) and the SYS$P1_VECTOR module in the
  STARLET.OLB library.  Through the use of the techniques discussed in
  topics 1661 and 2681 and the debugger, as well as the system map, it
  may be possible to locate the particular application trigger for this
  problem -- of course, the problem may be due to a flaw in OpenVMS, but
  the flaw is equally (or more) likely to reside in the application code.
 
  OpenVMS VAX V6.0 is no longer supported and ECOs are no longer being
  generated for this release, and an upgrade to at least V6.2 is strongly
  recommended -- as you are at OpenVMS VAX V6.0, an upgrade to V7.2 is
  generally a simple task, and a task that does not normally have any
  particular application-mode or kernel-mode implications.  (The upgrade
  from OpenVMS VAX V5.x to V6.0 is a major upgrade with various implications
  for kernel-mode code and kernel-mode applications.  The V6.x to V7.0
  upgrade was a major upgrade only on the OpenVMS Alpha platform, not
  on the OpenVMS VAX platform.)
 

answer written or last revised on ( 18-NOV-1999 )
