HP OpenVMS Systems

ask the wizard

RMS indexed file performance after FDL CONVERT?


The Question is:

 
We have a number of applications whose performance just recently went right
into the dumpster. The common thread seems to be that the files read by
these applications were all recently ANAlyzed and CONVerted using
Ana/RMS/FDL/Nointeractive..., and that all have descending string keys,
though the applications may not necessarily be accessing the files via this
descending key. These are, for the most part, very large files, with one
million records or more. There have been no application changes, and the
applications have been running against these files with acceptable
performance since 1992.
 
A portion of the descending key in each file in question is the date in
YYYYMM format. Is there something in the algorithm for reading or writing
descending keys that would suddenly cause unusual overhead, or a poorly
optimized file, once YYYYMM = 199810?
 
 


The Answer is :

 
    ANAL/RMS followed by EDIT/FDL/NOINTER performs a generic
    file tuning that takes a limited number of inputs into account.
    Specifically, it looks at the number of data records, the
    average record size and the disk cluster factor.
    It does NOT try to use the data key values or their distribution.
 
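    For reference, a generic tuning and reconversion pass looks roughly
    like the sketch below. The file names are placeholders, not your
    actual files; the point is that EDIT/FDL only ever sees the
    statistics gathered by ANALYZE/RMS_FILE, never the key values
    themselves.
 
    $ ! placeholder names; CUSTOMER.DAT stands for the indexed file
    $ ANALYZE/RMS_FILE/FDL/OUTPUT=CUSTOMER.ANL CUSTOMER.DAT
    $ EDIT/FDL/ANALYSIS=CUSTOMER.ANL/NOINTERACTIVE CUSTOMER.FDL
    $ CONVERT/FDL=CUSTOMER.FDL/STATISTICS CUSTOMER.DAT CUSTOMER.DAT
 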
    If your original file was properly designed, taking application
    data and usage patterns into account, then the automated tuning
    is likely to be less efficient. You should try to revive the old
    FDL files and compare the assigned COMPRESSION, FILL FACTOR and
    AREA numbers for the KEYs, and the BUCKET_SIZE for those areas.
 
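    In the FDL files, the attributes worth comparing look roughly like
    the fragment below; the values shown are purely illustrative, not a
    recommendation.
 
    AREA 0
            BUCKET_SIZE             8
    AREA 1
            BUCKET_SIZE             4
    KEY 0
            DATA_AREA               0
            INDEX_AREA              1
            DATA_FILL               85
            INDEX_FILL              85
            DATA_KEY_COMPRESSION    yes
            DATA_RECORD_COMPRESSION yes
            INDEX_COMPRESSION       yes
            TYPE                    dstring
 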
    - The key you mention, YYYYMM, is rather prone to a large number of
    duplicates, which in turn can dramatically impact performance.
    Each duplicate will need a 7-byte pointer. If there are thousands
    (millions?) of duplicates, then you may need an exceptionally large
    bucket size to minimize the pain. (There will be pain!)
 
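    As a rough illustration, suppose (purely hypothetically) that about
    50,000 records share the key value 199810: 50,000 duplicates at 7
    bytes each is roughly 350,000 bytes, which is nearly 700 512-byte
    disk blocks of duplicate pointers that RMS has to walk and maintain
    for that single key value.
 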
    - Large files nowadays live on very large disks, sometimes with
    largish cluster sizes ( > 50 ). This may lead EDIT/FDL astray,
    causing it to select overly large buckets. You can force more
    reasonable choices by replacing the actual cluster size in the
    analysis FDL with a 'generic' one like 12. Then rerun EDIT/FDL and
    use $ DIFFERENCES on the old and new output FDLs.
 
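    A sketch of that procedure, again with placeholder file names:
    change the CLUSTER_SIZE value in the FILE section of the analysis
    FDL to 12 with any text editor, then rerun the optimization and
    compare the two designs.
 
    $ ! CUSTOMER.ANL edited so that FILE CLUSTER_SIZE is 12
    $ EDIT/FDL/ANALYSIS=CUSTOMER.ANL/NOINTERACTIVE CUSTOMER_12.FDL
    $ DIFFERENCES CUSTOMER.FDL CUSTOMER_12.FDL
 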
    - The old file may have been relatively fragmented, with records
    coming and going and random free space available for new records.
    The new file may be tightly packed (FILL FACTOR?) and any new
    record may cause a bucket split: additional I/O and time! This
    should become stable after a while, but a re-convert with a
    lower fill factor (70%?) may be needed.
 
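    If it does not settle down, lower the fill factors (DATA_FILL and
    INDEX_FILL in the KEY sections of the FDL, set to 70 for example)
    and reconvert; a sketch with a placeholder name for the edited FDL:
 
    $ CONVERT/FDL=CUSTOMER_FILL70.FDL/STATISTICS CUSTOMER.DAT CUSTOMER.DAT
 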
    Good luck!
 
 

answer written or last revised on ( 20-NOV-1998 )
