HP OpenVMS Systems - Ask The Wizard
The Question is:
I read with interest your reply in article "(2618) RMS indexed file tuning
and disk cluster factors", and noted especially the comment that a disk
cluster factor of 50 was "large".
At our installation we commonly work with cluster factors of 1024 up to 8192
blocks. File sizes for some of the most important indexed files range from
0.5 million blocks up to 15 million blocks for the largest file.
The disks are mostly RAID 5 sets served by HSZ50 controllers, with a chunk
size that is normally 256 blocks.
The file of 15 million blocks is obviously of particular interest so far as
tuning goes. It resides on a RAID 5 set with a chunk size of 256 blocks and
a cluster size of 768.
We were handed an FDL file similar to the following:
FILE
        CONTIGUOUS              no
        GLOBAL_BUFFER_COUNT     10
        ORGANIZATION            indexed

RECORD
        BLOCK_SPAN              yes
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  fixed
        SIZE                    520

AREA 0
        ALLOCATION              10
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             20
        EXTENSION               10

AREA 1
        ALLOCATION              10
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             5
        EXTENSION               10

KEY 0
        CHANGES                 no
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        DATA_AREA               0
        DATA_FILL               50
        DUPLICATES              no
        INDEX_AREA              1
        INDEX_COMPRESSION       no
        INDEX_FILL              80
        LEVEL1_INDEX_AREA       1
        NAME                    ""
        NULL_KEY                no
        PROLOG                  3
        SEG0_LENGTH             22
        SEG0_POSITION           1
        TYPE                    string
After performing an ANALYZE/RMS_FILE/FDL on this file, we obtain an FDL
similar to the one listed below:
FILE
        ALLOCATION              14970624
        BEST_TRY_CONTIGUOUS     no
        BUCKET_SIZE             20
        CLUSTER_SIZE            768
        CONTIGUOUS              no
        EXTENSION               65535
        FILE_MONITORING         no
        GLOBAL_BUFFER_COUNT     10
        NAME                    "DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT;119"
        ORGANIZATION            indexed
        OWNER                   [CABSPROD,DBA_CABSPROD]
        PROTECTION              (system:RWED, owner:RWED, group:RE, world:)

RECORD
        BLOCK_SPAN              yes
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  fixed
        SIZE                    520

AREA 0
        ALLOCATION              14921088
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             20
        EXTENSION               65535

AREA 1
        ALLOCATION              47872
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             5
        EXTENSION               1248

KEY 0
        CHANGES                 no
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        DATA_AREA               0
        DATA_FILL               80
        DUPLICATES              no
        INDEX_AREA              1
        INDEX_COMPRESSION       no
        INDEX_FILL              80
        LEVEL1_INDEX_AREA       1
        NAME                    ""
        NULL_KEY                no
        PROLOG                  3
        SEG0_LENGTH             22
        SEG0_POSITION           1
        TYPE                    string

ANALYSIS_OF_AREA 0
        RECLAIMED_SPACE         0

ANALYSIS_OF_AREA 1
        RECLAIMED_SPACE         0

ANALYSIS_OF_KEY 0
        DATA_FILL               77
        DATA_KEY_COMPRESSION    75
        DATA_RECORD_COMPRESSION 62
        DATA_RECORD_COUNT       29268606
        DATA_SPACE_OCCUPIED     14915500
        DEPTH                   4
        INDEX_COMPRESSION       0
        INDEX_FILL              78
        INDEX_SPACE_OCCUPIED    47305
        LEVEL1_RECORD_COUNT     745775
        MEAN_DATA_LENGTH        520
        MEAN_INDEX_LENGTH       25
When using both of these files as input to an EDIT/FDL/NOINTERACTIVE/ANALYSIS=,
the resulting FDL specifies a bucket size of 63 no matter what I stipulate
the cluster factor to be in the input FDL.
Do you think I should use this bucket size or the 20 blocks? Access to this
file is mostly by single processes, either producing copies or processing
records by index.
Are there any other factors which I should be considering?
We also have a whole series of files between 1 and 3 million blocks which are
indexed and have a supplied FDL which stipulates just one AREA for the file.
The result of the ANAL/RMS/FDL suggests we split each of these into two areas,
again with bucket sizes of 63 blocks.
This is a production system where time (and hence performance) is critical,
but where little experimentation is possible, so I am reluctant to simply
suck it and see.
Do you have any advice for us? What areas should we be looking at?
The Answer is : Cluster factors of 1024 are reasonable, particularly when you are dealing with a small number of rather large files.

The cluster size in the ANALYZE input will be used by EDIT/FDL, so you will likely have to edit the FDL file manually to retain the necessary control over the bucket size. Your choice of bucket size appears appropriate for this situation. If the application primarily retrieves a record by the index key, updates it, and then moves on to another, unrelated record, then you will typically want a smaller bucket size; otherwise bandwidth is wasted transferring unnecessarily large blocks of data. For instance, a bucket size of 63 will cause RMS to transfer roughly 32 kilobytes in, and then back out again, to update a single (say) 500-byte record. These transfers are questionable extra I/O activity, at best. If adjacent records are processed sequentially, then a larger bucket size can be called for, but with a 20-block bucket size you are already reducing the number of I/O operations to one per 20 records; one per 63 may not particularly help, and may hinder performance when accessing smaller groups of records.

When considering other performance factors, also consider the index depth. On the original file the depth is 4, and that value is inappropriately deep. (The proposed FDL bucket size will fix that.) Index depth and bucket size are related here -- if the bucket size of 63 is what it takes to reduce the index depth, then there is some incentive to go to the larger bucket size. As for the bucket size, consider a compromise -- consider a bucket size of 32 for this case. This particular value also happens to be a factor of the specified disk cluster factor. (A sample tuning sequence is sketched below.)

You will be unlikely to be able to measure the effect of two areas. The use of areas permits RMS to work with multiple bucket sizes within the file, but if both are equal (63), then this capability is obviously not particularly applicable. Multiple areas also allow you to place each area on an independent disk in (for instance) a bound volume set, but very few people go to this trouble. Multiple areas can also allow you to place all the 'hot' areas of multiple files close to one another, reducing average seek times, and to put the bulk of the data "out of the way" for occasional access. Again, this capability is infrequently used -- it requires a non-trivial effort to establish the initial layout, as well as detailed knowledge of both the application I/O patterns and the disk behaviour.

Global buffers can greatly assist the performance of shared files. The current value of 10 is inordinately small for a shared file -- the OpenVMS Wizard would encourage trying 200 or more buffers. Global buffers trade memory use for disk I/O, and this is almost always a performance win. To learn more about the access patterns and the global buffer activity, you can enable file statistics collection and use MONITOR to track the effects of the changes; sample commands for this are also sketched below.
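As a rough sketch of the tuning cycle discussed above -- the FDL file names and the new data file name here are placeholders, not part of the original question -- the sequence might look something like this:

  $ ! Capture the current structure and statistics
  $ ! (the analysis FDL defaults to the data file name with type .FDL)
  $ ANALYZE/RMS_FILE/FDL DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT
  $ ! Apply the analysis to the supplied design; EDIT/FDL writes the
  $ ! optimized design as a new version of CIARH_DESIGN.FDL
  $ EDIT/FDL/NOINTERACTIVE/ANALYSIS=CIARH.FDL CIARH_DESIGN.FDL
  $ ! Hand-edit CIARH_DESIGN.FDL (BUCKET_SIZE, GLOBAL_BUFFER_COUNT), then
  $ ! rebuild the file during a maintenance window
  $ CONVERT/FDL=CIARH_DESIGN.FDL/STATISTICS -
        DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT -
        DISK14:[CABSPROD.DAT.BILLING]CIARH_NEW.DAT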
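The manual overrides themselves are small. A fragment of an edited design FDL, assuming the compromise bucket size of 32 and a trial global buffer count of 200 discussed above (the values shown are illustrative, not a verified recommendation, and only the changed attributes are shown), might read:

  FILE
        GLOBAL_BUFFER_COUNT     200     ! trial value; was 10
        ORGANIZATION            indexed

  AREA 0
        BUCKET_SIZE             32      ! compromise between 20 and 63

  KEY 0
        DATA_AREA               0
        INDEX_AREA              1

A bucket size of 32 divides evenly into both the 256-block chunk size and the 768-block cluster size. The global buffer count can also be raised on the existing file, without a CONVERT, via the SET FILE/GLOBAL_BUFFER command.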
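To see what the changes actually do in production, RMS statistics can be enabled on the file and watched with MONITOR while the application runs; for example (the file specification is the one from the analysis above):

  $ ! Mark the file for RMS statistics collection
  $ SET FILE/STATISTICS DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT
  $ ! Watch RMS activity for the file (see the /ITEM qualifier for
  $ ! operation, data rate, locking and caching detail)
  $ MONITOR RMS/FILE=DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT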