HP OpenVMS Systems - Ask The Wizard
The Question is:
I read with interest your reply in article "(2618) RMS indexed file tuning
and disk cluster factors", and noted especially the comment that a disk
cluster factor of 50 was "large".
At our installation we commonly work with cluster factors of 1024 up to 8192
blocks. File sizes for some of the most important indexed files range from
0.5 million blocks up to 15 million blocks for the largest file.
The disks are mostly RAID 5 sets served by HSZ50 controllers, with a chunk
size that is normally 256 blocks.
The file of 15 million blocks is obviously of particular interest so far as
tuning goes. It resides on a RAID 5 set with a chunk size of 256 blocks and
a cluster size of 768.
We were handed an FDL file similar to the following:
FILE
        CONTIGUOUS              no
        GLOBAL_BUFFER_COUNT     10
        ORGANIZATION            indexed

RECORD
        BLOCK_SPAN              yes
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  fixed
        SIZE                    520

AREA 0
        ALLOCATION              10
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             20
        EXTENSION               10

AREA 1
        ALLOCATION              10
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             5
        EXTENSION               10

KEY 0
        CHANGES                 no
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        DATA_AREA               0
        DATA_FILL               50
        DUPLICATES              no
        INDEX_AREA              1
        INDEX_COMPRESSION       no
        INDEX_FILL              80
        LEVEL1_INDEX_AREA       1
        NAME                    ""
        NULL_KEY                no
        PROLOG                  3
        SEG0_LENGTH             22
        SEG0_POSITION           1
        TYPE                    string
After performing an ANALYZE/RMS_FILE/FDL on this file, we obtain an FDL
similar to the one listed below:
FILE
        ALLOCATION              14970624
        BEST_TRY_CONTIGUOUS     no
        BUCKET_SIZE             20
        CLUSTER_SIZE            768
        CONTIGUOUS              no
        EXTENSION               65535
        FILE_MONITORING         no
        GLOBAL_BUFFER_COUNT     10
        NAME                    "DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT;119"
        ORGANIZATION            indexed
        OWNER                   [CABSPROD,DBA_CABSPROD]
        PROTECTION              (system:RWED, owner:RWED, group:RE, world:)

RECORD
        BLOCK_SPAN              yes
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  fixed
        SIZE                    520

AREA 0
        ALLOCATION              14921088
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             20
        EXTENSION               65535

AREA 1
        ALLOCATION              47872
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             5
        EXTENSION               1248

KEY 0
        CHANGES                 no
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        DATA_AREA               0
        DATA_FILL               80
        DUPLICATES              no
        INDEX_AREA              1
        INDEX_COMPRESSION       no
        INDEX_FILL              80
        LEVEL1_INDEX_AREA       1
        NAME                    ""
        NULL_KEY                no
        PROLOG                  3
        SEG0_LENGTH             22
        SEG0_POSITION           1
        TYPE                    string

ANALYSIS_OF_AREA 0
        RECLAIMED_SPACE         0

ANALYSIS_OF_AREA 1
        RECLAIMED_SPACE         0

ANALYSIS_OF_KEY 0
        DATA_FILL               77
        DATA_KEY_COMPRESSION    75
        DATA_RECORD_COMPRESSION 62
        DATA_RECORD_COUNT       29268606
        DATA_SPACE_OCCUPIED     14915500
        DEPTH                   4
        INDEX_COMPRESSION       0
        INDEX_FILL              78
        INDEX_SPACE_OCCUPIED    47305
        LEVEL1_RECORD_COUNT     745775
        MEAN_DATA_LENGTH        520
        MEAN_INDEX_LENGTH       25
When using both of these files as input to an EDIT/FDL/NOINTERACTIVE/ANALYSIS=,
the resulting FDL specifies a bucket size of 63 no matter what I stipulate
the cluster factor to be in the input FDL.
Do you think I should use this bucket size or the 20 blocks? Access to this
file is mostly by single processes, either producing copies or processing
records by index.
Are there any other factors which I should be considering?
We also have a whole series of files between 1 and 3 million blocks which are
indexed and have a supplied FDL which stipulates just one AREA for the file.
The result of the ANAL/RMS/FDL suggests we split each of these into two areas,
again with bucket sizes of 63 blocks.
This is a production system where time (and hence performance) is critical,
but where little experimentation is possible, so I am reluctant to simply
suck it and see.
Do you have any advice for us? What areas should we be looking at?
The Answer is : Cluster factors of 1024 are reasonable, particularly when you are dealing with a small number of rather large files.

The cluster size in the ANALYZE input will be used by EDIT/FDL, so you will likely have to edit the FDL file manually to retain the necessary control over the bucket size. Your choice of bucket size appears appropriate for this situation. If the application primarily retrieves a record by the index key, updates it, and then moves on to another, unrelated record, then you will typically want a smaller bucket size; otherwise bandwidth is wasted transferring unnecessarily large blocks of data. For instance, a bucket size of 63 will cause RMS to transfer roughly 32 kilobytes in, and then back out again, to update a single (say) 500-byte record. These transfers are questionable extra I/O activity, at best. If adjacent records are processed sequentially, then a larger bucket size can be called for, but with a 20-block bucket size you are already reducing the number of I/O operations to one per 20 records; one per 63 may not particularly help, and may hinder performance when accessing smaller groups of records.

When considering other performance factors, also consider the index depth. On the original file the depth is 4, and that value is inappropriately deep. (The proposed FDL bucket size will fix that.) Index depth and bucket size are related here -- if the bucket size of 63 is what it takes to reduce the index depth, then there is some incentive to go to the larger bucket size. As for the bucket size, consider a compromise -- consider a bucket size of 32 for this case. This particular value also happens to be a factor of the specified disk cluster factor. (A sample tuning sequence is sketched below.)

You will be unlikely to be able to measure the effect of two areas. The use of areas permits RMS to work with multiple bucket sizes within the file, but if both are equal (63), then this capability is obviously not particularly applicable. Multiple areas also allow you to place each area on an independent disk in (for instance) a bound volume set, but very few people go to this trouble. Multiple areas can also allow you to place all the 'hot' areas of multiple files close to one another, reducing average seek times, and to put the bulk of the data "out of the way" for occasional access. Again, this capability is infrequently used -- it requires a non-trivial effort to establish the initial layout, as well as detailed knowledge of both the application I/O patterns and the disk behaviour.

Global buffers can greatly assist the performance of shared files. The current value of 10 is inordinately small for a shared file -- the OpenVMS Wizard would encourage trying 200 or more buffers. Global buffers trade memory use for disk I/O, and this is almost always a performance win. To learn more about the access patterns and the global buffer activity, you can enable file statistics collection and use MONITOR to track the effects of the changes; sample commands for this are also sketched below.
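As a rough sketch of the tuning cycle discussed above -- the FDL file names and the new data file name here are placeholders, not part of the original question -- the sequence might look something like this:

  $ ! Capture the current structure and statistics
  $ ! (the analysis FDL defaults to the data file name with type .FDL)
  $ ANALYZE/RMS_FILE/FDL DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT
  $ ! Apply the analysis to the supplied design; EDIT/FDL writes the
  $ ! optimized design as a new version of CIARH_DESIGN.FDL
  $ EDIT/FDL/NOINTERACTIVE/ANALYSIS=CIARH.FDL CIARH_DESIGN.FDL
  $ ! Hand-edit CIARH_DESIGN.FDL (BUCKET_SIZE, GLOBAL_BUFFER_COUNT), then
  $ ! rebuild the file during a maintenance window
  $ CONVERT/FDL=CIARH_DESIGN.FDL/STATISTICS -
        DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT -
        DISK14:[CABSPROD.DAT.BILLING]CIARH_NEW.DAT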
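The manual overrides themselves are small. A fragment of an edited design FDL, assuming the compromise bucket size of 32 and a trial global buffer count of 200 discussed above (the values shown are illustrative, not a verified recommendation, and only the changed attributes are shown), might read:

  FILE
        GLOBAL_BUFFER_COUNT     200     ! trial value; was 10
        ORGANIZATION            indexed

  AREA 0
        BUCKET_SIZE             32      ! compromise between 20 and 63

  KEY 0
        DATA_AREA               0
        INDEX_AREA              1

A bucket size of 32 divides evenly into both the 256-block chunk size and the 768-block cluster size. The global buffer count can also be raised on the existing file, without a CONVERT, via the SET FILE/GLOBAL_BUFFER command.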
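To see what the changes actually do in production, RMS statistics can be enabled on the file and watched with MONITOR while the application runs; for example (the file specification is the one from the analysis above):

  $ ! Mark the file for RMS statistics collection
  $ SET FILE/STATISTICS DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT
  $ ! Watch RMS activity for the file (see the /ITEM qualifier for
  $ ! operation, data rate, locking and caching detail)
  $ MONITOR RMS/FILE=DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT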