This chapter describes Compaq KAP C command-line switches that allow
you to alter KAP defaults.
4.1 Overview of Command-Line Switches
The default switch settings of Compaq KAP C are often satisfactory. However, you can alter the defaults to customize optimizations for a given application program and machine. These alterations include limiting the search space for loop optimization, adjusting the parameters that describe cache memory, and enabling or disabling classes of transformations.
To specify a command-line switch, you can use the long name or short name. If a command-line switch appears more than once on the command line, the last value given is used. Multiple occurrences of an input/output file selection switch are not allowed.
The short names for switches are provided as a convenience, especially for interactive users. However, the short names may not remain unique from one version of KAP to another. Use the long names in situations that require long-term compatibility, such as a canned shell script.
Table 4-1 and Table 4-2 list the command-line switches for the kcc driver and the kapc preprocessor, respectively. The first column lists the long name of each switch, as well as the functional categories of switches, such as general optimization and parallel processing. The Related Switch column names other switches that interact with the switch, and the last two columns list the short name and default value of each switch. Switches that have different argument syntax in their regular and negative (no) forms are shown on two lines.
File names are case sensitive on Tru64 UNIX systems, so file-name parameters must exactly match the names of the intended files. A hyphen (-) is required before each switch listed in the following tables, but the hyphen is not shown in the tables.
Table 4-1 kcc Driver Switches
Long Name | Related Switch | Short Name | Default Value |
---|---|---|---|
cc=<C_compiler_path> | | | cc=/usr/bin/cc |
cext=<C file extension> | | | cext=c |
ckap=<path to kapc> | | | ckap=/usr/bin/kapc |
ckapargs=<kap_switch_string> | | | |
cpp=<cpp_path> | | | cpp=/usr/bin/cc |
sif[=<cpp,kap>], S | | | off |
tmpdir=<temporary_directory_path> | | | tmpdir=/tmp/ |
tune=<architecture> | | | tune=<current system architecture> |
verbose | | v | nov |
Table 4-2 kapc Preprocessor Switches
Long Name | Related Switch | Short Name | Default Value |
---|---|---|---|
General Optimization | | | |
[no]interchange | | | interchange |
namepartitioning=<integer>,<integer> | so | namepart=<integer>,<integer> | nonamepartitioning |
[no]natural | | [n]nat | natural |
optimize=<integer> | | o=<integer> | optimize=5 |
[no]recursion | | [n]rc | rc |
roundoff=<integer> | o, so | r=<integer> | roundoff=3 |
scalaropt=<integer> | r | so=<integer> | scalaropt=3 |
[no]skip | | [n]sk | noskip |
tune=<architecture> | | | tune=<current system architecture> |
Parallel Processing | | | |
chunk=<integer> | scheduling | | chunk=1 |
[no]concurrentize | | [n]conc | noconcurrentize |
minconcurrent=<integer> | | mc=<integer> | minconcurrent=1000 |
scheduling=<list> | | sched=<list> | scheduling=e |
Inlining and IPA | | | |
inline[=<names>] | | inl[=<names>] | off |
noinline=<names> | | ninl=<names> | |
ipa[=<names>] | | ipa[=<names>] | off |
noipa=<names> | | nipa=<names> | |
inline_and_copy=<names> | | inlc=<names> | off |
inline_create=<file> | | incr=<file> | off |
ipa_create=<file> | | ipacr=<file> | off |
inline_depth=<integer> | | ind=<integer> | ind=2 |
ipa_depth=<integer> | | ipad=<integer> | ipad=10 |
inline_from_files=<file>,<file> | inl | inff=<file>,<file> | current source file |
ipa_from_files=<file>,<file> | ipa | ipaff=<file>,<file> | current source file |
inline_from_libraries=<library>,<library> | inl | infl=<library>,<library> | off |
ipa_from_libraries=<library>,<library> | ipa | ipafl=<library>,<library> | off |
inline_looplevel=<integer> | | inll=<integer> | inll=2 |
ipa_looplevel=<integer> | | ipall=<integer> | ipall=2 |
inline_manual | | inm | off |
ipa_manual | | ipam | off |
inline_optimize=<integer> | | | inline_optimize=0 |
ipa_optimize=<integer> | | | ipa_optimize=0 |
Input-Output File Selection | | | |
cmp[=<file>] | | cmp[=<file>] | See Section 4.6.1 |
nocmp | | ncmp | |
list[=<file>] | | l[=<file>] | See Section 4.6.2 |
nolist | | nl | |
Listing | | | |
cmpoptions[=<list>] | | cp[=<list>] | ncp |
nocmpoptions | | ncp | |
lines=<integer> | | ln=<integer> | ln=55 |
listingwidth=<integer> | | lw=<integer> | lw=132 |
listoptions=<list> | | lo=<list> | See Section 4.7.4 |
suppress=<list> | | su=<list> | off; see Section 4.7.5 |
Language | | | |
[no]restrict | | | restrict |
signed | | | See Section 4.8.2 |
Advanced Optimization | | | |
addressresolution=<integer> | so, r | arl=<integer> | arl=1 |
[no]arclimit=<integer> | so, r | arclm=<integer> | arclm=5000 |
cache_prefetch_line_count=<integer> | | cplc=<integer> | cplc=0 |
cacheline=<integer>[,<integer>] | | chl=<integer>[,<integer>] | chl=64,64 |
cachesize=<integer>[,<integer>] | | chs=<integer>[,<integer>] | chs=32,0 |
dpregisters=<integer> | | dpr=<integer> | dpr=32 |
each_invariant_if_growth=<integer> | so, r, miifg | eiifg=<integer> | eiifg=20 |
fpregisters=<integer> | | fpr=<integer> | fpr=32 |
[no]fuse | so, o | [n]fuse | nofuse |
fuselevel=<integer> | fuse | | fuselevel=0 |
heaplimit=<integer> | | heap=<integer> | heaplimit=100 |
hoist_loop_invariants=<integer> | so, r | hli=<integer> | hli=1 |
limit=<integer> | | lm=<integer> | lm=50 |
machine=<list> | so, r | ma=<list> | ma=s |
max_invariant_if_growth=<integer> | so, r, eiifg | miifg=<integer> | miifg=500 |
routine=<rtn_name><switches> | | rt=<rtn_name><switches> | off |
setassociativity=<integer>[,<integer>] | so, r | sasc=<integer>[,<integer>] | sasc=1,1 |
[no]stdio | so, r | | nostdio |
[no]syntax | | sy=<value> | nosyntax |
tablesize=<integer> | | ts=<integer> | ts=24000000 |
unroll=<integer> | so, r | ur=<integer> | ur=4 |
unroll2=<integer> | so, r | ur2=<integer> | ur2=160 |
unroll3=<integer> | so, r | ur3=<integer> | ur3=1 |
4.2 kcc Driver Switches
The following sections explain the function of each kcc driver switch.
4.2.1 -cc, -nocc, (-cc=/usr/bin/cc)
The -cc switch provides an alternate path to the C compiler; -nocc inhibits execution of the C compiler.
4.2.2 -cext, (C file extension)
This switch tells kapc to treat files with the indicated extension as C source files.
4.2.3 -ckap, (-ckap='/usr/bin/kapc')
This switch defines an alternate path to the kapc preprocessor (translator).
4.2.4 -ckapargs
The -ckapargs switch passes switches to the kapc translator. This switch must precede the switches intended for the kapc translator.
4.2.5 -cpp, (-cpp='/usr/bin/cc')
This switch defines an alternate path to the C preprocessor, which runs before kapc.
4.2.6 -sif, -S, (off)
Save intermediate files. Specifying -sif is equivalent to -sif=cpp,kap, which saves all kapc and C preprocessor intermediate files. Specifying -S is equivalent to -sif=kap plus passing -S to the compiler, which saves the assembly-language output. Intermediate files are named as follows:
<file>.cpp - cpp output file
K<file>.c - kapc translator output file
The path and switch strings shown above must be enclosed in single or
double quotes if they contain white space characters.
4.2.7 -tmpdir, (-tmpdir=/tmp/)
This switch specifies the directory in which temporary files are placed. You can also set it with the TMPDIR environment variable.
4.2.8 -tune, (-tune=<current system architecture>)
KAP determines whether the host Alpha architecture is ev4, ev5, or ev6 and, by default, optimizes your program for that architecture. If you compile a program on one architecture but plan to run it on another, override the default by setting -tune to the architecture of the target system.
The KAP -tune switch and the C compiler -tune switch work independently and perform different optimizations. If the switch appears on the command line inside -ckapargs='-tune...', for example:
    > kcc myprog.c -ckapargs='-tune=ev6'
the switch value is applied only to the KAP translator. However, in the following case:
    > kcc myprog.c -tune=ev6
the switch is applied to both KAP and the C compiler.
4.2.9 -verbose, -v, (-nov)
Prints each pass as it executes, along with its arguments and its input
and output files. Also prints final resource usage in the C-shell time
format.
4.3 General Optimization Switches for the kapc Preprocessor
The following sections explain the function of each kapc general optimization switch.
4.3.1 -interchange, -nointerchange, (-interchange)
Use the -interchange switch to enable loop interchanging.
KAP enables loop interchange when -interchange is specified and the -optimize level is at least 1 or the -scalaropt level is 3.
If you specify -nointerchange, KAP disables loop interchange regardless of the -optimize or -scalaropt levels.
4.3.2 -namepartitioning, -namepart, -nonamepart, (-nonamepartitioning)
The -namepartitioning switch tells KAP to look at distinct array names and limit the number of arrays that appear in a loop to avoid cache thrashing. For example, this switch breaks a loop that references arrays A and B into two loops: one loop references array A and the other references array B.
Two arguments, i and j, in a -namepartitioning=i,j switch control name partitioning as follows:
If no arguments appear with the -namepartitioning switch, KAP uses its default values of 2 for the minimum and 8 for the maximum number of partitions.
Before KAP can perform name partitioning, you must specify the switch -scalaropt=n where n is greater than or equal to 3.
The -nonamepartitioning switch explicitly prevents name partitioning.
4.3.3 -natural, -nat, -nonatural, -nnat, (-natural)
The -natural switch selects "natural" alignment (for example, double entities start on eight-byte boundaries) rather than leaving data elements unaligned.
That is, the -natural switch causes variables and arrays to start on boundaries that correspond to their size.
4.3.4 -optimize, -o, (-o=5)
The -optimize switch sets the optimization level, ranging from the least aggressive optimization of 0 to the most aggressive of 5.
Each optimization level is cumulative. For example, -optimize=5 performs everything up to and including that level. Table 4-3 shows the meaning of each of the different optimization levels.
Value | Meaning |
---|---|
0 | KAP performs only simple program analysis. No loop optimization is performed. |
1 | KAP performs simple loop optimization. KAP can distribute loops to optimize only a part of a loop. |
2 | KAP optimizes any loop (and perhaps nested loops) in a loop nest. It performs lifetime analysis to determine when last-value assignment of scalars is necessary. It performs more powerful data dependence tests to find opportunities for optimization. |
3 | Special techniques are used to break data dependence cycles that otherwise prevent advanced optimizations. Triangular loops are recognized and loop interchanging is performed to improve memory referencing. Special-case data dependence tests are used. |
4 | Two versions of a loop are generated, if necessary, to break a data dependence arc. Exact data dependence tests are used to allow more opportunities for optimization to be discovered. Special index sets, called wraparound variables, are recognized. |
5 | Array expansion and loop fusion are enabled. |
A higher optimization level allows more sophisticated optimization,
along with increased compilation time. Many programs that are written
to be easily optimized do not need advanced transformations; with these
programs, a lower optimization level will suffice.
4.3.5 -recursion, -rc, -nrc, (-norecursion)
The -recursion switch informs KAP that functions in the source program may be called recursively. (That is, the function calls itself, or it calls another routine that calls it.)
The -recursion switch must be in force in each recursive routine that KAP processes, or unsafe transformations could result.
4.3.6 -roundoff, -r, (-r=3)
The -roundoff switch lets you specify the level of acceptable roundoff errors.
If an arithmetic reduction is accumulated in a different order than in the scalar program, the roundoff error is accumulated differently and the final result may differ from that of the original program. The difference is usually insignificant, but some restructuring transformations performed by KAP must be disabled in order to obtain exactly the same answers as the scalar program.
KAP classifies its transformations by the amount of difference in roundoff that can accumulate, so you can decide what level of roundoff error differences is allowable.
Each nonzero roundoff level is cumulative. For example, level 3 performs everything up to and including that level. Table 4-4 shows the meaning of each roundoff level.
Value | Meaning |
---|---|
0 | No roundoff-changing transformations are allowed. Loops containing nonarithmetic reductions (such as the largest element of a vector) may still be optimized. |
1 | Loop interchanging around serial reductions is allowed if -optimize=4. Simplification of expressions from forward substitution or inside trigonometric intrinsic functions returning integer values is performed. Code floating is enabled if -scalaropt is greater than or equal to 2. Loop rerolling is enabled if -scalaropt is greater than or equal to 2. |
2 | Reciprocal substitution is performed to move an expensive division outside a loop. |
3 | Floating-point (float or double) induction variables are recognized. Memory management is enabled if -scalaropt=3. Expressions such as A / B / C can be rotated to A / (B * C). |
4.3.7 -scalaropt, -so, (-so=3)
The -scalaropt switch sets the level of scalar optimizations that KAP performs. These scalar optimizations include dusty-deck transformations, dead-code elimination, and loop unrolling.
Table 4-5 shows the meaning of each -scalaropt level.
Value | Meaning |
---|---|
0 | No scalar optimizations are performed. |
1 | Only simple scalar optimizations are performed. These include dead-code elimination, global forward substitution, and dusty-deck IF transformations. |
2 | The full range of scalar optimization is performed. These include floating invariant IFs out of loops, induction variable recognition, loop rerolling if -roundoff is greater than or equal to 1, loop peeling, loop fusion, and loop unrolling. |
3 | Memory management is enabled if -roundoff=3. |
Unlike the -scalaropt switch, the #pragma _KAP scalaropt directive sets the level of loop-based optimizations only, such as unrolling, and not optimizations such as dead-code elimination.
4.3.8 -skip, -sk, -nsk, (-noskip)
The -skip switch tells KAP not to apply optimizing transformations to any routine in the input file. To select which routines are not optimized, see the description of the -routine switch in Section 4.9.16, -routine=<rtn_name><switches>, -rt=<rtn_name><switches>, (off).
4.3.9 -tune, (-tune=<current system architecture>)
kapc determines whether the host Alpha architecture is ev4, ev5, or ev6 and, by default, optimizes your program for that architecture. If you compile a program on one architecture but plan to run it on another, override the default by setting -tune to the architecture of the target system.
The kapc -tune switch and the C compiler -tune switch work independently and perform different optimizations. If the switch appears on the command line inside -ckapargs='-tune...', for example:
    > kcc myprog.c -ckapargs='-tune=ev6'
the switch value is applied only to the kapc translator. However, in the following case:
    > kcc myprog.c -tune=ev6
the switch is applied to both kapc and the C compiler.
4.4 Parallel Processing Switches for the kapc Preprocessor
The following sections describe the kapc switches you use to control how the multiprocessor version of KAP prepares programs for parallel execution.
4.4.1 -chunk, (-chunk=1)
The -chunk switch modifies, and is used only with, the -scheduling switch. The -chunk switch determines the number of loop iterations in each group.
4.4.2 -concurrentize, -conc, -noconcurrentize, (-nconc)
The -concurrentize switch directs KAP to restructure the source code for parallel processing. You can enable or disable parallel execution on a file-by-file basis using KAP pragmas. See Section 5.2, Parallel Processing Assertions for more information.
Parallel execution disables certain serial optimizations. Programs containing many loops that require synchronization, or programs that have loops with small iteration counts, might run more slowly when parallelized. In these cases, you should disable parallel execution.
Setting -noconcurrentize disables parallel execution and allows all serial optimizations to take place.
4.4.3 -minconcurrent, -mc, (-mc=1000)
The -minconcurrent switch sets the level of work in a loop above which KAP executes the loop in parallel. The switch accepts any value greater than or equal to 0. The higher the -minconcurrent value, the more work (iterations, statements, or both) a loop must contain to run in parallel.
Executing a loop in parallel incurs overhead that varies with different systems. If a loop has little work, the overhead required to set up parallel execution might make the loop execute more slowly than it would using serial execution.
KAP estimates the amount of work inside a loop by adding the number of operators and the number of operands, excluding the loop index, in each iteration. KAP multiplies this sum by the number of iterations and designates this product as the amount of "work" of the loop. KAP then compares this estimate with the -minconcurrent value. If the loop bounds are constant and the estimated amount of work is greater than the -minconcurrent value, KAP generates parallel code for the loop. Otherwise, the loop executes serially.
If the for loop bounds are not known at compilation time, KAP generates an if expression in the parallel pragma. The compiler interprets this parallel pragma as a request to generate a two-version loop; one version is parallel and the other is serial. A run-time check decides whether or not to execute the loop in parallel. To disable the generation of two-version loops throughout a program, use the command-line switch -minconcurrent=0.
Setting the -minconcurrent switch automatically sets the -concurrentize switch.
4.4.4 -scheduling, -sched, (-sched=e)
The -scheduling switch tells KAP the kind of scheduling to use for loop iterations on a multiprocessor system.
The options are:
4.5 Inlining and IPA Switches for the kapc Preprocessor
The following sections explain the function of each kapc switch used in function inlining and interprocedural analysis (IPA). Inlining is the process of replacing a function reference with the text of the function. IPA is the process of inspecting a called function to identify relationships between the function arguments, the function return value, global data, and the code surrounding the call, in order to identify opportunities for optimization.
Inlining and IPA can be performed in the same KAP run. The only
restriction is that the same function may not be in global lists for
both inlining and IPA. You can use the inline and IPA pragmas to inline
a function in one place and IPA it in another. For additional
information about these switches and examples of their use, see
Chapter 5 and Chapter 6.
4.5.1 -inline, -inl, -noinline, (-ninl), -ipa, -noipa, (-nipa)
The -inline switch provides KAP with a list of functions to inline. The -ipa switch provides KAP with a list of functions to analyze. Additionally, -ipa causes KAP to give information in the annotated listing about appropriate settings for the -ind, -inll, and -ipall switches on a loop-by-loop basis.
If you specify either the -inline or the -ipa switch without an argument list, KAP tries to inline or analyze all the called functions in the inlining (or IPA) universe specified by the -inline_from_... or -ipa_from_... switches. If you specify a list of routine names, for example -inline=mkcoef,yval, only the named routines are inlined or analyzed.
The -inline and -ipa command-line switches are overridden by the #pragma _KAP inline and #pragma _KAP ipa directives. See Chapter 5, Assertions and Directives and Chapter 6, Inlining and Interprocedural Analysis (IPA) for more information about these pragmas.
A list of routines must be included with -noinline or -noipa. All routines in the inlining (or IPA) universe except the listed ones are candidates for inlining (or IPA). See Chapter 6 for more information.
4.5.2 -inline_and_copy, -inlc, (off)
The -inline_and_copy switch functions like the -inline switch, except that if all calls and references to a subprogram are inlined, the text of the routine is not optimized but is copied unchanged to the transformed code file. This switch is intended for use when you are inlining routines from the same file as the call; it has no special effect when the routines being inlined are taken from a library or another source file.
When a subprogram has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine will still be available in case one of the other source files contains a reference to it.
The -inline_and_copy algorithm assumes that all calls and references to the routine precede it in the source file. If the routine is referenced after its text and that particular call site cannot be inlined, the unoptimized version of the routine is invoked.
4.5.3 -inline_create, -incr, (off), -ipa_create, -ipacr, (off)
These switches cause KAP to build a library file containing partially analyzed routines for later inlining/analysis. The library created is used with the -inline_from_libraries and -ipa_from_libraries switches.
When you specify either of these switches, no transformed code file is generated.
Libraries created with -inline_create can be used with either inlining or IPA, because they contain essentially complete descriptions of the functions included. Libraries created with -ipa_create can be used only with IPA, because they do not have the complete text of the functions, just the data relationship information.
You can use any name for the created library. However, for maximum compatibility with the -inline_from_libraries and -ipa_from_libraries switches, Compaq recommends that you use the .klib extension.
4.5.4 -inline_depth, -ind, (-ind=2), -ipa_depth, -ipad, (-ipad=10)
The -inline_depth and -ipa_depth switches set the maximum level of function nesting (that is, calls to functions that in turn call functions, and so forth) that KAP will attempt to inline or analyze. Higher switch values cause KAP to trace calls and function references further. The values and their meanings are:
The #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives are not affected by -inline_depth or -ipa_depth restrictions.
4.5.5 -inline_from_files, -inff, (current source file)
The -..._from_... switches provide KAP with the locations of functions available for inlining/IPA. The total set of available functions is called the inlining (or IPA) universe.
The -..._from_files switches take the names of source files and directories containing source files. Including a directory, for example, -ipaff=/usr/ipalib, is equivalent to the UNIX notation /usr/ipalib/*.c. Do not use shell wildcard characters in the list of files and directories.
The -..._from_libraries switches take the names of libraries created with the -..._create switches and directories containing such libraries. In directories, the KAP libraries are identified by the .klib extension.
Multiple files/libraries or directories can be given in one -..._from_... switch, separated by commas and enclosed by parentheses. Multiple -..._from_... switches can be specified on the command line.
KAP searches for functions in the provided files and libraries in the
order in which they appear on the command line.
4.5.6 -inline_from_libraries, -infl, (off)
See Section 4.5.5, -inline_from_files, -inff, (current source file).
4.5.7 -ipa_from_files, -ipaff, (current source file)
See Section 4.5.5, -inline_from_files, -inff, (current source file).
4.5.8 -ipa_from_libraries, -ipafl, (off)
See Section 4.5.5, -inline_from_files, -inff, (current source file).
4.5.9 -inline_looplevel, -inll, (-inll=2), -ipa_looplevel, -ipall, (-ipall=2)
The -..._looplevel switches enable you to limit inlining to just functions that are referenced in nested loops, where the effects of reduced function call overhead or enhanced optimizations will be multiplied.
The parameter is defined from the most deeply nested function reference. The -inll=1 switch restricts inlining to functions referenced in the deepest loop nest. The -inll=3 switch restricts inlining to those routines referenced at the three deepest levels. The for loop nest level of each function reference is included in the optional calling tree section of the listing file.
The #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives, when enabled, are not affected by the looplevel restrictions.
4.5.10 -inline_manual, -inm, (off), -ipa_manual, -ipam, (off)
These switches cause KAP to recognize the #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives. This allows manual control over which functions are inlined/analyzed at specific call sites.
The default is to ignore these pragmas. When any inlining or IPA switch is included on the command line, the inline or ipa pragmas, respectively, are enabled. The -inline_manual and -ipa_manual switches are provided so the pragmas can be enabled without activating the automatic inlining or IPA algorithms. Because #pragma _KAP [no]inline and #pragma _KAP [no]ipa are not otherwise affected by the -inline=, -ipa=, -inline_depth, and -..._looplevel command-line switches, you can use them with command-line control to select functions or call sites that the regular selection algorithm would reject.
See Chapter 5, Assertions and Directives and Chapter 6, Inlining and Interprocedural Analysis (IPA) for more information about the inline and ipa pragmas.
4.5.11 -inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)
The switches -inline_optimize and -ipa_optimize help you to optimize large programs by causing KAP to set other switches depending on the value you specify. The values and meanings are:
4.6 Input-Output File Selection Switches for the kapc Preprocessor
The following sections explain the function of each kapc switch that affects KAP input-output file selection.
4.6.1 -cmp, -nocmp, -ncmp, (<file>.cmp.c), (<file>.cmp)
The -cmp=<file> switch lets you assign a different file name for the optimized C program.
The Compaq C compiler processes only files with the .c extension, so do not override the default with any other extension. Note that the kcc command creates the default name <file>.cmp.c, while explicit invocation of the kapc command creates the default name <file>.cmp.
The optimized source file is placed in the current directory.
To disable generation of the optimized C output file, enter -nocmp on the command line.
4.6.2 -list, -l, -nolist, -nl, (-list=<file>.out)
The -list=<filename> switch provides a way to name the generated annotated listing file.
Specifying -list with no file name causes the listing file to be written to <file>.out, where <file> is the input file name with any trailing .c stripped off. For example, if the input file is myprog.c, the listing file is myprog.out.
To disable generation of the listing file, enter -nolist on the command line.
4.7 Listing Switches for the kapc Preprocessor
The following sections explain the function of each kapc switch concerning the listing file or the optional listing information available in the transformed code file.
The transformed code is recorded in the transformed code file regardless of whether you request a listing file.
See Chapter 8 for examples of the different types of KAP listing
output.
4.7.1 -cmpoptions, -cp, -nocmpoptions, (-ncp)
The -cmpoptions switch specifies optional additional information for inclusion in the transformed output file. The only additional information currently selectable is special line-number directives. These are enabled with -cmpoptions=i, which inserts special line numbers that reference the original code.
Special line numbers are # line directives that may appear in the transformed program file in order to reference line numbers of the source code. The line in the transformed code that immediately follows a # line directive is either the transformed version of the referenced source line, or a line that KAP inserted before the referenced line. The name of the source file is included in the form it had on the KAP command line.
In the following unrolled loop, the for statement in the source code was on line 7, and the assignment was on line 8:
    # line 7 "-./csource/unr5.c"
    for ( i = i1 + 1; i<=n; i+=3 ) {
        a[i] = b[i] / a[i-1];
    # line 8 "-./csource/unr5.c"
        a[i+1] = b[i+1] / a[i];
    # line 8 "-./csource/unr5.c"
        a[i+2] = b[i+2] / a[i+1];
    # line 8 "-./csource/unr5.c"
    }
4.7.2 -lines, -ln, (-ln=55)
The -lines switch paginates the listing and sets the number of lines per page for printing.
The -lines=0 switch tells KAP to paginate only at subroutine boundaries.
4.7.3 -listingwidth, -lw, (-lw=132)
The -listingwidth switch sets the maximum line length for the listing file.
This switch setting affects the format of the loop summary table, which is printed with the -listoptions=l switch, and the KAP switches table (-lo=k).
The default value, 132, is optimal for most line printers. The
alternative, 80, is more convenient for looking at the listing file on
most terminals. No other values are allowed.
4.7.4 -listoptions, -lo, (off)
The -listoptions switch tells KAP what information to include in the listing files:
Value | Prints |
---|---|
c | Calling tree at the end of the program listing |
k | KAP switches active within the program unit |
l | Loop-by-loop optimization table |
n | Program unit names, as processed, to the error file |
p | Compilation performance statistics |
s | Summary of the optimizations performed |
4.7.5 -suppress, -su, (off)
The -suppress switch tells KAP C what diagnostic information to suppress:
Value | Effect |
---|---|
e | Suppresses error messages |
w | Suppresses warning messages |
4.8 Language Switches for the kapc Preprocessor
This section provides information about kapc language switches.
4.8.1 -restrict, -norestrict, (-restrict)
The -restrict switch allows KAP to parse the C programming language qualifiers restrict and _restrict. This language feature can help KAP better optimize loops that contain subscripted objects.
The -norestrict switch disables parsing of the restrict and _restrict qualifiers.
4.8.2 -signed, (on)
The -signed switch changes char symbols to signed char. This switch is sometimes necessary when porting code from other platforms whose C compiler defaults char to signed char.
4.9 Advanced Optimization Switches for the kapc Preprocessor
These kapc switches control, or provide information for, transformations that are machine-specific or program-specific. They are provided to allow the advanced user to experiment with obtaining the maximum optimization of a specific application code.
Some of these switches set parameters that KAP uses to optimize memory hierarchy usage.
Knowing how much data can be kept in fast memory (cache or arithmetic registers), and the costs of moving data in the memory hierarchy, enables better optimization of memory reference patterns. The -scalaropt=3 and -roundoff=3 switches are required for memory management to be enabled.
4.9.1 -addressresolution, -arl, (-arl=1)
The -addressresolution switch tells KAP what level of data aliasing might be present in the program. Data aliasing is the use of multiple names for the same memory location. When there might be multiple ways for the same variable to be referenced, KAP is more cautious about transforming the code in ways that might change the order in which variables and arrays are used.
The associated pragma #pragma _KAP arl(n) has the same meaning. The switch is equivalent to a pragma at the beginning of the file, and is thus overridden by other pragmas later in the file.
The meanings of the individual levels are:
```c
int *p;
...
for ( i=0; i<n; i++ ) {
    p[i] = a[i];
}
```
```c
#pragma _KAP arl(3)
float *a;
f(x)
float x[100];
{
    int i;
    float f[100];
    for ( i=0; i<100; i++ ) {
        a[i] = x[i] + f[i];
    }
}
```
The -arclimit switch sets the size of the dependence arc data structure that KAP uses to perform data dependence analysis. This data structure is dynamically allocated on a loop-nest-by-loop-nest basis. See Appendix A, Data-Dependence Analysis for a description of data-dependence analysis.
Use the following formula to estimate the number of dependence arcs for a given loop nest:
dependence_array_size = max(#_of_statements * 4, arclimit value)
This is an estimate because KAP assumes that, in the worst case, each statement has four dependence arcs.
If a loop contains too many dependence relationships to be represented in the dependence data structure, KAP gives up optimizing the loop.
When the Loop Information Table is included in the listing file (-listoptions=l), any loop that was too complex for the dependence data structure to hold is marked as too many stmts/DD arcs. Increasing the -arclimit value may enable KAP to optimize the loop. If -arclimit is already at its maximum value, you can try simplifying the loop or dividing it into smaller loops.
The maximum -arclimit value allowed is 5000. If you specify a value greater than 5000, KAP uses 5000 when it allocates the data-dependence array.
Most users do NOT need to change this value.
The -cache_prefetch_line_count switch gives the number of additional lines prefetched into the cache during a cache miss.
4.9.4 -cacheline, -chl, (-chl=64,64)
The -cacheline switch tells KAP the width, in bytes, of the memory channel between cache and main memory.
When two arguments are specified, the first argument gives the width of the memory channel between the primary cache and the secondary cache, and the second argument gives the width of the memory channel between the secondary cache and main memory. Omitting the second argument, or specifying it as 64 (the default), tells KAP not to optimize secondary cache usage.
4.9.5 -cachesize, -chs, (-chs=32,0)
The -cachesize switch tells KAP the size in kilobytes of the cache memory.
When two arguments are specified, the first argument gives the size of the primary cache, and the second argument gives the size of the secondary cache. Omitting the second argument, or specifying it as 0 (the default), tells KAP not to optimize secondary cache usage.
The default values depend on the -tune switch and the Alpha architecture of the system. When -tune=ev6, the default values for -chs are 32,0.
4.9.6 -dpregisters, -dpr, (-dpr=32)
The -dpregisters switch specifies the number of double-precision registers each processor has.
4.9.7 -each_invariant_if_growth, -eiifg, (-eiifg=20)
When a loop contains an if statement whose condition does not change from one iteration to another, the same test must be repeated for every iteration. The code can often be made more efficient by "floating" the if outside the loop and putting the then and else sections into their own loops.
This gets more complicated when there is other code in the loop, because a copy of it must be included in both the then and else loops, as shown in the following example:
```
for ( i = ...) {
    section-1
    if ( ) {
        section-2
    } else {
        section-3
    }
    section-4
}
```
Becomes:
```
if ( ) {
    for ( i = ...) {
        section-1
        section-2
        section-4
    }
} else {
    for ( i = ...) {
        section-1
        section-3
        section-4
    }
}
```
When sections 1 and 4 are large, the extra code can slow the program down (through cache contention, extra paging, and so on) more than the reduced number of if tests speeds it up.
The -each_invariant_if_growth switch sets the maximum number of lines of executable code in sections 1 and 4 for which KAP still tries to float an invariant if outside a loop.
The -max_invariant_if_growth switch limits the total amount of additional code generated in a program unit through invariant-if floating.
The allowed values for the -each_invariant_if_growth switch are 0 to 5000.
4.9.8 -fpregisters, -fpr, (-fpr=32)
The -fpregisters switch specifies the number of single-precision registers, such as ordinary floating point, that each processor has.
4.9.9 -fuse, (-nofuse)
The -fuse switch tells KAP to perform loop fusion.
Loop fusion is a conventional compiler optimization that transforms two adjacent loops into a single loop. Data dependence tests allow fusion of more loops than standard techniques allow.
Before KAP can perform loop fusion, you must specify the -scalaropt=2 or -optimize=5 switch.
4.9.10 -fuselevel, (-fuselevel=0)
The -fuselevel switch further controls the level of loop fusion. Whenever you set -fuselevel, KAP automatically sets -fuse.
The possible settings for this option are:
KAP may require large amounts of memory to process your source code. The -heaplimit option specifies the maximum size, in megabytes, to which the KAP heap can grow. If this limit is reached, KAP stops processing your source code and tries to exit with an "out of memory" error message.
If you choose a -heaplimit setting that is greater than the amount of memory that your system has available, KAP may run out of memory before it reaches the -heaplimit .
KAP relies on the operating system to warn it before the process runs out of memory. Using -heaplimit makes a graceful exit more likely.
4.9.12 -hoist_loop_invariants, -hli, (-hli=1)
The -hoist_loop_invariants switch controls code hoisting of loop-invariant expressions from loops. Note that this switch is independent of the switches -each_invariant_if_growth and -max_invariant_if_growth, which control the floating of invariant-IFs out of loops.
The possible settings for this switch are:
In order to reduce compile time, KAP estimates how long it spends analyzing each loop nest construct. If a loop is too deeply nested, KAP ignores the outer loop and recursively visits the inner loops. The -limit switch is a rough dial to control what KAP thinks is too deeply nested.
Larger loop nest limits might allow more optimization for deeply nested loop structures, but might take more compilation time. The limit does not correspond to the for loop nest level; rather, it is an estimate of the number of loop orderings that can be generated from the loop nest. The -limit switch resets this internal limit.
Most users do NOT need to change this value.
The -machine switch lets you set characteristics for the system on which KAP output runs.
Use any combination of the following values, except that s and n cannot be specified together:
Value | Effect |
---|---|
s | Tells KAP to prefer optimization of a for loop that generates stride-1 (contiguous) references over one that generates non-stride-1 operands. Some computers perform better if consecutive references are contiguous in memory. |
n | Tells KAP to prefer optimization of a for loop that generates non-stride-1 array access over stride-1 array access. This is suitable for machine architectures that have special interleaved memory hardware where non-stride-1 array access provides the best performance. |
o | Tells KAP not to parallelize innermost loops when optimizing, but to parallelize only outermost loops. This capability prevents parallelization of applications with small inner loop bounds, thereby reducing overhead costs. When the loop bounds are unknown at compile time, KAP might generate parallel (concurrent) code for innermost loops, which might be inefficient for the actual loop bounds. |
To disable all of these settings, enter -nomachine on the command line.
4.9.15 -max_invariant_if_growth, -miifg, (-miifg=500)
When a loop contains an if statement whose condition does not change from one iteration to another (loop-invariant), the same test must be repeated for every iteration. The code can often be made more efficient by floating the if outside the loop and putting the then and else sections into their own loops.
This gets more complicated when there is other code in the loop, because a copy of it must be included in both the then and else loops. The -max_invariant_if_growth switch allows you to limit the total number of additional lines of code generated in each program unit through "invariant-if" restructuring.
The allowed values for the -max_invariant_if_growth switch are 0 to 50000.
4.9.16 -routine=<rtn_name><switches>, -rt=<rtn_name><switches>, (off)
The -routine switch allows you to specify other switches that apply to specific routines within the source file that KAP processes. The only switches that -routine can specify are as follows:
-each_invariant_if_growth
-max_invariant_if_growth
-optimize
-roundoff
-scalaropt
-skip
-unroll
-unroll2
-unroll3
For example, the command to exclude KAP optimizations for routine sub1 of myprog.c is:
```
> kcc myprog.c -ckapargs='-routine=sub1 -skip'
```
The syntax of a KAP command with the -routine switch requires that -routine and the switches it specifies come at the end of the command line after the C source file, for example:
```
kapc [-<switches>] source_file.c -routine=<rtn_name>[,<rtn_name>...] -<switches_for_rtn_names> ...
```
If the -routine switch and the switches it specifies are not at the end of the command line after the source file, KAP generates the following error message:
You can specify switches that apply to all routines in the source file immediately after kcc or kapc. Of course, <rtn_name> must be a routine in source_file.c. Finally, the switches for each <rtn_name> must come from the preceding bulleted list. In particular, -skip tells KAP not to process the associated routine.
For example, consider the following command line:
```
kapc -scalaropt program.c -routine=sub_1 -roundoff=2 -optimize=3
```
This command invokes KAP and passes the -scalaropt switch to all program units in file program.c, including sub_1. In addition, program unit sub_1 is processed with -roundoff=2 and -optimize=3.
Using the -routine switch implies that directives equivalent to the specified switches are asserted only while processing particular routines. The effect is the same as if the implied directives were inserted at the top of the associated routines.
Using the -routine switch also splits the resulting KAP command into two halves. The first half looks like any other KAP command because it contains KAP switches other than -routine and a source file name. The second half is different because it contains one or more -routine switches, each with associated routines and switches for those routines selected from the preceding bulleted list.
For example, consider the following command line:
```
kapc -cachesize=8,0 -syntax=a my_program.c \
    -routine=sub_1,sub_2,sub_3 -roundoff=2 -optimize=3 -routine=sub_4 -unroll
```
Next is an explanation of the two halves:
Finally, the usual rules for shortening the names of switches also apply to the -routine switch. For example, the following KAP command fragments produce identical results:
-routine=subroutine_a -optimize=3 -unroll=4
4.9.17 -setassociativity, -sasc, (-sasc=1,1)
The -setassociativity switch provides information on the mapping of physical addresses in main memory to cache pages in the Level 1 and Level 2 cache.
The first integer describes the set associativity of the Level 1 cache. The second integer describes the set associativity of the Level 2 cache.
A setting of n means that a page can appear in any of n places in the cache. For instance, a setting of 1 means that a page in main memory can be placed in only one place in the cache. If that cache page is already in use, its contents must be rewritten or flushed before the newly accessed page can be copied into the cache.
4.9.18 -stdio, (-nostdio)
The -stdio switch enables strength reduction of certain functions in the stdio.h header file and requires -scalaropt=3. Programs that call functions such as printf or fput extensively will see improved I/O performance with this switch.
4.9.19 -syntax, -sy, (-nosyntax)
The -syntax switch lets you select the dialect of C that KAP will accept. The settings are:
a --- Checks for strict compliance with ANSI standard C. Extensions are flagged with warning messages.
d --- Specifies Compaq C.
k --- Accepts Kernighan & Ritchie C.
Note: -nosyntax implies the default C language dialect of Compaq C (that is, -sy=d ).
The -standard compiler switch settings affect the -syntax switch settings as follows:
The value specified in the -tablesize switch is compared to the product of the number of statements and the number of variables referenced in a given program unit. When the product is greater than the -tablesize value, KAP issues a "program-too-large" message that states the required table size.
Before adjusting the -tablesize switch, review your process resource limits with the limit command. Use the C shell command unlimit, or a command such as limit stacksize 32768, to increase all, or specific, resource limits.
4.9.21 -unroll, -ur, (-ur=4), -unroll2, -ur2, (-ur2=160), -unroll3, -ur3, (-ur3=1)
The -unroll, -unroll2, and -unroll3 switches control how KAP unrolls inner loops.
Loop execution is often more efficient when the loops are unrolled. KAP unrolls the loop until either the loop has been unrolled the number of times given in the -unroll switch, or the amount of "work" in each iteration reaches the value given by the -unroll2 switch.
The switch -ur=0 tells KAP to use the default unrolling values.
The switch -ur=1 means no unrolling.
The -unroll2=n switch sets the upper limit for unrolling. If the estimate of work is greater than n, the loop is not unrolled.
The default, n=160, means a maximum work of 160 in an unrolled iteration. With this default, a work of 150 still results in an unrolled iteration, while a work of 170 results in no unrolling.
Work is estimated by counting operands and operators in a loop. The amount of work in each loop iteration is shown in the loop table in the annotated listing.
The -unroll3=n switch sets the lower limit for unrolling. If the estimate of work is less than n, the loop is not unrolled.
The default, n=1, means a minimum work of 1 in an unrolled iteration. If you choose a higher value, such as 20, a work of 30 still results in an unrolled iteration, while a work of 10 results in no unrolling.
The -scalaropt=2 switch is required to enable loop unrolling.
If you use kapc with the Compaq C compiler optimization switch set to -O5, you should turn off loop unrolling by setting -unroll=1.
Outer loop unrolling is a part of memory management and is not controlled by these switches.
There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. No warning is given if you request more than 100 unrolled iterations.
By increasing or decreasing the maximum iteration workload, you can control the amount of work that ends up in each loop iteration, as long as the number of unrolled iterations does not exceed the unroll limit. The workload is estimated by adding operations, including subscripts and assignments; scalars, not including the loop index; and if statements. Loops with function calls are weighted more heavily and are never unrolled. The following example demonstrates the workload limit. Assume that -unroll=3 and -unroll2=24 are the switch settings.
```c
for ( i=0; i<n; i++ ) {
    a[i] = b[i]+c[i];
}
```
The amount of work in this loop is 5. With these settings, the loop would be unrolled three times, because that is the maximum allowed by the unroll limit and the resulting workload (3 x 5 = 15) is less than the -unroll2 limit of 24.
If you set the -unroll2 limit to 10, the loop would be unrolled only twice, because unrolling the original loop three times would produce a workload of 15, which exceeds the -unroll2 limit. The result would be the following:
```c
for ( i = 0; i<=n - 2; i+=2 ) {
    a[i] = b[i] + c[i];
    a[i+1] = b[i+1] + c[i+1];
}
for ( ; i<n; i++ ) {
    a[i] = b[i] + c[i];
}
```
The -unroll3=n switch sets the lower limit for unrolling. If there are fewer than n units of work in the loop (the same units as -unroll2), the loop is not unrolled. This switch value should be left at 1, the default. A value less than the default could result in a program that executes more slowly.