This chapter describes the command-line switches for Compaq KAP
Fortran/OpenMP for Tru64 UNIX.
4.1 Overview
To specify a command-line switch, you can use the long name or short name. Additionally, you can specify command-line switches on the first line of the source file. (See the description of the !*$* options line in Section 4.11.)
The short names for switches are provided for the convenience of interactive users. Because the short names may not be unique in future versions of KAP, the long names should be used when compatibility between versions is required.
Many KAP command-line switches can also be set with inline directives or assertions. See Chapter 5 for information about directives and Chapter 6 for information about assertions. Chapter 8 shows extensive examples of transformations controlled by these switches.
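Tables 4-1 and 4-2 list the command-line switches for the kf90 driver and the kapf90 translator. The first column lists the long name of each switch, and also shows the functional categories of switches: General Optimization, Parallel Processing, Fortran 90 Dialect, Directives, Inlining and IPA, Input-Output, Listing, and Advanced Optimization.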
The next two columns list the short names and default value of each switch.
See Chapter 9, the -listoptions=k switch, for how to get the switch values active for a given KAP run.
A hyphen (-) is required before each switch name, but the hyphen is not shown in the following tables.
Long Name | Related Switch | Short Name | Default Value |
---|---|---|---|
fext=<Fortran file extension> | | | fext=f |
f90=<path to Fortran compiler> | | | f90=/usr/bin/f90 |
f90kap=<path to kapf90> | | | f90kap=/usr/bin/kapf90 |
fkapargs=<kap option string> | | | |
S | | | off |
tmpdir=<temporary directory path> | | | tmpdir=/tmp/ |
tune=<architecture> | | | tune=<current system architecture> |
verbose | | v | nov |
Long Name | Related Switch | Short Name | Default Value |
---|---|---|---|
General Optimization | | | |
[no]interchange | o, so | | interchange |
[no]namepartitioning=(<integer>,<integer>) | so | [no]namepart=(<integer>,<integer>) | nonamepartitioning |
optimize=<integer> | | o=<integer> | optimize=5 |
[no]recursion | | [no]rec | norecursion |
roundoff=<integer> | o, so | r=<integer> | roundoff=3 |
scalaropt=<integer> | r | so=<integer> | scalaropt=3 |
[no]skip | | [n]sk | noskip |
tune=<architecture> | | | tune=<current system architecture> |
Parallel Processing | | | |
chunk | scheduling | | chunk=1 |
[no]concurrent | | [no]conc | noconcurrent |
minconcurrent=<integer> | | mc | minconcurrent=1000 |
[no]parallelio | | [no]pio | noparallelio |
pdefault | | | safe |
psyntax | conc | | openmp |
scheduling=<list> | | sched=<list> | scheduling=e |
Fortran 90 Dialect | | | |
align_common=<integer> | | | align_common=8 |
align_struct=<integer> | | | align_struct=4 |
assume=<list> | | a=<list> | assume=cel |
noassume | | na | |
[no]datasave | sv | [n]ds | datasave |
[no]dlines | | [n]dl | nodlines |
[no]escape | | [n]escape | noescape |
[no]freeformat | | [n]ff | nofreeformat |
integer=<integer> | | int=<integer> | integer=4 |
intlog | sy | | intlog |
kind=<integer> | log, real, int | | kind=4 |
logical=<integer> | | log=<integer> | logical=4 |
[no]onetrip | | [n]1 | noonetrip |
real=<integer> | | rl=<integer> | real=4 |
save=<list> | ds, rc | sv=<list> | save=manual_adjust |
scan=<integer> | | | scan=72 |
syntax=<value> | | sy=<value> | off |
[no]type | | [n]ty | notype |
Directives | | | |
[no]directives=<list> | | [n]dr=<list> | directives=akpv |
[no]ignoreoptions | | [n]ig | noignoreoptions |
Inlining and IPA | | | |
inline[=<names>] | aggressive | inl[=<names>] | off |
noinline[=<names>] | | ninl[=<names>] | off |
ipa[=<names>] | | ipa[=<names>] | off |
noipa[=<names>] | | nipa[=<names>] | off |
inline_and_copy=<names> | | inlc=<names> | off |
inline_create=<file> | ipa | incr=<file> | off |
ipa_create=<file> | | ipacr=<file> | off |
inline_depth=<integer> | | ind=<integer> | inline_depth=2 |
ipa_depth=<integer> | | ipad=<integer> | ipa_depth=2 |
inline_from_files=<file>,<file> | inl | inff=<file>,<file> | current source file |
ipa_from_files=<file>,<file> | ipa | ipaff=<file>,<file> | current source file |
inline_from_libraries=<library>,<library> | inl | infl=<library>,<library> | off |
ipa_from_libraries=<library>,<library> | ipa | ipafl=<library>,<library> | off |
inline_looplevel=<integer> | | inll=<integer> | inll=2 |
ipa_looplevel=<integer> | | ipall=<integer> | ipall=2 |
inline_manual | | inm | off |
ipa_manual | | ipam | off |
inline_optimize=<integer> | | | inline_optimize=0 |
ipa_optimize=<integer> | | | ipa_optimize=0 |
Input-Output | | | |
cmp=<file> | | cmp=<file> | see Section 4.9.1 |
include=<pathname> | | inc=<pathname> | off |
list[=<file>] | | [n]l | see Section 4.9.3 |
Listing | | | |
cmpoptions[=<list>] | | cp[=<list>] | nocmpoptions |
lines=<integer> | | ln=<integer> | lines=55 |
listingwidth=<integer> | lo | lw=<integer> | listingwidth=132 |
listoptions=<list> | | lo=<list> | listoptions=klo |
suppress=<list> | | su=<list> | off |
Advanced Optimization | | | |
[no]aggressive=<list> | nat, inl | [n]ag=<list> | noaggressive |
arclimit=<integer> | so, r | arclm=<integer> | arclimit=5000 |
cache_prefetch_line_count=<integer> | | cplc=<integer> | cache_prefetch_line_count=0 |
cacheline=<integer>[,<integer>] | | chl=<integer>[,<integer>] | cacheline=64,64 |
cachesize=<integer>[,<integer>] | | chs=<integer>[,<integer>] | cachesize=32,0 |
dpregisters=<integer> | | dpr=<integer> | dpregisters=32 |
each_invariant_if_growth=<integer> | so, r, miifg | eiifg=<integer> | eiifg=20 |
fpregisters=<integer> | | fpr=<integer> | fpr=32 |
[no]fuse | so, o | [n]fuse | nofuse |
fuselevel=<integer> | fuse | | fuselevel=0 |
generateh | hdir, useh | | on |
hdir=<directory name> | | hd=<directory name> | hdir=<current directory> |
heaplimit=<integer> | | heap=<integer> | heaplimit=100 |
hoist_loop_invariants=<integer> | | hli=<integer> | hli=1 |
[no]interleave | | intl | interleave |
library_calls=<name> | ro | lc=<name> | off |
limit=<integer> | | lm=<integer> | lm=10 |
[no]machine=<list> | so, r | ma=<list> | ma=s |
max_invariant_if_growth=<integer> | so, r, eiifg | miifg=<integer> | miifg=500 |
routine=<routine_name>/<limited switches> | | rt=<routine_name>/<limited switches> | off |
setassociativity=<integer>[,<integer>] | so, r | sasc=<integer>[,<integer>] | sasc=1,1 |
[no]srlcd | so | [n]srlcd | nosrlcd |
tablesize=<integer> | | ts=<integer> | ts=24000000 |
unroll=<integer> | so, r | ur=<integer> | unroll=4 |
unroll2=<integer> | so, r | ur2=<integer> | unroll2=160 |
unroll3=<integer> | so, r | ur3=<integer> | unroll3=1 |
useh | hdir, genh | | on |
The subheadings in the following sections list the long and short names for each switch, followed by its default in parentheses.
Unless otherwise noted, the code examples in this chapter were run with -optimize=5, -scalaropt=2, -roundoff=2, and -unroll=1.
4.2 Switches for the kf90 Driver
The following sections explain the function of each kf90 driver switch.
4.2.1 -fext, (-fext=f)
This switch treats files with the indicated extension as Fortran source files.
4.2.2 -f90, (-f90=/usr/bin/f90)
This switch provides an alternate path to the Fortran compiler.
4.2.3 -f90kap, (-f90kap=/usr/bin/kapf90)
This switch provides an alternate path to the kapf90 translator.
4.2.4 -fkapargs
This switch passes switches to the kapf90 translator.
4.2.5 -S, (off)
This switch tells the kf90 driver to save all intermediate files and to
pass on the switch to the Fortran compiler to save the assembly
language output.
4.2.6 -tmpdir, (-tmpdir=/tmp/)
This switch provides an alternate directory for temporary files. This switch can also be set by the environment variable TMPDIR.
4.2.7 -tune, (-tune=<current system architecture>)
KAP determines whether the host Alpha architecture is ev4, ev5, or ev6 and then optimizes your program for that architecture by default. In the event you compile a program on one architecture but plan to run it on another, you should override the default by setting -tune equal to the architecture of the target system.
The -tune switch and the Fortran compiler -tune host switch work independently and perform different optimizations. If the switch appears on the kf90 command line inside the -fkapargs switch, for example:
> kf90 myprog.f -fkapargs='-tune=ev6'
the switch value will be applied only to the KAP translator. However, in this case:
> kf90 myprog.f -tune=ev6
the switch will be applied to both KAP and the Fortran compiler.
4.2.8 -verbose, -v, (-nov)
This switch prints the passes as they execute, with their arguments and their input and output files. It also prints final resource usage in the C-shell time format.
4.3 General Optimization Switches for kapf90
The following sections explain the function of each general
optimization switch.
4.3.1 -interchange, -nointerchange, (-interchange)
Use the -interchange switch to enable or disable loop interchanging. KAP enables loop interchange when -interchange is specified and the -optimize switch level is at least 1 or the -scalaropt switch level is 3. If you specify -nointerchange, KAP disables loop interchange regardless of the -optimize or -scalaropt levels.
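The enabling rule above can be written as a small predicate. The following Python sketch is only a model of that rule as worded in this section; the function name and parameters are hypothetical and are not part of KAP.

```python
def interchange_enabled(interchange=True, optimize=5, scalaropt=3):
    """Model of the stated rule; hypothetical helper, not KAP code.

    Defaults mirror the documented defaults: -interchange,
    -optimize=5, -scalaropt=3.
    """
    if not interchange:
        # -nointerchange disables interchange regardless of the levels.
        return False
    # -interchange takes effect when -optimize >= 1 or -scalaropt is 3.
    return optimize >= 1 or scalaropt == 3


print(interchange_enabled())                          # documented defaults
print(interchange_enabled(optimize=0, scalaropt=2))   # neither level suffices
print(interchange_enabled(interchange=False))         # -nointerchange
```

The three calls print True, False, and False, in that order.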
4.3.2 -namepartitioning, -namepart, (-nonamepart)
The -namepartitioning switch tells KAP to look at distinct array names and limit the number of arrays that appear in a loop to avoid cache thrashing. That is, this switch breaks a loop containing, for example, references to arrays A and B into two loops. One loop references array A and the other loop references array B.
Two arguments (i and j) used in the switch -namepartitioning=i,j control name partitioning as follows:
If no arguments appear with the -namepartitioning switch, KAP uses its default values of 2 for the minimum and 8 for the maximum number of partitions.
Before KAP can perform name partitioning, you must specify the switch -scalaropt=n, where n is greater than or equal to 3.
The -nonamepart switch explicitly prevents name partitioning.
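As an illustration of the loop splitting described in this section, the Python sketch below writes the same computation once as a single loop touching two arrays and once as two loops that each touch one array. The function names and the arithmetic in the loop bodies are assumptions chosen for the illustration; KAP's actual partitioning decisions are not modeled.

```python
def single_loop(a, b):
    """One loop that references both arrays in every iteration."""
    a, b = list(a), list(b)
    for i in range(len(a)):
        a[i] = a[i] * 2.0
        b[i] = b[i] + 1.0
    return a, b


def partitioned_loops(a, b):
    """The same work split by array name: one loop per array."""
    a, b = list(a), list(b)
    for i in range(len(a)):
        a[i] = a[i] * 2.0
    for i in range(len(b)):
        b[i] = b[i] + 1.0
    return a, b


a0, b0 = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
print(single_loop(a0, b0) == partitioned_loops(a0, b0))
```

The split is legal here because the two statements do not depend on each other, so both forms produce identical results.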
4.3.3 -optimize, -o, (-optimize=5)
The -optimize switch sets the optimization level, ranging from 0 to 5.
The meaning of each of the different optimization levels is shown below. Each optimization level is cumulative; for example, level 4 performs what is listed below for that level in addition to what is listed for levels 0 through 3.
A higher optimization level results in more optimization, more analysis, and more ambitious transformations, along with increased compilation time.
The optimization level can also be modified on a loop-by-loop basis by the !*$* optimize directive. Some of the code analysis techniques can be enabled with the -scalaropt switch.
4.3.4 -recursion, -rec, -norec, (-norecursion)
The -recursion switch informs KAP that subroutines and functions in the source program might be called recursively (that is, a routine calls itself or calls another routine that calls it). This affects storage allocation decisions and the interpretation of the -save switch. The -recursion switch must be in force in each recursive routine that KAP processes, or unsafe transformations could result.
The -norecursion switch tells KAP to assume that recursion is not used in the program being processed.
4.3.5 -roundoff, -r, (-r=3)
The -roundoff switch allows you to specify the change from serial roundoff error that is tolerable.
If an arithmetic reduction is accumulated in a different order than in the scalar program, the roundoff error is accumulated differently, and the final result may differ from the original program's output. While the difference is usually insignificant, certain restructuring transformations performed by KAP must be disabled to obtain exactly the same results as the scalar program. These transformations are discussed further in Chapter 8, Transformations.
KAP classifies its transformations by the amount of difference in roundoff error that can accumulate so you can decide what level of roundoff error differences is allowable.
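The effect of accumulation order can be seen with ordinary IEEE double-precision arithmetic. The Python sketch below is independent of KAP: it sums the same ten values once serially and once as two partial sums that are combined afterward, which is the kind of reordering a restructured reduction introduces. The helper name is hypothetical.

```python
def accumulate(values):
    """Left-to-right floating-point accumulation, as a serial loop would do."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 10

serial = accumulate(values)
# Reordered reduction: two partial sums, combined at the end.
reordered = accumulate(values[:5]) + accumulate(values[5:])

print(serial == reordered)
print(abs(serial - reordered))
```

The two totals differ only in the last bit, which is the size of difference the roundoff levels are meant to let you accept or reject.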
The meaning of each roundoff level is shown below. Each level is cumulative, performing what is listed for each level as well as what is listed for the previous levels. The levels are:
4.3.6 -scalaropt, -so, (-scalaropt=3)
The -scalaropt switch sets the level at which dusty-deck and other scalar transformations are performed.
Unlike the -scalaropt switch, the !*$* scalar optimize directive sets the level of loop-based optimizations (for example, loop fusion) only, but not straight-code optimizations or dead-code elimination.
The allowed values and their meanings are as follows:
4.3.7 -skip, -sk, -nsk, (-noskip)
Use the -skip switch to tell KAP not to apply optimizing transformations to any routine in the input file. To exclude only selected routines from optimization, see the description of the -routine switch in Section 4.7.20, -routine, -rt, (off).
4.3.8 -tune, (-tune=<current system architecture>)
The KAP preprocessor determines whether the host architecture is ev4, ev5, or ev6 and then optimizes your program for that architecture by default. In the event you compile a program on one architecture but plan to run it on another, override the default by setting -tune equal to the architecture where the program will run. For example, if you compile a program on ev4 architecture, but plan to run it on ev5, use -tune=ev5.
The -tune switch and the Fortran compiler -tune host switch work independently and perform different optimizations. Assume that the switch appears on the kf90 command line inside the -fkapargs switch. For example:
> kf90 myprog.f -fkapargs='-tune=ev6'
In this case, the switch value will be applied only to the kapf90 translator.
However, assume the following case:
> kf90 myprog.f -tune=ev6
In this case, the switch will be applied to both kapf90 and the Fortran compiler.
4.4 Parallel Processing Switches for kapf90
The following sections describe the switches you use to control how the
multiprocessor version of KAP prepares programs for parallel execution.
4.4.1 -chunk, (-chunk=1)
The -chunk switch modifies, and is used only with, the -scheduling switch. The -chunk switch determines the number of loop iterations that are in a group.
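The grouping of iterations can be pictured with a short Python sketch. It shows only how a chunk size divides an iteration range into groups; how those groups are handed to processors depends on the -scheduling setting and is not modeled here. The function name is hypothetical.

```python
def iteration_groups(n_iterations, chunk=1):
    """Split iterations 1..n_iterations into groups of `chunk` iterations.

    Illustrative only: the last group may be shorter than `chunk`.
    """
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    iterations = list(range(1, n_iterations + 1))
    return [iterations[i:i + chunk] for i in range(0, len(iterations), chunk)]


print(iteration_groups(10, chunk=4))
print(len(iteration_groups(10)))      # default chunk=1: one iteration per group
```

With 10 iterations and a chunk of 4, the groups are iterations 1 through 4, 5 through 8, and 9 through 10.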
4.4.2 -concurrent, -conc, -noconc, (-noconcurrent)
The -concurrent switch directs KAP to restructure the source code for parallel processing.
Setting -noconcurrent disables parallel execution and allows only serial optimizations to take place. You can enable and disable parallel execution on a module-by-module basis using KAP directives or on a loop-by-loop basis using KAP assertions. See Section 5.4, Parallel Processing Directives for Automatic Parallelization, and Section 6.3, Parallel Processing Assertions that Guide Automatic Parallelization.
Programs containing many loops that require synchronization or programs that have loops with small iteration counts may run more slowly when parallelized. In these cases, you should disable parallel execution.
Section 3.2, Parallel Processing Methods, summarizes the two methods of parallelization, automatic and combined, that require the -conc switch. Several examples of the -conc switch are in the descriptions of these two methods.
4.4.3 -minconcurrent, -mc, (-mc=1000)
The -minconcurrent switch sets the level of work in a loop above which KAP executes the loop in parallel. The range of values for this switch is all numbers greater than or equal to 0. The higher the -minconcurrent value, the more iterations and/or statements the loop body must have to run concurrently.
Executing a loop in parallel incurs overhead that varies with different systems. If a loop has little work, the overhead required to set up parallel execution may make the loop execute more slowly than it would using serial execution.
At compilation time, KAP estimates the amount of work inside a loop on the basis of loop computations and loop iterations. KAP multiplies the loop iteration count by the sum of the nonindex operands/results and the nonassignment operators. KAP compares its estimation with the -minconcurrent value. If the estimated amount of work is greater than the -minconcurrent value, KAP generates parallel code for the loop. Otherwise, the loop execution is serial. This is called a two-version loop.
If the DO loop bounds are known at compilation time, KAP computes the exact iteration count. If the DO loop bounds are unknown, KAP generates a block IF around the parallel code. The block IF allows a run-time decision whether or not to execute the loop in parallel.
To disable the generation of two-version loops throughout the program, use the switch -minconcurrent=0. To disable this action in specific DO loops, use the !*$* minconcurrent(0) directive.
Setting the -minconcurrent switch automatically activates the -concurrent switch.
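The work estimate described in this section can be sketched in a few lines of Python. This is a model of the rule as stated (iteration count multiplied by the sum of nonindex operands/results and nonassignment operators, compared with the -minconcurrent value), not KAP's internal cost function; the function and parameter names are hypothetical.

```python
def estimated_work(iterations, operands_and_results, operators):
    """Iteration count times (nonindex operands/results + nonassignment operators)."""
    return iterations * (operands_and_results + operators)


def runs_in_parallel(iterations, operands_and_results, operators,
                     minconcurrent=1000):
    """True when the estimate exceeds the -minconcurrent value."""
    work = estimated_work(iterations, operands_and_results, operators)
    return work > minconcurrent


# A loop body such as X(I) = Y(I) * Z(I) has three operands/results
# (X, Y, Z) and one nonassignment operator (*), so each iteration counts 4.
print(estimated_work(250, 3, 1))      # 1000, not greater than the default
print(runs_in_parallel(250, 3, 1))    # serial
print(runs_in_parallel(251, 3, 1))    # parallel
```

Under these assumptions, a body that counts 4 per iteration crosses the default threshold of 1000 once the iteration count exceeds 250.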
The following loop illustrates this switch using the minconcurrent default of 1000:
      DO 10 I = 1,N
         A(I) = B(I) + C(I)
   10 CONTINUE
Becomes:
      INTEGER II2, II1
      II2 = N / 4
      II1 = II2 * 4 + 1
!$OMP PARALLEL IF (N .GT. 250) SHARED (N,A,B,C) PRIVATE (I)
!$OMP DO
      DO I=1,N-3,4
         A(I) = B(I) + C(I)
         A(I+1) = B(I+1) + C(I+1)
         A(I+2) = B(I+2) + C(I+2)
         A(I+3) = B(I+3) + C(I+3)
      END DO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
      DO I=II1,N,1
         A(I) = B(I) + C(I)
      END DO
      END
At run time, if the iteration count N is greater than 250 (1000/4), the concurrent loop executes in parallel; otherwise, it executes serially.
When KAP inserts OpenMP directives in front of DO loops whose bounds
are not known in a source program such as the one above, it uses the
OpenMP PARALLEL IF construct.
4.4.4 -parallelio, -nopio, -pio, (-noparallelio)
The -parallelio switch allows parallel execution of loops with I/O. Use this switch when you know the I/O will not execute. An example is a test for an error condition that causes a message to be printed.
4.4.5 -pdefault, (-pdefault=safe)
This switch tells KAP how to process variables that are not listed in an OpenMP data environment directive. Furthermore, it is used only during directed parallelization.
Possible settings for this switch are:
4.4.6 -psyntax, (-psyntax=openmp)
This switch specifies the set of parallel directives that KAP should process. Its value is openmp.
The setting -psyntax=openmp is required if your applications use OpenMP directives.
Use of -psyntax=openmp assumes one of the following:
4.4.7 -scheduling, -sched, (-scheduling=e)
The -scheduling switch tells KAP the kind of scheduling to use for loop iterations on a multiprocessor system.
The -scheduling options are:
4.5 Fortran Dialect Switches for kapf90
The following sections explain the function of each Fortran dialect
switch.
4.5.1 -align_common, (-align_common=8)
The -align_common switch aligns data elements in COMMON blocks. Its integer value represents the boundary size in bytes.
4.5.2 -align_struct, (-align_struct=4)
The -align_struct switch aligns subfields. Its integer value represents the boundary size in bytes.
4.5.3 -assume, -a, (-assume=cel), -noassume, -na
The -assume switch tells KAP to make certain global assumptions about the program being processed. Most of these can also be controlled by various assertions (see Chapter 6, KAP Assertions).
The -assume switch settings and the corresponding KAP assertions are as follows:
By default, KAP assumes that a program conforms to the Fortran 77 standard, that is, -assume=el. The default includes -assume=c to simplify some analysis and inlining.
To disable all the above assumptions, enter -noassume on the command line.
4.5.4 -datasave, -ds, -nodatasave, -nds, (-datasave)
The -datasave switch tells KAP to treat local variables in a subroutine or function that appear in DATA statements as if they were also in SAVE statements. That is, their values will be retained between invocations of the subroutine or function. This is the practice of many commercial Fortran compilers. This choice affects certain optimizations performed by KAP.
The -nodatasave switch complies with the Fortran 77 standard.
See also the -save command-line switch.
4.5.5 -dlines, -dl, -ndl, (-nodlines)
The -dlines switch allows a D in column 1 to be treated as a character space. The rest of that line is then parsed as a normal Fortran statement. By default, KAP treats these lines as comments. This switch is useful for the inclusion or exclusion of debugging lines. Data dependence relationships might be different when the D lines are included.
In the following example, the -nodlines default would cause the WRITE statement to be treated as a comment:
      DO 10 I = 1,N
         A (I) = B (I)
D        WRITE (*,*) A (I)
   10 CONTINUE
But when -dlines is specified, KAP sees the WRITE statement and does not optimize the loop as a single unit; instead, it splits the loop as follows:
      DO 2 I=1,N
         A(I) = B(I)
    2 CONTINUE
      DO 3 I=1,N
         WRITE (*, *) A(I)
    3 CONTINUE
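The treatment of D lines can be modeled with a short Python sketch. It assumes only what this section states: a D in column 1 is either replaced by a space (-dlines) or the line is treated as a comment (-nodlines, modeled here by dropping the line). The function name is hypothetical, and only an uppercase D is handled.

```python
def apply_dlines(source_lines, dlines=False):
    """Return the statements seen after D-line handling (illustrative model)."""
    seen = []
    for line in source_lines:
        if line.startswith("D"):
            if dlines:
                # -dlines: the D becomes a blank; the rest is a normal statement.
                seen.append(" " + line[1:])
            # -nodlines: the line is a comment, so nothing is emitted.
        else:
            seen.append(line)
    return seen


loop = [
    "      DO 10 I = 1,N",
    "         A(I) = B(I)",
    "D        WRITE (*,*) A(I)",
    "   10 CONTINUE",
]
print(len(apply_dlines(loop)))                 # 3: the WRITE line is a comment
print(apply_dlines(loop, dlines=True)[2])      # the WRITE statement, D blanked
```

Because the WRITE statement exists only when -dlines is set, the data dependence picture of the loop can differ between the two settings.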
4.5.6 -escape, -nescape, (-noescape)
The -escape switch is used to process C-style escape-character sequences. An escape sequence provides a general and extensible method for representing hard-to-get or invisible characters.
4.5.7 -freeformat, -ff, -nff, (-nofreeformat)
The -freeformat switch removes the standard column restrictions for Fortran source code. For example, source files can be up to 132 columns and use an ampersand (&) at the end of the line to indicate continuation. See the Compaq Fortran Language Reference Manual for more information.
The -freeformat switch is off by default, and the usual Fortran 90 fixed-form conventions apply. For example, lines are truncated after column 72 unless you specify the Fortran flag -extend_source. A character (except a zero or a blank) in column 6 indicates a continuation line.
4.5.8 -integer, -int, (-int=4)
The -integer switch specifies a size in bytes, N, for the default size of INTEGER variables. When N=2 or 4, KAP takes INTEGER*N as the default INTEGER type. When N=0, KAP uses the ordinary default length for INTEGER variables.
Running kf90 with the compiler switch -noi4 specified explicitly causes KAP to be called with the command-line switches -integer=2 and -logical=2.
4.5.9 -intlog, (-intlog)
The -intlog switch enables the mixing of integer and logical operands in expressions. When integer operands are used with logical operators, the operations are performed in a bitwise manner. When logical operands are used with arithmetic operators, the operands are treated as integers.
4.5.10 -kind, (-kind=4)
The -kind switch establishes the value for the Fortran 90/95 KIND type parameter used when KIND has not been specified or KIND=0 is specified. -kind applies to all data types: logical, integer, real, and complex.
The values for -kind are 4 or 8.
The -kind switch allows you to change the underlying precision of computations without violating the Fortran 95 standard constraints that default logical, default integer, and default real occupy the same amount of storage and that default double precision and default complex occupy twice the storage of default real.
4.5.11 -logical, -log, (-log=4)
The -logical switch specifies a size in bytes, N, for the default size of LOGICAL variables. When N=1, 2, or 4, KAP takes LOGICAL*N as the default LOGICAL type. When N=0, KAP uses the ordinary default length for LOGICAL variables.
Running kf90 with the compiler switch -noi4 specified explicitly causes KAP to be called with the command-line switches -integer=2 and -logical=2.
4.5.12 -onetrip, -1, -n1, (-noonetrip)
The -onetrip switch allows you to specify "one-trip" DO loops. Many pre-Fortran 77 compilers implemented DO loops that would always have at least one iteration, even if the loop index initial value was higher than the final value. This switch informs KAP that the program being processed contains loops that need the one-trip feature.
Running kf90 with the compiler switch -nof77 specified explicitly causes KAP to be called with the -onetrip command-line switch.
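The difference between standard and one-trip DO loops comes down to the minimum iteration count. The Python sketch below uses the Fortran 77 iteration-count formula, MAX(INT((m2 - m1 + m3) / m3), 0), and models one-trip behavior by raising the minimum to 1, as described in this section. The function names are hypothetical.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, as Fortran INT() does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def trip_count(first, last, step=1, onetrip=False):
    """Iterations executed by DO I = first, last, step (illustrative model)."""
    if step == 0:
        raise ValueError("step must be nonzero")
    count = trunc_div(last - first + step, step)
    # Standard loops may run zero times; one-trip loops run at least once.
    return max(count, 1 if onetrip else 0)


print(trip_count(1, 10, 3))                 # I = 1, 4, 7, 10
print(trip_count(5, 1))                     # initial value above final value
print(trip_count(5, 1, onetrip=True))       # one-trip semantics
```

The three calls print 4, 0, and 1: only the loop whose initial value exceeds its final value behaves differently under one-trip semantics.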
4.5.13 -real, -rl, (-rl=4)
The -real switch tells KAP the Compaq Fortran compiler's default size in bytes, N, for REAL variables, where N can be 4 or 8.
For example, to change the default size of REAL variables from 4 to 8, do the following:
4.5.14 -save, -sv, (-save=manual_adjust)
The -save switch tells KAP whether to perform live variable analysis to determine if the value of a local scalar variable in a subroutine or function needs to be saved between invocations of the routine being processed. SAVE statements will be generated for any variables requiring them. KAP will not delete or ignore a SAVE statement coded by the user.
Saving local variables might be required for correct execution of the program, but can restrict KAP optimizations.
With -save=manual, KAP assumes you have inserted the necessary SAVE statements into the code and performs no corresponding analysis of its own. The user-written SAVE statements are assumed to be correct and sufficient. This combination is not affected by the -recursion switch.
The effect of -save=manual_adjust depends on the -recursion setting:
The effect of -save=all_adjust depends on the -recursion setting:
With -recursion, this is the same as -save=all_adjust:
4.5.15 -scan, (-scan=72)
The -scan switch allows you to set the length of the Fortran input lines.
The value must be either 72, 120, or 132.
KAP ignores (that is, treats as a comment) characters in columns beyond the value of the -scan switch.
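As a simple picture of this rule, the Python sketch below keeps only the columns of an input line that fall within the -scan value and rejects values other than the three listed above. The function name is hypothetical and the sketch does not model any other part of KAP's scanner.

```python
ALLOWED_SCAN_VALUES = (72, 120, 132)


def scanned_text(line, scan=72):
    """Return the part of a source line that falls within the scan length."""
    if scan not in ALLOWED_SCAN_VALUES:
        raise ValueError("scan must be 72, 120, or 132")
    # Columns are 1-based, so the first `scan` characters are kept.
    return line[:scan]


long_line = "      X = 1" + " " * 61 + "SEQ00010"   # sequence field in columns 73-80
print(len(long_line))
print(scanned_text(long_line).rstrip())
```

Here the line is 80 characters long, and with the default of 72 only the statement text survives; the trailing field in columns 73 through 80 is ignored.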
4.5.16 -syntax, -sy, (off)
The -syntax switch directs KAP to check for compliance with certain syntactic rules. Using a dialect switch can prevent a construct from being translated differently than expected by a user who is familiar with a different implementation of Fortran.
The default is to accept the superset of the ANSI Fortran 95 standard defined by Compaq Fortran, which includes many common Fortran 77 extensions.
The possible options for this switch are:
4.5.17 -type, -ty, -nty, (-notype)
The -type switch causes KAP to issue warning messages for variables not explicitly typed. This is as if there were an IMPLICIT NONE at the top of each program unit. The -notype default suppresses this checking.
4.6 Inlining and Interprocedural Analysis Switches for kapf90
The following sections explain the function of each switch used in subprogram inlining and Interprocedural Analysis (IPA).
Inlining is the process of replacing a subroutine CALL or function reference with the text of the subroutine or function. IPA is the process of inspecting a called routine to identify relationships between the arguments, the returned value, and the code surrounding the call to identify opportunities for optimization.
Inlining and IPA can be performed in the same KAP run. The only restriction is that the same routine cannot be in global lists for both inlining and IPA. You can use the !*$* inline and !*$* ipa directives to inline a subroutine or function in one place and interprocedurally analyze it in another. See Chapter 5 and Chapter 7 for information about these directives.
For additional information about these switches and examples of their
use, see Chapter 7.
4.6.1 -inline, -inl, -noinline, -ninl, (off), -ipa, -ipa, -noipa, -nipa, (off)
The -inline switch provides KAP with a list of routines to inline. The -noinline switch tells KAP to inline all routines except the listed ones.
The -ipa switch provides KAP with a list of routines to analyze. Additionally, -ipa causes KAP to give information in the annotated listing about appropriate settings for the -ind, -inll, and -ipall switches on a loop-by-loop basis.
If you specify either the -inline or the -ipa switch without an argument list, KAP will try to inline/analyze all the called subroutines and functions in the inlining (or IPA) universe specified by the -inline_from... (-ipa_from...) switches, subject to restrictions imposed by the -inline_depth and -inline_looplevel (-ipa_looplevel) switches.
To permit KAP to inline routines that contain static SAVE or DATA variables, use the -aggressive=c switch with -inline. The -aggressive=c switch promotes the static variables to members of a COMMON that is introduced into the program. See Section 4.7.1 for more information.
If you include a list of names, for example, -inline=mkcoef,yval, then just the routines named will be inlined or analyzed.
A list of routines must be included with -noinline or -noipa. All routines in the inlining/IPA universe are candidates for inlining except the listed ones.
The -[no]inline and -[no]ipa command-line switches can be overridden by the !*$* [no]inline and !*$* [no]ipa directives. See Chapter 5 and Chapter 7 for more information about these directives.
4.6.2 -inline_and_copy, -inlc, (off)
The -inline_and_copy=<names> switch functions like the -inline switch, except that if all CALLs or references to a subprogram are inlined, the text of the routine is not optimized but is copied unchanged to the transformed code file. This is intended for use when you are inlining routines from the same file as the call. The -inline_and_copy switch has no special effect when the routines being inlined are being taken from a library or another source file.
When a subprogram has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine will still be available in case one of the other source files contains a reference to it, so no errors will result.
The -inline_and_copy algorithm assumes that all CALLs and references to the routine precede it in the source file. If the routine is referenced after the text of the routine, and that particular call site cannot be inlined, the unoptimized version of the routine will be invoked.
4.6.3 -inline_create, -incr, (off), -ipa_create, -ipacr, (off)
The -inline_create=<filename> and -ipa_create=<filename> switches cause KAP to build a library file containing partially analyzed routines for later inlining. The library created is used with the -inline_from_libraries (-ipa_from_libraries) switch.
Libraries created with -inline_create can be used with either inlining or IPA, because they contain essentially complete descriptions of the subroutines and functions included.
Libraries created with -ipa_create can be used only with IPA, because they do not have the complete text of the routines, just the data relationship information.
You can use any name for the created library. However, for maximum compatibility with the -inline_from_libraries and -ipa_from_libraries switches, Compaq recommends that you use the .klib extension.
4.6.4 -inline_depth, -ind, (-ind=2), -ipa_depth, -ipad, (-ipad=2)
The -inline_depth and -ipa_depth switches set the maximum level of subprogram nesting that KAP will attempt to inline. Higher values instruct KAP to trace CALLs and function references further.
The values and their meanings are:
Chapter 7, Inlining and IPA, has examples of recursive inlining with different values of -inline_depth.
The !*$* [no]inline and !*$* [no]ipa directives, when enabled, are not affected by the -inline_depth or -ipa_depth restrictions.
4.6.5 -inline_from_files, -inff, (current source file)
See Section 4.6.8.
4.6.6 -inline_from_libraries, -infl, (off)
See Section 4.6.8.
4.6.7 -ipa_from_files, -ipaff, (current source file)
See Section 4.6.8.
4.6.8 -ipa_from_libraries, -ipafl, (off)
The -..._from_... switches provide KAP with the locations of subroutines and functions available for inlining/IPA. The total set of available routines is called the inlining (or IPA) universe.
The -..._from_files switches take the names of source files and directories containing source files. Including a directory, for example, -ipaff=/work, is equivalent to the notation /work/*.f90.
The -..._from_libraries switches take the names of libraries created with the -..._create switches and directories containing such libraries. In directories, the KAP libraries are identified by the .klib extension.
Multiple files/libraries or directories can be given in one -..._from_... switch, separated by commas or colons. Multiple -..._from_... switches can be specified on the command line. KAP searches for subroutines and functions in the provided files and libraries in the order in which they appear on the command line.
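The list syntax and search order described above can be sketched as follows. This Python fragment only splits switch values on commas and colons and keeps command-line order; the function name and the file names in the example are hypothetical, and directory expansion is not modeled.

```python
import re


def search_order(switch_values):
    """Flatten the values of several -..._from_... switches into one ordered list.

    `switch_values` holds the value of each switch in the order the
    switches appear on the command line.
    """
    order = []
    for value in switch_values:
        # Entries within one switch may be separated by commas or colons.
        order.extend(part for part in re.split(r"[,:]", value) if part)
    return order


# Models: -inff=/work,util.f90 -inff=lib/a.f90:lib/b.f90
print(search_order(["/work,util.f90", "lib/a.f90:lib/b.f90"]))
```

The result lists /work first and lib/b.f90 last, matching the order in which the entries appear on the command line.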
The -..._from_... switches do not activate inlining or IPA. The -inline or -ipa switches must be specified.
4.6.9 -inline_looplevel, -inll, (-inll=2), -ipa_looplevel, -ipall, (-ipall=2)
The -..._looplevel switches enable you to limit inlining to just functions that are referenced in nested loops where the effects of reduced function call overhead or enhanced optimizations will be multiplied.
The parameter is defined from the most deeply nested subprogram reference. For example, -inll=1 restricts inlining to functions referenced in the deepest loop nest; -inll=3 restricts inlining to those routines referenced at the three deepest levels. The DO loop nest level of each CALL or function reference is included in the optional calling tree section of the listing files.
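The following is a minimal sketch of how call sites at different DO loop nest levels relate to the -inll value. The program and the function names (offset, row_bias, weight) are hypothetical, and the comments reflect one reading of the paragraph above, not confirmed KAP behavior.

```fortran
program looplevel_demo
  implicit none
  real, external :: offset, row_bias, weight
  real :: a(3,3), s
  integer :: i, j

  a = 2.0
  s = offset(1.0)                 ! not inside any DO loop
  do i = 1, 3
     s = s + row_bias(i)          ! referenced one loop deep
     do j = 1, 3
        ! Deepest reference in this program unit. Assumption: with
        ! -inll=1 only this call site would be an inlining candidate;
        ! with -inll=2 the row_bias call would qualify as well.
        s = s + weight(a(i,j))
     end do
  end do
  print *, 'sum =', s
end program looplevel_demo

real function offset(x)
  implicit none
  real, intent(in) :: x
  offset = x + 0.5
end function offset

real function row_bias(k)
  implicit none
  integer, intent(in) :: k
  row_bias = 0.1 * real(k)
end function row_bias

real function weight(x)
  implicit none
  real, intent(in) :: x
  weight = 0.25 * x
end function weight
```

The nest level that KAP actually assigns to each reference can be checked in the calling tree section of the listing file.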
The -..._looplevel switches do not activate inlining or IPA. The -inline or -ipa switches must be specified.
The !*$* [no]inline and !*$* [no]ipa directives, when enabled, are not affected by the -..._looplevel restrictions.
4.6.10 -inline_manual, -inm, (off), -ipa_manual, -ipam, (off)
These switches cause KAP to recognize the !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which functions are inlined/analyzed at which call sites.
The default is to ignore these directives. They are enabled when any inlining (IPA) switch is given on the command line. When -inline_manual (-ipa_manual) is included on the command line, the !*$* inline (!*$* ipa) directives are enabled without enabling the automatic inlining algorithms. Because !*$* [no]inline and !*$* [no]ipa override the -inline=, -ipa=, -inline_depth, and -..._looplevel command-line switches, you can combine them with command-line control in two ways: to select routines or call sites that the regular selection algorithm would reject, or to prevent specific routines or CALL sites from being inlined/analyzed.
For more information about the !*$* inline and !*$* ipa directives, see Chapter 5, KAP Directives and Chapter 7, Inlining and IPA.
4.6.11 -inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)
The switches -inline_optimize and -ipa_optimize help you to optimize large programs by causing KAP to set other switches depending on the value you specify.
The possible values are:
The following sections describe command-line switches that the advanced user may want to use for maximum performance.
Some of these switches (-aggressive, -cacheline, -cachesize, -dpregisters, -fpregisters, -setassociativity) set parameters that KAP uses to optimize memory usage. Knowing how much data can be kept in fast memory (cache or arithmetic registers) and the costs of moving data in the memory hierarchy enables better optimization of memory reference patterns. The -scalaropt=3 and -roundoff=3 switches are required for memory management to be enabled.
4.7.1 -aggressive, -ag, -nag, (-noaggressive)
The -aggressive switch takes a list of options as follows:
To explicitly disable these options, specify -noaggressive .
See also the -cacheline, -cachesize, and -setassociativity command-line switches.
4.7.2 -arclimit, -arclm, (-arclimit=5000)
The -arclimit switch sets the size of the dependence arc data structure that KAP uses to perform data dependence analysis (see Appendix B, Data Dependence Analysis).
This data structure is dynamically allocated on a loop-nest-by-loop-nest basis. The formula that is used to estimate the number of dependence arcs for a given loop nest is:
    dependence_array_size = max(#_of_statements * 4, arclimit value)
This is an estimate, because KAP is assuming that each statement, in the worst case, would have four dependence arcs.
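As a worked instance of the estimate above, the short program below evaluates the formula for two hypothetical loop nests. The statement counts are made up for illustration; only the factor of four and the default -arclimit value come from the text.

```fortran
program arc_estimate
  implicit none
  integer, parameter :: arcs_per_statement = 4   ! worst case assumed in the text
  integer, parameter :: arclimit = 5000          ! default -arclimit value
  integer :: n_statements(2), k, array_size

  n_statements = (/ 100, 2000 /)                 ! hypothetical loop nest sizes

  do k = 1, 2
     ! dependence_array_size = max(#_of_statements * 4, arclimit value)
     array_size = max(n_statements(k) * arcs_per_statement, arclimit)
     print *, 'statements:', n_statements(k), ' estimated arcs:', array_size
  end do
end program arc_estimate
```

For the small nest the -arclimit value dominates (5000); for the large nest the statement count dominates (8000).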
If a loop contains too many dependence relationships and cannot be represented in the dependence data structure, KAP will give up optimization of the loop. Loops that exceed this threshold are marked in the Loop Table ( -listoptions=l ) in the listing file. (See Chapter 9, KAP Listing File.)
You can use the -arclimit switch to increase the size of the data structure to enable KAP to perform more optimizations. Reducing the -arclimit value will (slightly) reduce the size of the KAP executable, while reducing the complexity of loops that KAP can analyze. (Most users will not need to change this value.)
The maximum value is 5000. If a larger value is specified, and the "KAP switches" ( -listoptions=k ) section is enabled, the entry for arclimit is -arclimit override=5000 . The value will be set to 5000.
The dependence arc data structure size can also be modified with the !*$* arclimit <integer> directive.
4.7.3 -cacheline, -chl, (-chl=64,64)
The -cacheline switch informs KAP of the width of the memory channel in bytes between cache and main memory.
When two arguments are specified, the first argument gives the width of the memory channel between the primary cache and the secondary cache, and the second argument gives the width of the memory channel between the secondary cache and main memory. Omitting the second argument, or specifying it as 64 (the default), tells KAP not to optimize secondary cache usage.
4.7.4 -cache_prefetch_line_count, -cplc, (-cplc=0)
The -cache_prefetch_line_count switch gives the number of additional lines prefetched into the cache during a cache miss.
4.7.5 -cachesize, -chs, (-chs=32,0)
The -cachesize switch informs KAP of the size in kilobytes of the cache memory.
When two arguments are specified, the first argument gives the size of the primary cache and the second argument gives the size of the secondary cache. Omitting the second argument, or specifying it as 0 (the default), tells KAP not to optimize secondary cache usage.
When -tune=ev6 is specified, the default values for -chs are 32,0.
4.7.6 -dpregisters, -dpr, (-dpr=32)
The -dpregisters switch specifies the number of DOUBLE PRECISION registers each processor has.
4.7.7 -each_invariant_if_growth, -eiifg, (-eiifg=20)
When a loop contains an IF statement whose condition does not change from one iteration to another (an "invariant" IF), the same test must be repeated for every iteration. The code can often be made more efficient by "floating" the IF outside the loop and putting the THEN and ELSE sections into their own loops.
This gets more complicated when there is other code in the loop, because a copy of it must be included in both the THEN and ELSE loops.
For example:
    DO I = ...
       section-1
       IF ( ) THEN
          section-2
       ELSE
          section-3
       ENDIF
       section-4
    ENDDO
Becomes:
    IF ( ) THEN
       DO I = ...
          section-1
          section-2
          section-4
       ENDDO
    ELSE
       DO I = ...
          section-1
          section-3
          section-4
       ENDDO
    ENDIF
When sections 1 and 4 are large, the extra code generated can slow a program down through cache contention, extra paging, and so on, more than the reduced number of IF tests speed it up.
The -each_invariant_if_growth switch sets a limit on the number of lines of executable code in sections 1 and 4. KAP tries to float an invariant IF outside a loop only when that line count is below the limit.
This can be controlled on a loop-by-loop basis with the !*$* each_invariant_if_growth directive (see Chapter 5, KAP Directives).
The total amount of additional code generated in a program unit through invariant IF floating can be limited with the -max_invariant_if_growth switch.
The allowed values for the -each_invariant_if_growth switch are 0 to 5000.
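A concrete instance of the schematic in this section may help. The sketch below is hand-written, not KAP output: it runs a loop with an invariant IF in its original form and in a floated form, then checks that both give the same values. The variable names and the statements standing in for sections 1 through 4 are hypothetical.

```fortran
program invariant_if_demo
  implicit none
  integer, parameter :: n = 8
  real :: x(n), y1(n), y2(n)
  logical :: use_scale
  integer :: i

  x = (/ (real(i), i = 1, n) /)
  use_scale = .true.

  ! Original form: the invariant test is evaluated on every iteration.
  do i = 1, n
     y1(i) = x(i) + 1.0             ! section-1
     if (use_scale) then
        y1(i) = y1(i) * 2.0         ! section-2
     else
        y1(i) = y1(i) - 2.0         ! section-3
     end if
     y1(i) = y1(i) * y1(i)          ! section-4
  end do

  ! Floated form: the test is made once, and sections 1 and 4 are
  ! duplicated in both loops. These duplicated lines are the growth
  ! that the switch limits.
  if (use_scale) then
     do i = 1, n
        y2(i) = x(i) + 1.0          ! section-1
        y2(i) = y2(i) * 2.0         ! section-2
        y2(i) = y2(i) * y2(i)       ! section-4
     end do
  else
     do i = 1, n
        y2(i) = x(i) + 1.0          ! section-1
        y2(i) = y2(i) - 2.0         ! section-3
        y2(i) = y2(i) * y2(i)       ! section-4
     end do
  end if

  print *, 'results identical: ', all(y1 == y2)
end program invariant_if_demo
```

Here sections 1 and 4 are one line each, so floating adds only a few lines; with large sections the duplicated code grows accordingly.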
4.7.8 -fpregisters, -fpr, (-fpr=32)
The -fpregisters switch specifies the number of single-precision registers, such as ordinary floating-point, that each processor has.
4.7.9 -fuse, -nfuse, (-nofuse)
The -fuse switch tells KAP to perform loop fusion. Loop fusion is a conventional compiler optimization that transforms two adjacent loops into a single loop. Data dependence tests allow fusion of more loops than standard techniques allow. Before KAP can perform loop fusion, you must specify either the -scalaropt=2 or -optimize=5 switch.
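To make the transformation concrete, here is a hand-written sketch of loop fusion on two adjacent loops. It is an illustration under simple assumptions (same iteration range, the second loop reads only the element written in the same iteration), not a listing produced by KAP; the array names are hypothetical.

```fortran
program fusion_demo
  implicit none
  integer, parameter :: n = 10
  real :: a(n), b(n), c(n), c_fused(n)
  integer :: i

  b = 3.0

  ! Original form: two adjacent loops over the same iteration range.
  do i = 1, n
     a(i) = b(i) + 1.0
  end do
  do i = 1, n
     c(i) = a(i) * 2.0
  end do

  ! Fused form: c_fused(i) needs only the a(i) computed in the same
  ! iteration, so the two loop bodies can share one loop.
  do i = 1, n
     a(i) = b(i) + 1.0
     c_fused(i) = a(i) * 2.0
  end do

  print *, 'same result: ', all(c == c_fused)
end program fusion_demo
```

If the second loop read a(i+1) instead of a(i), the fused loop would read a value before the first statement had written it, which is the kind of case the data dependence tests must rule out.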
4.7.10 -fuselevel, (-fuselevel=0)
The -fuselevel switch further controls the level of loop fusion. Whenever you set -fuselevel , KAP automatically sets -fuse .
The possible options for this switch are:
KAP automatically sets the -generateh switch for you. Compaq recommends that you not set this switch.
This switch is used for forward declarations. KAP needs two passes to resolve Fortran 90 forward declarations. The first pass, the "generateh" pass, builds the information needed to analyze the program for forward references. The second pass, the "useh" pass, resolves any forward references.
4.7.12 -hdir, -hd, (-hdir=<current directory>)
KAP automatically sets the -hdir switch for you. Compaq recommends that you not set this switch.
The -hdir=directoryname switch specifies the name of the directory where the KAP "generateh" pass stores the temporary files containing information about forward references. The -useh switch picks up the information from that directory.
4.7.13 -heaplimit, -heap, (-heaplimit=100)
KAP may require large amounts of memory in order to process your source code. The -heaplimit switch specifies the maximum size, in megabytes, to which the KAP heap can grow. If this limit is reached, KAP will stop processing your source code and try to exit with an "out of memory" error message.
If you choose a -heaplimit setting that is greater than the amount of memory that your machine has available, KAP may run out of memory before it reaches the -heaplimit . KAP relies upon the operating system to tell it that the process has run out of memory before that problem occurs. Using -heaplimit makes a graceful exit more likely.
4.7.14 -hoist_loop_invariants, -hli, (-hli=1)
The -hoist_loop_invariants switch controls code hoisting of loop-invariant expressions from loops. Note that this switch is independent of the switches -each_invariant_if_growth and -max_invariant_if_growth , which control the floating of invariant-IFs out of loops.
The possible options for this switch are:
If there is invariant code that is protected by an IF-structure and the hoisting value is less than 3, then KAP generates the following message in the output listing:
    An invariant expression not hoisted because -hoist_loop_invariants < 3
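The sketch below illustrates what "invariant code protected by an IF-structure" can look like, together with a hand-hoisted version. It assumes, as the message above suggests, that a value of 3 permits hoisting from inside an IF; the program and its variable names are hypothetical and the second loop is not KAP output.

```fortran
program hoist_demo
  implicit none
  integer, parameter :: n = 6
  real :: x(n), y(n), y_hoisted(n), c, d, t
  integer :: i

  x = (/ -1.0, 2.0, -3.0, 4.0, 5.0, -6.0 /)
  c = 9.0
  d = 0.5
  y = 0.0
  y_hoisted = 0.0

  ! Original form: sqrt(c) * d does not change across iterations, but
  ! it is evaluated only when the IF condition is true.
  do i = 1, n
     if (x(i) > 0.0) then
        y(i) = x(i) + sqrt(c) * d
     end if
  end do

  ! Hand-hoisted form: the invariant expression is computed once,
  ! unconditionally, before the loop. This is safe here only because
  ! c is known to be non-negative.
  t = sqrt(c) * d
  do i = 1, n
     if (x(i) > 0.0) then
        y_hoisted(i) = x(i) + t
     end if
  end do

  print *, 'same result: ', all(y == y_hoisted)
end program hoist_demo
```

Hoisting a guarded expression evaluates it even when the guard is never true, which is a plausible reason for placing this case behind a higher switch value.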
The -interleave switch controls loop unrolling and rescheduling. Interleaved unrolling can help the compiler recognize quadword loads and stores, which are more efficient than ordinary loads and stores. It does this by first unrolling the loop as in ordinary loop unrolling. Second, the statements in the loop are interchanged where possible to make references to the same array adjacent to each other.
The following example demonstrates interleaved unrolling:
    REAL A(100),B(100)
    DO I = 1, 100
       A(i) = 99.
       B(i) = 100.
    ENDDO
    PRINT *,a,b
    END
The output from KAP with interleaved unrolling turned on is as follows:
    REAL A(100), B(100)
    DO I=1,97,4
       A(I) = 99.
       A(I+1) = 99.
       A(I+2) = 99.
       A(I+3) = 99.
       B(I) = 100.
       B(I+1) = 100.
       B(I+2) = 100.
       B(I+3) = 100.
    ENDDO
    PRINT *, A, B
    END
The -library_calls=<name> switch directs KAP to replace sections of code with calls to standard numerical library routines that have the same functionality. This can simplify the source code, and if a version of the library that has been highly tuned for the target machine is available, the use of the standard package will improve performance of the application program. For example, if you specify this switch and you link the application with the Compaq Extended Math Library (CXML), calls to the CXML Basic Linear Algebra Subroutines (BLAS) will replace sections of code.
Use the following command:
    kf90 -fkapargs='-lc=blas23' -ldxml myprog.f90
The argument for -library_calls identifies which library to create CALLs for. The CXML BLAS libraries are BLAS1, which performs vector-vector operations such as dot product; BLAS2, which performs matrix-vector operations such as matrix vector multiplication; and BLAS3, which performs matrix-matrix multiplication.
To use both BLAS1 and BLAS2, specify BLAS12.
To use both BLAS2 and BLAS3, specify BLAS23; this is the recommended switch.
Specifying BLAS is equivalent to specifying BLAS23.
This switch can be disabled within a section of code with the !*$* optimize=0 directive. See Chapter 5, KAP Directives.
This switch is disabled if -roundoff=0 .
This switch will introduce calls to BLAS routines to be linked from system libraries. Use of this switch can cause a collision between KAP-generated BLAS routine names and user-provided routines in the source code. Even if the user-provided routines are identical in function to the library routines, rename or remove the user routines, because the linker will not use the optimized library routines if the user's calls to routines can be satisfied with the user-provided routines.
To reduce compilation time, KAP estimates how long it spends analyzing each loop nest construct. If a loop is too deeply nested, KAP ignores the outer loop and recursively visits the inner loops. The loop nest limit is a rough dial to control what KAP considers too deeply nested.
Loops that exceeded this threshold will be marked in the Loop Table ( -listoptions=l ) in the listing file. (See Chapter 9, KAP Listing File.)
Larger loop nest limits may allow more optimizations to be performed for deeply nested loop structures, but might take more compilation time. The limit does not correspond to the DO loop nest level; rather, it is an estimate of the number of loop orderings that can be generated from a loop nest. The -limit switch resets this internal limit. The loop nest limit can also be modified with the !*$* limit <integer> directive. Most users do NOT need to change this value.
4.7.18 -machine, -ma, -nomachine, -noma, (-ma=s)
The -ma switch provides a way to specify the characteristics of the system your program will run on.
You can use any switch combination with the exception of -ma=ns . Options are:
To disable all of the switch settings, enter -nomachine on the command line.
4.7.19 -max_invariant_if_growth, -miifg, (-miifg=500)
When a loop contains an IF statement whose condition does not change from one iteration to another (that is, loop invariant), the same test must be repeated for every iteration. The code can often be made more efficient by floating the IF outside the loop and putting the THEN and ELSE sections into their own loops.
This gets more complicated when there is other code in the loop, because a copy of it must be included in both the THEN and ELSE loops. The -max_invariant_if_growth switch allows you to limit the total number of additional lines of code generated in each program unit through "invariant-IF restructuring."
The -miifg setting is the maximum number of lines to which a program unit may grow due to invariant-IF floating. If restructuring a loop with invariant IFs would cause the size of the program unit to exceed this limit, the restructuring will not be performed. For example, if -miifg=500 and the original size of a subroutine was 450 noncomment lines, then at most 50 additional lines can be added by invariant-IF floating. Because other KAP transformations can add or delete lines, the number of lines actually added by invariant-IF floating and the final size of a program unit may differ from what the -miifg value alone would cause.
This can be controlled on a loop-by-loop basis with the !*$* max_invariant_if_growth directive (see Chapter 5, KAP Directives). The maximum amount of additional code generated in a single loop through invariant-IF floating can be limited with the -each_invariant_if_growth switch.
The allowed values for the -max_invariant_if_growth switch are 0 to 50000.
4.7.20 -routine, -rt, (off)
The -routine switch allows you to specify switches that apply only to specific routines within the source file KAP processes. The only switches that -routine can specify are as follows:
-each_invariant_if_growth
-max_invariant_if_growth
-optimize
-roundoff
-scalaropt
-skip
-unroll
-unroll2
-unroll3
The syntax of a KAP command using the -routine switch is as follows:
    kapf90 [-<switches>] source_file.f \
        -routine=<routine_name>[,<routine_name>...]<switches_for_routine_names> ...
Place the -routine switch after the name for the Fortran 90 source file. Specify switches that apply to all routines in the source file after kapf90 . The <routine_name> argument must be a routine in source_file.f.
For example, the kf90 command to exclude optimizations for routine sub1 of myprog.f90 is:
    kf90 myprog.f90 -fkapargs='-routine=sub1 -skip'
As another example, consider the following kapf90 command line:
    kapf90 -scalaropt program.f90 -routine=sub_1 \
        -roundoff -optimize -freeformat
This command invokes KAP and applies the -scalaropt switch to all program units in file program.f90, including sub_1. Program unit sub_1 is also processed with the -roundoff and -optimize switches. The -freeformat switch applies to both file program.f90 and program unit sub_1.
Using the -routine switch implies that directives equivalent to the specified switches are asserted only while processing particular routines. The effect is the same as if the implied directives were inserted at the top of the associated routines.
Using the -routine switch makes the resulting kapf90 command contain two halves. The first half looks like any other kapf90 command because it contains kapf90 , switches different from -routine , and a source file name. The second half of the command is different because it contains one or more -routine switches, each with associated routines and switches for the routines selected from the preceding bulleted list.
For example, consider the following command line:
    kapf90 -cachesize=8,0 -syntax=a -freeformat my_program.f90 \
        -routine=sub_1,sub_2,sub_3 -roundoff -optimize -routine=sub_4 -unroll
An explanation of the two halves follows:
The usual rules for shortening the names of switches also apply to the -routine switch. For example, the following KAP command fragments produce identical results:
-routine=subroutine_a -optimize -unroll
-routine=subroutine_a -opt -unr
4.7.21 -setassociativity, -sasc, (-sasc=1,1)
The -setassociativity switch provides information on the mapping of physical addresses in main memory to cache pages in the Level 1 and Level 2 cache.
The first integer describes the set associativity of the Level 1 cache. The second integer describes the set associativity of the Level 2 cache.
A setting of n means that a page can appear in any of n places in the cache. For instance, a setting of 1 means that a page in main memory can be placed in only one place on the cache. If the cache page is already in use, its contents will have to be rewritten or flushed in order to copy the newly accessed page into the cache.
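The effect of set associativity can be sketched with the usual textbook mapping from an address to a cache set. The model below is an assumption for illustration (it works in cache lines rather than the "pages" of the text, and it does not describe any particular processor). The cache size and line width mirror the -cachesize=32 and -cacheline=64 defaults; the addresses are hypothetical.

```fortran
program set_mapping
  implicit none
  integer, parameter :: cache_bytes = 32 * 1024   ! 32 KB, as in -cachesize=32
  integer, parameter :: line_bytes  = 64          ! as in -cacheline=64
  integer :: assoc, n_sets, addr1, addr2

  addr1 = 4096                    ! hypothetical address
  addr2 = addr1 + cache_bytes     ! exactly one cache size further on

  do assoc = 1, 2
     ! Textbook model: the cache is divided into sets of "assoc" lines.
     n_sets = cache_bytes / (line_bytes * assoc)
     print *, 'associativity:', assoc, ' sets:', n_sets
     print *, '  set of addr1:', mod(addr1 / line_bytes, n_sets)
     print *, '  set of addr2:', mod(addr2 / line_bytes, n_sets)
  end do
end program set_mapping
```

In this model both addresses fall in the same set at either setting. With an associativity of 1 the set holds one line, so the two addresses evict each other; with 2 the set has room for both.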
4.7.22 -srlcd, -nsrlcd, (-nosrlcd)
The -srlcd switch tells KAP to remove loop-carried dependencies. SRLCD is an acronym for Scalar Replacement of Loop Carried Dependencies. KAP holds in temporary storage array values read or written across multiple loop iterations. Faster temporary/register accesses replace slower memory accesses in the loop body.
Before KAP can remove loop-carried dependencies, you must specify the switch -scalaropt=n, where n is greater than or equal to 2.
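As an illustration of the idea, the sketch below replaces the array value carried from one iteration to the next with a scalar temporary. It is written by hand under the assumption that this is the kind of rewrite the switch performs; the program and the name prev are hypothetical, and actual KAP output may differ.

```fortran
program srlcd_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n), a_scalar(n), b(n), prev
  integer :: i

  b = (/ (real(i), i = 1, n) /)
  a(1) = 1.0
  a_scalar(1) = 1.0

  ! Original form: each iteration reloads a(i-1), which the previous
  ! iteration has just stored.
  do i = 2, n
     a(i) = b(i) / a(i-1)
  end do

  ! Scalar-replaced form: the value carried between iterations is kept
  ! in the scalar prev, which a compiler can hold in a register.
  prev = a_scalar(1)
  do i = 2, n
     prev = b(i) / prev
     a_scalar(i) = prev
  end do

  print *, 'largest difference: ', maxval(abs(a - a_scalar))
end program srlcd_demo
```

The dependence between iterations remains; what changes is that the carried value no longer has to travel through memory.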
4.7.23 -tablesize, -ts, (-ts=24000000)
The value specified in the -tablesize switch is compared to the mathematical product of the number of statements and the number of variables referenced in a given program unit. When this product is greater than the tablesize value, a "program-too-large" message is issued stating the required tablesize.
Note that you should review your process resource limits with the C shell limit command before adjusting the -tablesize switch. Use the unlimit command or, for example, a command such as limit stacksize 32768 to increase all, or specific, resource limits.
4.7.24 -unroll, -ur, (-ur=4), -unroll2, -ur2, (-ur2=160), -unroll3, -ur3, (-ur3=1)
The -unroll, -unroll2, and -unroll3 switches control how KAP unrolls inner loops.
Loop execution is often more efficient when the loops are unrolled. KAP unrolls the loop until either the loop has been unrolled the number of times given in the -unroll switch, or the amount of "work" in each iteration reaches the value given by the -unroll2 switch.
The switch -ur=0 means to use default values to unroll.
The switch -ur=1 means no unrolling.
The -unroll2=n switch sets the upper limit for unrolling. If the estimate of work is greater than n, then the loop will not be unrolled.
The default, n=160, means a maximum work of 160 in an unrolled iteration. For example, a work of 150 results in an unrolled iteration, while a work of 170 results in no unrolling.
Work is estimated by counting operands and operators in a loop. The amount of work in each loop iteration is shown in the loop table in the annotated listing.
The -unroll3=n switch sets the lower limit for unrolling. If the estimate of work is less than n, then the loop will not be unrolled.
The default, n=1, means a minimum work of 1 in an unrolled iteration. If you choose a higher value, such as 20, a work of 30 results in an unrolled iteration, while a work of 10 results in no unrolling.
The -scalaropt=2 switch is required to enable loop unrolling.
If you use kapf90 with the Compaq Fortran compiler optimization switch set to -O5 , you should turn off loop unrolling by setting -unroll=1 .
Outer loop unrolling is a part of memory management and is not controlled by these switches.
There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. No warning is given if you request more than 100 unrolled iterations.
Loop overhead is reduced by performing more iterations from the original loop for each pass through the new loop, but the gain is less with each additional unrolled iteration. Eventually, the cost in extra memory exceeds the gain from unrolling. The -unroll switch sets a maximum number of iterations to unroll.
When the total number of iterations to be executed by the loop (the iteration count) is constant, KAP searches for a number of iterations to unroll that is near the -unroll value and which exactly divides the iteration count. This avoids having extra iterations left over, which must be handled separately and generate extra code. The range over which KAP searches for an exact divisor is the -unroll value plus or minus 25%.
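The divisor search can be sketched as follows. The text gives only the range (the -unroll value plus or minus 25%); the search order used here (nearest candidate first, smaller before larger) and the fallback to the -unroll value are assumptions, and the function name pick_factor is hypothetical.

```fortran
program divisor_search
  implicit none
  print *, 'count 100:', pick_factor(8, 100)
  print *, 'count  99:', pick_factor(8, 99)
  print *, 'count  97:', pick_factor(8, 97)

contains

  ! Returns a factor within unroll +/- 25% that divides trip_count
  ! exactly, or the unroll value itself if no such factor exists.
  integer function pick_factor(unroll, trip_count)
    integer, intent(in) :: unroll, trip_count
    integer :: lo, hi, d, cand

    lo = max(1, unroll - unroll / 4)
    hi = unroll + unroll / 4
    pick_factor = unroll              ! fallback: leftover iterations remain

    do d = 0, hi - lo
       cand = unroll - d              ! try below the requested value
       if (cand >= lo) then
          if (mod(trip_count, cand) == 0) then
             pick_factor = cand
             return
          end if
       end if
       cand = unroll + d              ! then above it
       if (cand <= hi) then
          if (mod(trip_count, cand) == 0) then
             pick_factor = cand
             return
          end if
       end if
    end do
  end function pick_factor

end program divisor_search
```

With -unroll=8 the range is 6 through 10. Under these assumptions an iteration count of 100 selects 10 and a count of 99 selects 9, while the prime count 97 has no divisor in range, so the sketch falls back to 8 and leaves iterations for separate handling.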
To use the "work per unrolled iteration" limit, KAP analyzes a given loop by computing an estimate of the computational work that is inside the loop for ONE iteration. This rough estimate is based on the following criteria:
# of assignments +
# of IF statements +
# of subscripts +
# of arithmetic operations
For the following example, the user has specified 8 for the maximum number of iterations to unroll ( -unroll=8 ) and 100 for the maximum "work per unrolled iteration" ( -unroll2=100 ):
          DO 10 I = 2,N
             A(I) = B(I)/A(I-1)
       10 CONTINUE
This example has:
1 assignment
0 ifs
3 subscripts
2 arithmetic operators
-------------------------
6 is the weighted sum (The work for 1 iteration)
This weighted sum is then divided into 100 to give a potential unrolling factor of 16. However, because the user has also specified 8 for the maximum number of unrolled iterations, KAP takes the minimum of the 8 and 16. Therefore, KAP will unroll only 8 iterations. The maximum number of iterations that KAP will unroll is 100. If the user requests more than that, NO warning will be given.
In this case (an unknown number of iterations), KAP will generate two loops --- the primary unrolled loop and a cleanup loop to ensure that the number of iterations in the main loop is a multiple of the unrolling factor. The result is the following:
          DO 11 I=2,N-7,8
             A(I) = B(I) / A(I-1)
             A(I+1) = B(I+1) / A(I)
             A(I+2) = B(I+2) / A(I+1)
             A(I+3) = B(I+3) / A(I+2)
             A(I+4) = B(I+4) / A(I+3)
             A(I+5) = B(I+5) / A(I+4)
             A(I+6) = B(I+6) / A(I+5)
             A(I+7) = B(I+7) / A(I+6)
       11 CONTINUE
          DO 2 I=I,N,1
             A(I) = B(I) / A(I-1)
        2 CONTINUE
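The arithmetic behind the factor of 8 in this example can be reproduced with a few lines. This is a sketch of the calculation as the text describes it, not KAP's internal code; the variable names are hypothetical.

```fortran
program unroll_factor
  implicit none
  integer, parameter :: max_unroll = 100   ! upper bound stated in the text
  integer :: unroll, unroll2, work, by_work, factor

  unroll  = 8                  ! -unroll=8
  unroll2 = 100                ! -unroll2=100

  ! Work for one iteration of A(I) = B(I)/A(I-1):
  ! 1 assignment + 0 IFs + 3 subscripts + 2 arithmetic operators
  work = 1 + 0 + 3 + 2

  by_work = unroll2 / work     ! integer division: 100 / 6 gives 16
  factor  = min(unroll, by_work, max_unroll)

  print *, 'work per iteration:          ', work
  print *, 'factor allowed by work limit:', by_work
  print *, 'unrolling factor used:       ', factor
end program unroll_factor
```

The work limit alone would allow 16 iterations, so the -unroll value of 8 is the binding limit here.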
Additional examples are in Chapter 8, Transformations.
4.7.25 -useh
KAP automatically sets the -useh switch for you. Compaq recommends that you not set this switch.
This switch is used for forward declarations. KAP needs two passes to resolve Fortran 90 forward declarations. The first pass, the "generateh" pass, builds the information needed to analyze the program for forward references. The second pass, the "useh" pass, resolves any forward references.
4.8 Directive Recognition Switches for kapf90
The following sections explain the function of each directive recognition switch.
4.8.1 -directives, -dr, -nodirectives, -ndr, (-directives=akpv)
The -directives switch tells KAP which directives to accept. KAP directives and assertions use the following syntax:
!*$* key word(s)
!*$* key word(s) (argument)
!*$* assert key word(s) (argument)
The -directives options are:
a --- KAP assertions are accepted.
k --- KAP !*$* or *$* directives are accepted.
p --- Parallel directives are accepted. The actual selection of a set of directives is made when the kf90 or kapf90 command is given. For example, the switch -dr=p tells KAP to accept parallel directives, and the accompanying switch -psyntax=openmp tells KAP to process OpenMP parallel directives.
v --- VAST CVD$ or !VD$ directives.
To disable all of the choices, enter -nodirectives on the command line.
KAP assertions are similar in form to directives, but they assert program characteristics that KAP may use in its optimizations. (See Chapter 6, KAP Assertions.) The acceptance of assertions can also be controlled with the !*$* assertions and !*$* noassertions directives.
The KAP directives are described in Chapter 5, KAP Directives.
The parallel directives are described in Chapter 3, KAP Parallel Processing.
4.8.2 -ignoreoptions, -ig, -nig, (-noignoreoptions)
The -ignoreoptions switch directs KAP to ignore any !*$*options or *$*options lines at the beginning of a file. Normally, KAP reads the !*$*options or *$*options instruction for further command-line switches.
4.9 Input-Output Switches for kapf90
The following sections explain the function of each switch that affects KAP input-output file selection. See Chapter 2 for more details about file name defaults and special cases.
4.9.1 -cmp, (<file>.cmp.f90), (<file>.cmp.f), -nocmp, -ncmp
The -cmp switch causes KAP to save the optimized source program under the file name of your choice.
By default, kf90 names the optimized source the same name as the input file with an extension determined by the input file extension, as follows:
File Extension of Input | File Extension of Transformed Source |
---|---|
.f90 | .cmp.f90 |
.f, .for, .FOR | .cmp.f |
The kapf90 default is to name the optimized source the same name as the input file with the extension .cmp , regardless of the input file extension. Because the Compaq Fortran compiler will not process a file with the default .cmp extension, you should override the default. For example, use the -cmp switch in the kapf90 command line to rename the optimized source file_name.cmp.f90 . See the examples in Section 2.8, Compiling a Program Using kapf90.
Both kf90 and kapf90 place the optimized source file in the current directory. To disable generation of the optimized Fortran 90 output file, enter -nocmp on the command line.
4.9.2 -include, -inc, (off)
The -include=<pathname> switch allows you to specify an alternate directory for locating the files specified in INCLUDE statements. An include file whose name does not begin with a slash (/) is sought first in the directory containing the file that contains the INCLUDE statement or directive, then in the directory named in the -include switch.
4.9.3 -list, -l, -nl, (-list=<file>.out)
The -list=<filename> switch provides a way to name the generated annotated listing file.
Specifying -list with no file name will cause the listing file to be written to <file>.out , where <file> is the input file name with any trailing extension such as .f stripped off. For example, if the input file is myprog.f , the output file would be myprog.out .
To disable generation of the listing file, enter -nl on the command line.
4.10 Listing Switches for kapf90
The following sections explain the function of each listing switch.
4.10.1 -cmpoptions, -cp, -ncp, (-nocmpoptions)
The -cmpoptions switch specifies optional additional information or formatting for a transformed code ( .cmp ) file.
Specifying -cmpoptions=i tells KAP to insert special numbers that reference the original code.
Specifying -cmpoptions=n tells KAP to create the transformed code from its internal data structures. Specifying -nocmpoptions tells KAP to use lines from the source file, where feasible. Using the internal data structures for the code provides consistent indentation and formatting, but also gives all new labels and other changes from the source code. This may make relating source and transformed code more difficult.
4.10.2 -lines, -ln, (-ln=55)
The listing generated by KAP is paginated for printing on a line printer. You can change the number of lines per page on the listing by using the -lines=<number> switch.
The -lines=0 setting tells KAP to paginate at subroutine boundaries.
4.10.3 -listingwidth, -lw, (-lw=132)
The -listingwidth=<integer> switch sets the maximum line length for the listing file produced by KAP. This setting affects the format of the loop summary table ( -listoptions=l ) and KAP options table ( -listoptions=k ).
The fixed setting, 132, is optimal for most line printers. No other values are allowed.
4.10.4 -listoptions, -lo, (-lo=klo)
The -listoptions=<value> switch tells KAP what information to include in the listing and error files.
You can select any of the following information:
Value | Prints |
---|---|
c | Calling tree at the end of the program listing |
k | KAP switches used at the end of each program unit |
l | Loop-by-loop optimization table |
n | Program unit names, as processed, written to the standard error file |
o | Annotated listing of the original program |
p | Performance statistics |
s | Summary of the optimizations performed |
t | Annotated listing of the transformed program |
The transformed program displayed by t is recorded in the transformed code file regardless of whether you request -listoptions=t .
To disable all of the previous switches and produce no listing file, include -nolist on the command line.
See Chapter 9, KAP Listing File for examples of the different types of KAP listing output.
4.10.5 -suppress, -su, (off)
KAP produces several types of messages that range from syntax warning and error messages to messages about the optimizations performed.
The -suppress switch allows you to selectively disable the printing of types of KAP messages.
Possible values for the switch are:
The default tells KAP to list all message types.
See Appendix F, Listing File Messages for examples of KAP listing messages.
4.11 !*$*options
When you determine that a file must always be run with the same command-line switches, you can use the first line of the file to specify these switches. The format of this line is as follows:
    !*$*options -switch[=value] [-switch[=value]]...
The !*$*options (or C*$*options or *$*options ) must appear in columns 1 through 11 (or 1 through 10) with a character space between this command and the options that follow, such as:
    !*$*options -lines=60 -optimize=4 -scalaropt=2
Only the first line can be used for !*$*options . The syntax forms C*$*options and *$*options can only be used in Fortran source files that use fixed source form. Short or long switch names can be used on this line. The leading C, the keyword options, and the command-line switches can be uppercase or lowercase.
Switches of the form -switch=<name> (for example, -fortran or -inline ) cannot be specified on the !*$*options line of the source file. These switches can be specified on the command line only.
If conflicting switches are specified on the command line and on the !*$*options line, the !*$*options line takes precedence. If additional options are specified on the !*$*options line, these are used in addition to those specified on the command line. If the command-line switch -ignoreoptions is set, the !*$*options line in the program is treated as a comment.
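For illustration, a free-form source file carrying an options line might look like the sketch below. The switches are the ones from the example in this section; the program itself is hypothetical. Because the line begins with an exclamation point, a Fortran compiler treats it as an ordinary comment.

```fortran
!*$*options -lines=60 -optimize=4 -scalaropt=2
program options_demo
  implicit none
  integer, parameter :: n = 100
  real :: a(n)
  integer :: i

  ! An ordinary loop for KAP to process with the switches given on
  ! the first line of this file.
  do i = 1, n
     a(i) = 0.5 * real(i)
  end do

  print *, 'sum =', sum(a)
end program options_demo
```

The options line must be the first line of the file, so it precedes the PROGRAM statement and any other comments.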