Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide




Chapter 2
How to Run KAP C

This chapter gives general information about KAP syntax, file-naming conventions, and optimizing programs with KAP C. Sections describing how to compile and run programs with KAP C are specific to single-processor Tru64 UNIX systems. For information on how to compile and run programs with KAP C on multiprocessor systems, see Chapter 3.

Compaq KAP C/OpenMP for Tru64 UNIX can be run in either of two modes:

2.1 Using KAP C

The following command-line switches provide maximum general optimization: -optimize=5, -roundoff=3, and -scalaropt=3. These are the default settings for these switches.

See Section 2.3 for an explanation of how to pass command-line switches to KAP C and Chapter 4 for details about what is controlled by each switch's levels.

2.2 Installing KAP C

KAP C is installed on the Tru64 UNIX system with the system command setld. See the Compaq KAP C/OpenMP for Tru64 UNIX Installation Guide for details.

The directory that contains the executable KAP C file must be in your path, or a command such as kapc must be aliased to point to the executable file.

If you use KAP C frequently, you may find it convenient to add an alias to your .cshrc file so that a kapc command with your preferred switches is available. Contact your system manager for the directory location of the executable KAP C file.

2.3 Compiling a Program Using the kcc Driver

The kcc command invokes a driver program that automatically calls KAP C, the C compiler, and the linker. The examples below show how to use kcc.

2.3.1 Using kcc with the Default KAP and C Compiler Switch Settings

To use kcc to compile myprog.c with the default KAP and C compiler switch settings, enter the command:


> kcc myprog.c 

The kcc command uses the KAP preprocessor on myprog.c, compiles the result with the Compaq C compiler, links the object code into an executable image, and produces the following files:

The kcc command sets the following C compiler switches by default:

-D__KAP --- defines the name __KAP .
-U_INLINE_INTRINSICS --- stops the compiler from inlining intrinsic functions. KAP C currently does not support the inlining of intrinsic functions by the compiler.
-tune host --- determines whether the host architecture is ev4, ev5, or ev6 and adjusts the compiler accordingly.
-non_shared --- instructs the linker to use the archive libraries.
-fast --- provides a single method for turning on a collection of compiler optimizations.

To see a list of the KAP C switches and Compaq C compiler switches passed by kcc, use the verbose switch (-v) as follows:


> kcc myprog.c -v 

An example of the output follows:


oursmp>kcc -v sin.c timing.o -lm 
/usr/bin/cc -C -E -D__KAP -U_INLINE_INTRINSICS sin.c  > ./ktmpa24807.c 
/usr/bin/kapc -cmp=./sin.cmp.c -nolist ./ktmpa24807.c   -tune=EV4 
KAP/Tru64_U_C   4.2 k010737s 010515           21-Aug-2001   16:48:52 
0 errors in file ./ktmpa24807.c 
/usr/bin/cc -migrate -fast -v ./sin.cmp.c timing.o -lm -tune host 
-non_shared -lkio 
 
 /usr/lib/cmplrs/cc.dtk/gemc_cc -D__LANGUAGE_C__ -D__unix__ -D__osf__ 
 -D__alpha -D_SYSTYPE_BSD -D_LONGLONG -D__digital__ -D__arch64__ 
 -g0 
 -O3 -std -intrinsics -ansi_alias -ansi_args -assume nomath_errno 
 -assume trusted_short_alignment -D_INTRINSICS -D_INLINE_INTRINSICS 
 -D_FASTMATH -float -fp_reorder -ifo -readonly_strings -v -tune ev4 
 -I/usr/include.dtk -o /tmp/deccobjAAAaayhca ./sin.cmp.c 
 
 These macros are in effect at the start of the compilation. 
 ----- ------ --- -- ------ -- --- ----- -- --- ------------ 
 
  -D__DECC -D__osf__ -D__arch64__ -D__PRAGMA_ENVIRONMENT -D_LONGLONG 
  -D__digital__ -D__X_FLOAT=0  -D__DATE__="Aug  21 2001" 
  -D__DECC_MODE_RELAXED -D__DECC_VER=60160104  -D_SYSTYPE_BSD 
  -D__ALPHA -D__IEEE_FLOAT -D_FASTMATH -D__unix__ 
  -D_INLINE_INTRINSICS -D__TIME__="16:48:53"  -D__Alpha_AXP 
  -D__INITIAL_POINTER_SIZE=0 -D_INTRINSICS -D__STDC__=0 
  -D__LANGUAGE_C__ -D__alpha 
  /usr/lib/cmplrs/cc.dtk/gemc_cc: 
  0.18u 0.07s 0:00 92% 0+9k 0+11io 0pf+0w 9stk+1328mem 
 
  /usr/lib/cmplrs/cc.dtk/ld -g0 -O3 /usr/lib/cmplrs/cc.dtk/crt0.o 
  /tmp/deccobjAAAaayhca timing.o -lm -non_shared -lkio -qlots -lc 
  /usr/lib/cmplrs/cc.dtk/ld: 
  0.15u 0.13s 0:00 23% 0+13k 45+58io 14pf+0w 13stk+2024mem 
  rm ./ktmpa24807.c 
  mv sin.cmp.o sin.o 

2.3.2 Passing KAP C Switches to kcc

The -ckapargs switch specifies one or more KAP C command-line switches to the KAP C preprocessor. For example, to use kcc to optimize and compile the file myprog.c using KAP C switches for general optimization, enter the command:


kcc -ckapargs='-optimize=5 -roundoff=3 -scalaropt=3 \
-list=myprog_annotated.lis' myprog.c 

The following files result:

For descriptions of all KAP C command-line switches, see Chapter 4.

2.3.3 Passing Compaq C Compiler Switches to kcc

Any command-line switch that is valid for the C compiler or the linker is valid for the kcc command. You can specify compiler switches, linker switches, and KAP C switches on the same line. For example, to optimize and compile the file myprog.c using KAP C switches for general optimization and to specify the name of the executable file with the Compaq C compiler switch -o, enter the following command:


> kcc -ckapargs='-optimize=5 -roundoff=3 -scalaropt=3' \
  -o myprog.exe myprog.c 

The following files result:

The kcc command specifies the compiler switches -fast and -tune host, and the linker switch -non_shared, by default. The -non_shared switch causes the image to be linked with archive libraries instead of with shared libraries. To override the -non_shared default, specify -call_shared on the command line, for example:


> kcc -call_shared myprog.c 

2.4 Saving Optimized Source Programs

The kcc command saves the optimized version of your source program in the source directory, with the extension .cmp.c in place of .c (for example, myprog.cmp.c for myprog.c), for use in debugging and profiling. You can override this default name with the -cmp switch. See Section 4.6.1 for more information.
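The default naming rule can be sketched in C as follows. This is a minimal illustration only; kap_cmp_name is a hypothetical helper and is not part of KAP C or kcc. It assumes the input name ends in .c, as in the examples in this chapter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a KAP C interface): builds the default name of
 * the optimized source file by replacing a trailing ".c" with ".cmp.c".
 * Returns 0 on success, or -1 if src does not end in ".c" or if the
 * output buffer is too small. */
int kap_cmp_name(const char *src, char *out, size_t out_size)
{
    size_t len = strlen(src);
    size_t stem;

    /* Require at least one character before the ".c" suffix. */
    if (len < 3 || strcmp(src + len - 2, ".c") != 0)
        return -1;

    stem = len - 2;                    /* length without ".c" */
    if (stem + 6 + 1 > out_size)       /* ".cmp.c" is 6 chars, plus NUL */
        return -1;

    memcpy(out, src, stem);
    memcpy(out + stem, ".cmp.c", 7);   /* copies the terminating NUL too */
    return 0;
}
```

For example, passing "myprog.c" fills the buffer with "myprog.cmp.c", which matches the name given to the -cmp switch in the kapc examples in this chapter.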

2.5 KAP C Command-Line Switches Determined by Compiler Switches

Some C compiler switches automatically set KAP C command switches or alter the default switch settings.

Explicitly calling the compiler switch -std1 causes KAP C to be called with the command-line switch -syntax=a.

Explicitly calling the compiler switch -standard=common causes KAP C to be called with the command-line switch -syntax=k.

Calling the C compiler without any compiler switches or with the compiler switch -std0 causes KAP C to be called with the command-line switch -syntax=d.
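The three rules above amount to a small lookup table. The following C sketch restates them; kap_syntax_for is a hypothetical name used only for illustration, and how kcc implements the mapping internally is not described in this guide.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative lookup only (hypothetical function, not a kcc interface).
 * cc_switch is the language-standard switch given to the compiler, or
 * NULL when no such switch was given. Returns the KAP C -syntax switch
 * named in this section, or NULL for a switch this section does not
 * cover. */
const char *kap_syntax_for(const char *cc_switch)
{
    if (cc_switch == NULL || strcmp(cc_switch, "-std0") == 0)
        return "-syntax=d";
    if (strcmp(cc_switch, "-std1") == 0)
        return "-syntax=a";
    if (strcmp(cc_switch, "-standard=common") == 0)
        return "-syntax=k";
    return NULL;
}
```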

2.6 Invoking the C Preprocessor

The kcc command invokes the C preprocessor before transforming a file with KAP C. The name __KAP is defined for the C preprocessor in order to include alternate code in some system include files. Commands with the preprocess-only switch will terminate after the C preprocessor phase.
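As a sketch of how the __KAP name can select alternate code, the fragment below provides a plain function when __KAP is defined and a macro otherwise. The names clamp_nonneg and sum_nonneg are hypothetical, and the choice of what to switch on is an assumption made for illustration; what the system include files actually change under __KAP is not described in this chapter.

```c
/* Hypothetical source fragment. When the preprocessor runs with -D__KAP
 * (as kcc does), the function form is compiled; otherwise the macro form
 * is used. Both forms compute the same value. */
#ifdef __KAP
static int clamp_nonneg(int x)
{
    return x < 0 ? 0 : x;
}
#else
#define clamp_nonneg(x) ((x) < 0 ? 0 : (x))
#endif

/* Sums the elements of v, treating negative entries as zero. */
int sum_nonneg(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += clamp_nonneg(v[i]);
    return total;
}
```

Because both branches behave identically, the program gives the same results whether or not it is processed by KAP C.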

2.7 Preprocessing a Program Using kapc

For C programs that do not contain C preprocessor commands such as #define, #include, #ifdef, or macro definitions, use the following command to execute KAP C as a standalone preprocessor:


kapc [kap_switch_string] myprog.c -cmp=myprog.cmp.c 

The [kap_switch_string] is a list of one or more KAP C command-line switches, for example:


kapc -inm -roundoff=2 myprog.c -cmp=myprog.cmp.c 

For C programs that contain C preprocessor commands, the input file must first be preprocessed with the C compiler. Use the following commands:


cc -P -D__KAP -U_INLINE_INTRINSICS myprog.c 
kapc -cmp=myprog.cmp.c myprog.i 

After preprocessing your program, give the optimized source file myprog.cmp.c to the compiler, as follows:


cc -migrate myprog.cmp.c -tune host -lkio -fast 

For an explanation of the -tune host, -lkio, -fast, and -U_INLINE_INTRINSICS switches, see Section 2.3.

Note

When you use kapc to process a file, you must set the Compaq C compiler and linker switches appropriately. For this reason, Compaq recommends that you use kcc whenever possible, because kcc automatically sets the compiler and linker switches correctly.

2.8 Using KAP C Syntax

Specify switches in lowercase with the syntax -switch[=value]. Do not leave spaces between the switch name and the value. Switches can appear before or after the input file as follows:


kapc -inm  myprog.c -roundoff=2 

KAP C recognizes standard abbreviations for switches. Switches that take a list of names must have the names separated by commas and without spaces, for example:


-inff=file1.c,file2.c 

Enclose KAP C command-line switches passed through kcc by using the -ckapargs switch with single quotation marks, as follows:


> kcc -ckapargs='-optimize=5 -roundoff=3 -scalaropt=3' -list myprog.c 

C compiler switches, for example, -list, do not require quotation marks.
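The -switch[=value] form and the comma-separated list form can be illustrated with a small C sketch. The functions split_switch and count_list_items are hypothetical and written only to make the syntax concrete; they are not KAP C's own parser, and they do not handle switch abbreviations.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits an argument of the form "-name[=value]".
 * On success, copies the name (without the leading '-') into name, sets
 * *value to the text after '=' (or NULL if there is no '='), and
 * returns 0. Returns -1 for malformed input or a too-small buffer. */
int split_switch(const char *arg, char *name, size_t name_size,
                 const char **value)
{
    const char *eq;
    size_t len;

    if (arg[0] != '-' || arg[1] == '\0')
        return -1;

    eq = strchr(arg, '=');
    len = eq ? (size_t)(eq - arg - 1) : strlen(arg + 1);
    if (len == 0 || len >= name_size)
        return -1;

    memcpy(name, arg + 1, len);
    name[len] = '\0';
    *value = eq ? eq + 1 : NULL;
    return 0;
}

/* Hypothetical helper: counts the items in a comma-separated value with
 * no spaces, such as "file1.c,file2.c". An empty or NULL value has no
 * items. */
size_t count_list_items(const char *value)
{
    size_t n;

    if (value == NULL || *value == '\0')
        return 0;
    n = 1;
    for (; *value != '\0'; value++)
        if (*value == ',')
            n++;
    return n;
}
```

Applied to -inff=file1.c,file2.c, the sketch yields the name inff and a list of two items; applied to -inm, it yields the name inm and no value.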

2.9 Using File Naming Conventions

Any input file name is acceptable. Output is written to stdout unless the -cmp command-line switch is specified.

When KAP C detects an error condition, it writes a message to standard error.

2.10 Guidelines for Optimizing With KAP

This section describes how you can get maximum performance in your application programs in the shortest time.

This information can be used with both multiprocessor and single-processor systems, and with both Fortran and C versions of all KAP products. Therefore, the information may contain references to command-line switches or settings that are unavailable or are different from those in the KAP that you are using.

This section provides separate protocols for small and large programs. Small programs are defined as those that can be compiled and run quickly. Because the cost of each iteration is small, you can take risks. The information presented here further assumes that small programs have a small number of program units.

Large programs are defined as those that take more time to compile and run than it takes for you to check the results. A program can be large either because the source code is very large or because the execution time is long.

2.10.1 Optimizing Small Programs with KAP

Follow these guidelines to optimize small programs:

  1. Compile the program without KAP, with minimum compiler optimization, and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is little optimization you can do.
    Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, you cannot do much optimization.
    If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action may not be feasible for handling problems in large programs, but it might work for isolated portability problems.
    If the problem code is isolated and runs without KAP optimization, you may be able to use KAP on the rest of the program and leave out any problematic sections.
    You can also refer to Section 2.13. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
  2. Compile without KAP but with maximum compiler optimization. Note the execution time and verify the results. If the program fails, reduce compiler optimization and try again.
  3. Take the fastest or best build that does not use KAP, recompile it with profiling enabled (for example, gprof), and run it again to identify the program units that take the most time to run.
    If some time-intensive units have many iterative loops and arrays, then those units are good candidates for KAP loop optimizations. Go to step 4.
    If the units are not good candidates, lower-payoff optimizations, such as inlining, may provide some performance improvement, especially where inlining inside loop nests may also allow KAP to perform vectorization optimizations. If this is the case, go to step 6.
  4. If the program compiles with minimum compiler optimization enabled, then turn on all optimization except inlining by invoking -optimize=4 and full compiler optimization.
  5. If step 4 succeeds and the results are correct, try the suggestions described in Section 2.12.
  6. If step 4 fails, try reducing one optimization at a time (-roundoff=0, -scalaropt=1, -optimize=3, and any compiler optimizations) until the program runs correctly. Use the -lo=k switch setting to create a listing of the KAP command-line switches and settings.
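Step 3 describes units with many iterative loops and arrays as good candidates for KAP loop optimizations. The following C function is a minimal sketch of such a unit; the name axpy and its signature are chosen for illustration and do not come from this guide.

```c
#include <stddef.h>

/* Illustrative candidate loop: each iteration reads x[i] and y[i] and
 * writes y[i], with no dependence between iterations (assuming x and y
 * do not overlap). Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void axpy(size_t n, double a, const double *x, double *y)
{
    size_t i;

    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A profile dominated by routines of this shape suggests continuing with step 4. A profile dominated by many small function calls suggests that the lower-payoff optimizations, such as inlining, are the better starting point.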

2.10.2 Optimizing Large Programs with KAP

Follow these guidelines to optimize large programs:

  1. Compile the program without KAP, with minimum compiler optimization, and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is not much optimization you can do.
    Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, there is little optimization you can do.
    If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action is feasible for large programs only if the problems are easily understood and isolated or if you have enough time to find more intractable problems.
    If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections.
    You can also refer to Section 2.13. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
  2. Compile without KAP but with maximum compiler optimization, note the execution time, and verify the results. If the program fails, reduce compiler optimization and try again.
  3. Take the fastest or best build that does not use KAP, recompile it with profiling enabled (for example, gprof), and run it again to identify the program units that take the most time to run.
    If some time-intensive units have many iterative loops and arrays, then those units are good candidates for KAP loop optimizations. Go to step 4. If not, then the lower-payoff optimizations, such as inlining, may provide some performance improvement, especially if there are places where inlining inside loop nests may also allow KAP to perform vectorization optimizations. Go to step 6.
  4. If time-intensive routines were identified as good candidates above, run KAP on them with modest KAP optimization (-optimize=2), compile the whole program with the other switches used in the best run from step 2, note the execution time, and verify the results.
    If the program fails, try again with the KAP switch -roundoff=0; if that works, the failure is probably due to a roundoff-sensitive operation. If it still fails with -roundoff=0, try -scalaropt=1.
  5. If step 4 works, repeat with full KAP optimization, with full compiler optimization, and with -roundoff=0 or -scalaropt=1, if needed. If the program fails, reduce the setting to a lower KAP optimization level or a lower compiler optimization level, and try again.
    If things are still going well after this step, try the suggestions in Section 2.12.
  6. If there are no routines with arrays and loops, run the whole program with -optimize=0 and -inline_and_copy=aaa,bbb,ccc,..., where aaa, bbb, and so on, are the most frequently called routines from the profiling run in step 3.
    If this action succeeds, repeat with -optimize=4 and -inline_and_copy=... If this action fails, try rerunning with -roundoff=0 or -scalaropt=1 or with fewer routines inlined. See Section 2.13 for an explanation of "binary chop."
    If things are still going well after this step, try the suggestions in Section 2.12.
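Step 4 mentions roundoff-sensitive operations. The C sketch below shows why reordering a floating-point sum can change its result. It assumes IEEE single-precision arithmetic with no extended intermediate precision. The two-accumulator form is one common way an optimizer may reassociate a reduction; whether KAP applies this particular transformation at a given -roundoff setting is not stated in this chapter, and both function names are hypothetical.

```c
#include <stddef.h>

/* Sums v[0..n-1] strictly left to right, as written in the source. */
float sum_in_order(const float *v, size_t n)
{
    float s = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Sums the same elements with two partial accumulators (even and odd
 * indices), then combines them. This is mathematically equivalent but
 * performs the additions in a different order. */
float sum_two_accumulators(const float *v, size_t n)
{
    float even = 0.0f;
    float odd = 0.0f;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        even += v[i];
    return even + odd;
}
```

For the input {1.0e8f, 1.0f, -1.0e8f}, the in-order sum absorbs the 1.0f into 1.0e8f and returns 0, while the two-accumulator sum cancels the large values first and returns 1. A program whose output check depends on such a sum can pass at -roundoff=0 and fail at higher settings even though neither answer is a compiler error.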

2.10.3 General Optimization Tips

2.11 Improving and Customizing KAP Performance

After you have used the KAP protocol for either small or large programs, you can find ways to fine-tune KAP to fit your application.

This section helps you discover which KAP command-line switches, directives, or assertions may improve KAP performance for a particular application program. It lists common goals and program situations that KAP users encounter and suggests possible improvements for each.

Remember that KAP is a tool to optimize Fortran and C code. Like any tool, it performs best when you are familiar with the details of how it works and are able to use its switches correctly and advantageously.

Although KAP default switch settings will achieve performance improvement, you can often achieve greater improvement if you understand and use alternate switch settings. Moreover, you can often insert directives or assertions to improve performance further.

See Table 2-1 for details about goals and user actions.

Table 2-1 User Actions for Specific Goals
Goal --- User Action
Have a more informative listing to help answer your questions --- Use -lo=kl or other listing switches under the -listoptions command-line switch.
Recognize more reductions --- Increase the -roundoff switch setting.
Spend less time optimizing deeply nested loops --- Reduce -limit and -arclimit or their directives.
Disable inner for-loop unrolling --- Use -unroll=1 or a -scalaropt setting below 2.
Disable outer for-loop unrolling --- Use a -roundoff setting below 3 or a -scalaropt setting below 3.
Expand (inline) function calls within for loops --- Use -inline, -inline_from_files, or -inline_from_libraries. Or, if the goal is to execute the function body concurrently, try -ipa or #pragma _KAP concurrent call.
Inline more routines --- Increase -inline_depth and -inline_looplevel. (See also the #pragma _KAP inline directive.)
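Two of the goals in Table 2-1 concern for-loop unrolling. As a reminder of what that transformation does, the C sketch below shows a loop and a hand-unrolled equivalent with an unrolling factor of 4. Both functions are hypothetical and written by hand for illustration; the code that KAP actually generates, and the factor it chooses, are not shown in this chapter.

```c
#include <stddef.h>

/* Original loop: one element per iteration. */
long sum_rolled(const int *v, size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* The same loop unrolled by hand with a factor of 4: the main loop
 * handles four elements per iteration, and a cleanup loop handles the
 * remaining zero to three elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i + 3 < n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        s += v[i];
    return s;
}
```

Unrolling reduces loop overhead at the cost of larger code. Because this example uses integer arithmetic, the two forms always return the same value.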

2.12 Using Additional Performance Improvement Techniques

After you have successfully run KAP on a working program by using either the protocol for small programs or the protocol for large programs, you can try the following procedures to find additional opportunities for optimization within your program:

2.13 Correcting KAP Problems

The following are some problems you may encounter when using KAP and possible fixes and workarounds:

