Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide




Chapter 7
Inlining and IPA

This chapter presents additional information about the KAP command switches and inline directives used to inline subroutines and functions, or to perform Interprocedural Analysis (IPA).

Inlining is the process of replacing a subroutine CALL or function reference with the text of the routine. This eliminates the overhead of the call, and can assist other optimizations by making relationships between arguments, returned values, and the surrounding code easier to find.
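As a hand-written illustration of the idea (not KAP output), the following sketch computes the same sum twice: once through a function reference, and once with the body of the function substituted at the point of reference. The names INLDEM and SQ are hypothetical and are used only for this example.

```fortran
      PROGRAM INLDEM
!     Hypothetical example: INLDEM and SQ are illustrative names.
      REAL SQ, S1, S2
      INTEGER I
      S1 = 0.0
      S2 = 0.0
      DO I = 1, 10
!       Original form: a function reference inside the loop.
        S1 = S1 + SQ(REAL(I))
!       Hand-inlined form: the body of SQ replaces the reference.
        S2 = S2 + REAL(I) * REAL(I)
      ENDDO
!     Both sums equal 385.0 (1**2 + 2**2 + ... + 10**2).
      PRINT *, S1, S2
      END

      REAL FUNCTION SQ (X)
      REAL X
      SQ = X * X
      RETURN
      END
```

In the hand-inlined form the loop body contains no call, so the call overhead is gone and the multiplication is visible to any optimization applied to the loop.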

IPA is the process of inspecting called subroutines and functions for information on relationships between arguments, returned values, and global data. IPA can provide many of the benefits of inlining, but without replacing the CALL or function reference.

The rest of this chapter covers the inlining and IPA command-line switches and directives, related command switches, examples of their use, and information about program constructs that inhibit inlining. Inlining and IPA are symmetrical from the command-line standpoint: there are parallel sets of commands and directives for them. In many places in this chapter, the term "inlining" applies to both inlining and IPA.

7.1 Inlining and IPA Command-Line Switches

There are two phases to inlining: defining the universe of inlinable routines, and selecting which routines in that universe to inline or analyze. The -inline_from... and -ipa_from... switches define the universe of inlinable routines. The -inline, -ipa, -..._looplevel, and -inline_depth switches select which of the available routines are to be inlined/analyzed. The -inline_create and -ipa_create switches set up collections of routines for inclusion in later KAP runs.

All of the inlining and IPA command switches are listed in the following sections. The short forms of their names are in brackets.

Note

Many of these switches have arguments that are lists of routine names or file names. The elements of these lists may be separated by either commas or colons. Multiple element lists must be enclosed in parentheses.

7.1.1 inline_from/ipa_from Switches

There are four switches, as follows:


-inline_from_files=<list> [-inff] 
-inline_from_libraries=<list> [-infl] 
-ipa_from_files=<list>  [-ipaff] 
-ipa_from_libraries=<list>  [-ipafl] 

Where <list> is one or more of the following: source file name, library file name, and directory, separated by commas. The default is the current source file.

You can distinguish different types of files by their extensions. For example, -inline_from_files=xj.f,yy.f,../mrtn would look for routines in the Fortran 90 source files xj.f and yy.f , and in Fortran 90 source files in the directory ../mrtn . Including the directory ../mrtn in the -..._from_files switches can be thought of as shorthand for the notation ../mrtn/*.f . Do not use wildcard characters in a -inline_from_... list.

The -..._libraries versions of these switches take as their arguments lists of subprogram libraries and directories containing such libraries.

KAP recognizes the type of file from its extension, or lack of one, as follows:

If multiple -inline_from... [-ipa_from...] switches are given, their lists are concatenated to get a bigger universe.

Routine name references are resolved by a search in the order that files appear in -inline_from... or -ipa_from... switches on the command line. Libraries are searched in their original lexical order. Multiple -inline_from... or -ipa_from... lists are searched in the order that they appear on the command line.

7.1.2 Library Creation

Use the following switches to create a preprocessed library:


-inline_create=<library name>   [-incr] 
-ipa_create=<library name>      [-ipacr] 

To specify an existing library file to inline from, use the -inline_from_libraries= or the -ipa_from_libraries= switch.

The default source for routines to put into the library is the current source file. If -inline_from... or -ipa_from... is specified, the routines in the listed files are the ones put into the library. This provides a method to combine or expand libraries: include the old library or libraries in an -inline_from_libraries or -ipa_from_libraries switch, along with an -inline_from_files or -ipa_from_files switch giving source files containing any new subroutines and functions.

Routines are included in libraries in the order in which they appear in the input file(s). This is to make sure that if multiple routines with the same name are in the same source file, the one chosen for inlining will be the one that you expect from the algorithm under the -inline_from... switch.

A library created with -inline_create will work for inlining or IPA, because it is just partially reduced source code. However, a library made with -ipa_create may not appear in an -inline_from_libraries= list; attempting to use one there is flagged with a warning message.

If no library name is given, the name used is file.klib , where file is the input file name with any trailing .f, .for, or .ftn stripped off.

When creating a library, only one -inline_create (-ipa_create) switch may be given. That is, only one library may be created per KAP run. If the library file existed prior to running KAP, it is overwritten.

When -inline_create (-ipa_create) is specified on the command line, no transformed code file will be generated.

See the description of the -inline_from_libraries and -ipa_from_libraries switches for information about using libraries created with these switches.

If no -inline (-ipa) switch is given, the default will be to include all the routines from the inlining universe in the library, if possible. If -inline=<name list> or -ipa=<name list> is specified, only the named routines will be included in the library. See Section 7.5 for a list of conditions that can prevent a routine from being inlined.

An example of creating a library and inlining from it is included in Section 7.4.2.

7.1.3 Naming Specific Routines

The following specifies names of particular routines to inline:


-inline[=name[,name...]]    [-inl=] 
-ipa[=name[,name...]]       [-ipa=] 

The default is all routines in the universe specified by any -inline_from... (-ipa_from...) switches, subject to the -inline_looplevel (-ipa_looplevel) setting.

Inlining and IPA are off by default, that is, if you do not specify inlining (IPA) switches and no inlining (IPA) directives are found in the source code, no inlining (IPA) will take place.

If you omit -inline (-ipa) from the command line, automatic selection of routines to inline is disabled. You can perform manual selection of routines to inline (analyze) with the -inline_manual (-ipa_manual) switches and the inline and IPA directives.

If you specify -inline (-ipa) on the command line without a list of routine names, then all routines in the inlining (IPA) universe are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values.

If you specify -inline (-ipa) on the command line with a list of routine names, then only the routines that are included in the list are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values. The list items can be separated by commas or colons.

The following switches have no argument-free form; they must be given with arguments as shown:


-noinline=name[,name..]   [-ninl=] 
-noipa=name[,name..]      [-nipa=] 

These switches enable the automatic inlining (IPA) algorithms in the same way that -inline (-ipa) does when given without arguments, but the listed routines are NOT inlined (analyzed). That is, all subroutines and functions except the named ones are eligible.

A list of routine names is required.

You cannot specify both -inline and -noinline (-ipa and -noipa) on the same command line.

If all call sites of a subroutine or function are to be inlined, the following variant of the -inline switch may be of interest:


-inline_and_copy[=name[,name..]]      [-inlc=] 

The -inline_and_copy command-line switch functions like the -inline switch, except that if all references in the source file to a function are inlined, the text of the function is copied to the transformed code file unchanged. This is intended for use when the functions being inlined are in the same file as the function reference, and has no special effect when the routines being inlined are being taken from a library or another source file.

7.1.4 DO Loop Level

The switches -inline_looplevel=<n> [-inll] and -ipa_looplevel=<n> [-ipall] set a minimum DO loop nest level for CALL/function reference expansion. They enable you to limit inlining and IPA to just the routines that are referenced in nested loops, where the reduced call overhead or enhanced optimization will be multiplied.

The argument is defined from the most deeply nested leaf of the call tree. The default, 10, allows inlining (IPA) for the 10 deepest nest levels, for example:


PROGRAM MAIN 
  .. 
    CALL A ---> SUBROUTINE A 
  .. 
    DO 
      DO 
        CALL B ---> SUBROUTINE B 
        DO 
          DO 
            CALL C ---> SUBROUTINE C 
          ENDDO 
        ENDDO 
      ENDDO 
    ENDDO 

The CALL B is inside a doubly nested loop, and would be more profitable to expand than the CALL A . The CALL C is quadruply nested, so inlining C would yield the biggest gain of the three.
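The effect of nesting on call counts can be checked with a small runnable version of the schematic. This is only a sketch: the program name NESTLV, the trip count of 2 for every loop, and the counting bodies of A, B, and C are assumptions made for illustration.

```fortran
      PROGRAM NESTLV
!     Hypothetical runnable version of the schematic: A, B, and C
!     only count how often they are called.
      INTEGER NA, NB, NC
      COMMON /CNT/ NA, NB, NC
      INTEGER I, J, K, L
      NA = 0
      NB = 0
      NC = 0
!     Not inside any loop.
      CALL A
      DO I = 1, 2
        DO J = 1, 2
!         Inside 2 loops.
          CALL B
          DO K = 1, 2
            DO L = 1, 2
!             Inside 4 loops: the most deeply nested reference.
              CALL C
            ENDDO
          ENDDO
        ENDDO
      ENDDO
!     Call counts: A runs once, B 4 times, C 16 times.
      PRINT *, NA, NB, NC
      END

      SUBROUTINE A
      INTEGER NA, NB, NC
      COMMON /CNT/ NA, NB, NC
      NA = NA + 1
      RETURN
      END

      SUBROUTINE B
      INTEGER NA, NB, NC
      COMMON /CNT/ NA, NB, NC
      NB = NB + 1
      RETURN
      END

      SUBROUTINE C
      INTEGER NA, NB, NC
      COMMON /CNT/ NA, NB, NC
      NC = NC + 1
      RETURN
      END
```

Each enclosing loop multiplies the number of times a call site executes, which is why removing the call overhead at the most deeply nested site gives the largest return.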

The argument is defined from the most deeply nested CALL or function reference:

7.1.5 Recursive Inlining

The -inline_depth switch ( -inline_depth=<n> [-ind] ) sets the maximum level of subprogram nesting (CALLs in routines that are CALLed) that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further. The values and their meanings are as follows:

1 to 10: Inline routines to this depth.
0: Use the default value.
-1: Inline only routines that do not contain CALLs or function references.

Recursive inlining can be quite expensive in compilation time. You must exercise discretion in its use.

7.1.6 Manual Control

These switches ( -inline_manual [-inm] and -ipa_manual [-ipam] ) cause KAP to recognize the KAP !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which routines are inlined/analyzed at which call sites.

The default is to ignore these directives. They are enabled when any inlining/IPA switch is given on the command line. When -inline_manual or -ipa_manual is specified, the directives are enabled without enabling the automatic inlining (IPA) algorithms. Because !*$* [no]inline and !*$* [no]ipa are not restricted by the -inline , -ipa , -..._looplevel , and -inline_depth command-line switches, they can be used either with or without command-line controlled inlining.

7.2 Inlining and IPA Directives

This section describes the following KAP directives:


!*$* [no]inline  [here|routine|global]  [(name[,name..])] 
!*$* [no]ipa     [here|routine|global]  [(name[,name..])] 

The !*$* inline and !*$* ipa directives tell KAP to inline or analyze the named routines. The !*$* noinline and !*$* noipa directives tell KAP not to inline or analyze the named routines. These directives combine next-line, entire routine, and global (entire program) scope. If none of the optional elements are included, all routines referenced on the next line of code that are in the inlining/analyzing universe are inlined on that line.

These directives are disabled by default. They are enabled when any inlining or IPA switch, respectively, is given on the command line. They can be enabled without activating the automatic inlining/IPA selection algorithms with the -inline_manual and -ipa_manual command-line switches. They are not restricted by the other inlining and IPA command-line switches, and can be used instead of, or in addition to, command-line controlled inlining.

The optional names are routine names. If any routines are named in the directive, it applies only to them. If NO routine names are given, the directive applies to ALL routines. The parentheses around the routine names are not required if the list of routine names is empty.

If a !*$* inline or !*$* ipa directive names a routine that is not in the corresponding universe, a warning message is issued, and the directive is ignored.
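As a sketch of the next-line form, the following hypothetical program marks one call site with a noinline directive and another with an inline directive. The directive spelling follows the syntax shown at the start of this section; the names DIRDEM and SETVAL are illustrative, and the result has not been checked against KAP output. To a Fortran compiler the directive lines are ordinary comments, so the program compiles and runs unchanged.

```fortran
      PROGRAM DIRDEM
!     Hypothetical example; DIRDEM and SETVAL are illustrative names.
      REAL P, Q
!     Next-line scope with a name list: the directive applies only
!     to SETVAL on the following line.
!*$* NOINLINE HERE (SETVAL)
      CALL SETVAL (P)
!     No optional elements: the directive applies to every routine
!     referenced on the following line.
!*$* INLINE
      CALL SETVAL (Q)
      PRINT *, P, Q
      END

      SUBROUTINE SETVAL (U)
      REAL U
      U = 137.0
      RETURN
      END
```

Placing the directive immediately before the statement it governs keeps the choice visible at the call site, which is the main reason to prefer it over a command-line name list when only some references to a routine should be expanded.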

7.3 Listing File Support

The following section describes the metric used for the optional calling tree.

7.3.1 -listoptions=c

The optional calling tree includes the loop nest depth level of each CALL/function reference. The metric used is the convention of the -inline_looplevel and -ipa_looplevel switches: the farthest-out leaf is 1, and higher values trace back to the main program.

7.4 Inlining/IPA Examples

The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because KAP undergoes constant enhancement, the code that your version of KAP produces may not be identical to that of these examples. The temporary variable names, in particular, can change without significantly altering the transformed code.

Unless otherwise noted, the following examples were run with -optimize=0 -scalaropt=0 to show the inlining more clearly. If nonzero values are specified, the routines are first inlined or analyzed, and then the regular serial transformations are applied.

7.4.1 Inlining Example: Same Source File

The following example demonstrates inlining both with -inline=setup , meaning only the subroutine setup is inlined, and with -inline , meaning both subroutines are inlined. The KAP output includes optimized versions of both routines, in addition to the expanded main program. Setting -inline_looplevel to 2 or greater is required, because one CALL is in a loop and the others are not.


Source file: 
      PROGRAM TSTEXP 
      REAL A(200,200),B(200,200),C(200,200) 
      CALL SETUP(B,200) 
      CALL SETUP(C,200) 
      DO 100 N = 25,200,25 
        CALL MXMR(N,A,B,C) 
        WRITE(*,900) N,A(7,13) 
  100 CONTINUE 
  900 FORMAT(1X,'For N=',I5,',  A(7,13):',F12.4) 
      END 
 
      SUBROUTINE SETUP(E,N) 
      REAL E(N,N) 
      DO 10 I=1,N 
        DO 10 J=1,N 
          E(I,J) = MOD( I + 7*J, 10) /10.0 
   10 CONTINUE 
      RETURN 
      END 
 
      SUBROUTINE MXMR(N,A,B,C) 
      REAL A(200,200),B(200,200),C(200,200) 
      DO 1000 I=1,N 
        DO 1000 J=1,N 
          A(I,J) = 0.0 
          DO 1000 K=1,N 
            A(I,J)=A(I,J)+B(I,K)*C(K,J) 
 1000 CONTINUE 
      RETURN 
      END 

The -inline=setup switch generates the following main program:


PROGRAM TSTEXP 
  REAL A(200,200),B(200,200),C(200,200) 
  INTEGER II1, II2, II3, II4 
  DO 2 II1=1,200 
  DO 2 II2=1,200 
  B(II1,II2) = MOD (II1 + 7 * II2, 10) / 10.0 
2 CONTINUE 
  DO 3 II3=1,200 
  DO 3 II4=1,200 
  C(II3,II4) = MOD (II3 + 7 * II4, 10) / 10.0 
3 CONTINUE 
 DO 100 N=25,200,25 
   CALL MXMR(N,A,B,C) 
   WRITE(*,900) N,A(7,13) 
100 CONTINUE 
900 FORMAT(1X,'For N=',I5,',  A(7,13):',F12.4) 
  END 

The -inline switch generates the following output:


PROGRAM TSTEXP 
  REAL A(200,200),B(200,200),C(200,200) 
  INTEGER II1, II2, II3, II4, II5, II6, II7 
  DO 3 II4=1,200 
  DO 3 II5=1,200 
  B(II4,II5) = MOD (II4 + 7 * II5, 10) / 10.0 
3 CONTINUE 
  DO 4 II6=1,200 
  DO 4 II7=1,200 
  C(II6,II7) = MOD (II6 + 7 * II7, 10) / 10.0 
4 CONTINUE 
  DO 100 N=25,200,25 
  DO 2 II1=1,N 
  DO 2 II2=1,N 
  A(II1,II2) = 0.0 
  DO 2 II3=1,N 
  A(II1,II2) = A(II1,II2) + B(II1,II3) * C(II3,II2) 
2 CONTINUE 
  WRITE (*, 900) N, A(7,13) 
100 CONTINUE 
900 FORMAT(1X,'For N=',I5,',  A(7,13):',F12.4) 
  END 

7.4.2 Inlining Example with a Library

The following example demonstrates the two-step process of creating a library and inlining routines from it:

Step 1: Create the library.


      SUBROUTINE MKCOEF (COEF,N) 
      REAL COEF(N) 
      DO 99 I = 1,N 
        COEF(I) = 1.0/I 
   99 CONTINUE 
      RETURN 
      END 
 
      REAL FUNCTION YVAL (X, COEF, N) 
      REAL COEF(N), X, SUM 
      SUM = 0.0 
      DO 99 I=1,N 
        SUM = SUM + COEF(I) * SIN(I*X) 
   99 CONTINUE 
      YVAL = SUM 
      RETURN 
      END 

If the file subfil.f contains the previous two routines, then the command KAP -inline_create subfil.f creates a library file subfil.klib with the two routines, and a listing file subfil.out that contains only a list of routines and whether or not each was saved in the library:


subroutine MKCOEF -- saved 
function YVAL -- saved 

Step 2: Inline the routines into a calling program.


      PROGRAM LIBCR 
      PARAMETER (NC = 15) 
      PARAMETER (PI = 3.141593) 
      REAL COEF(NC), YVAL, Y(2000) 
 
      CALL MKCOEF (COEF, NC) 
      DO 900 I = 1,2000 
        Y(I) = YVAL( I*0.001*PI, COEF, NC) 
  900 CONTINUE 
      J=1 
      DO 910 I=1,2000,10 
        PRINT *, (Y(J),J=I,I+9) 
  910 CONTINUE 
      END 

If the file sqwv.f contains the main program LIBCR , then running the command KAP -inline -infl=subfil.klib -inll=2 -o=0 -r=0 -so=0 sqwv.f will put the following into the file sqwv.cmp:


  PROGRAM LIBCR 
  PARAMETER (NC = 15) 
  PARAMETER (PI = 3.141593) 
  REAL COEF(NC), YVAL, Y(2000) 
  SAVE J 
  REAL RR1, RR2, RR3 
  INTEGER II1, II2 
  DO 3 II2=1,NC 
  COEF(II2) = 1.0 / II2 
3 CONTINUE 
 
  DO 900 I=1,2000 
  RR1 = I * 0.001 * PI 
  RR2 = 0.0 
  DO 2 II1=1,NC 
  RR2 = RR2 + COEF(II1) * SIN (II1 * RR1) 
  2 CONTINUE 
  RR3 = RR2 
  Y(I) = RR3 
900  CONTINUE 
 
  J=1 
  DO 910 I=1,2000,10 
    PRINT *, (Y(J),J=I,I+9) 
910  CONTINUE 
  END 

In the previous example, all other optimizations were turned off to show the expansion more clearly. If you specify nonzero values for the -optimize , -scalaropt , and -roundoff switches, KAP first inlines the routines, then performs the optimizations in the usual manner.

7.4.3 IPA Example

In the following example, the variable N always has the same value, so the same IF branch will always be taken. This information is hidden behind a subroutine call, however, so KAP normally will not try to perform dead-code elimination to simplify the block IF in the first routine. When the -ipa=setn command-line switch is specified, KAP will inspect the named subroutine for information on the relationship of its arguments and returned value and the surrounding code. Once the CALLed routine is examined, KAP global forward substitution and dead-code elimination transformations (see Chapter 8) can delete the unused code.

If a subroutine or function cannot be inlined, or if you do not want to inline it, it can often still be analyzed for its effects on the calling routine.

The following example was run with the default values for -optimize and -scalaropt :


  CALL SETN ( N ) 
  IF ( N.GT.10 ) THEN 
    X = 1. 
  ELSE 
    X = 2. 
  ENDIF 
  ... 
  SUBROUTINE SETN (N) 
  INTEGER N 
  N = 12 
  RETURN 
  END 

The example becomes the following:


  CALL SETN (N) 
   X = 1. 
  ... 

Only the CALL and the simplified IF block are shown.
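For experimenting with this transformation, a self-contained variant of the fragment may be convenient. The following sketch wraps the fragment in a hypothetical main program; the name IPADEM, the declarations, and the PRINT statement are additions for illustration and are not part of the original example.

```fortran
      PROGRAM IPADEM
!     Hypothetical driver around the fragment; IPADEM, the
!     declarations, and the PRINT statement are illustrative.
      INTEGER N
      REAL X
      CALL SETN (N)
!     SETN always sets N to 12, so only the first branch can run.
      IF (N .GT. 10) THEN
        X = 1.
      ELSE
        X = 2.
      ENDIF
      PRINT *, X
      END

      SUBROUTINE SETN (N)
      INTEGER N
      N = 12
      RETURN
      END
```

Because the program prints X, the original and the transformed code can be compiled and run to confirm that both produce the same value.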

7.4.4 Recursive Inlining Examples

The -inline_depth switch sets the maximum level of subprogram nesting that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further.

Consider the following simplified example:


PROGRAM EXDDEM 
REAL  A,B,C,D,E,F,G 
 
CALL S1 (A,B) 
CALL S2 (C,D,E,F) 
CALL S3 (G) 
 
PRINT *,A,B,C,D,E,F,G 
END 
 
SUBROUTINE S1 (W,X) 
REAL W,X 
W=1.0 
CALL S4(X) 
RETURN 
END 
 
SUBROUTINE S2 (Q,R,S,T) 
REAL Q,R,S,T 
Q = 2.0 
CALL S1 (R,S) 
CALL S3 (T) 
RETURN 
END 
 
SUBROUTINE S3 (U) 
REAL U 
U = 137.0 
RETURN 
END 
 
SUBROUTINE S4 (V) 
REAL V 
V = 2.7 
RETURN 
END 

When run with -inline and -inline_depth=4 , all the subroutines are inlined, including calls to calls to calls, and the main program becomes:


PROGRAM EXDDEM 
REAL  A,B,C,D,E,F,G 
EXTERNAL S4 
 
A = 1.0 
B = 2.7 
C = 2.0 
D = 1.0 
E = 2.7 
F = 137.0 
G = 137.0 
 
PRINT *, A, B, C, D, E, F, G 
END 

When run with -inline and -inline_depth=1 , meaning inline only one routine deep, all the CALLs in the main program and subroutines are expanded, but CALLs in inlined routines are not. The main program becomes:


PROGRAM EXDDEM 
REAL  A,B,C,D,E,F,G 
EXTERNAL S4 
 
A = 1.0 
CALL S4 (B) 
C = 2.0 
CALL S1 (D,E) 
CALL S3 (F) 
G = 137.0 
 
PRINT *, A, B, C, D, E, F, G 
END 

When run with -inline and -inline_depth=-1 (inline only routines that do not contain CALLs or FUNCTION references), the main program becomes:


PROGRAM EXDDEM 
REAL  A,B,C,D,E,F,G 
 
CALL S1 (A,B) 
CALL S2 (C,D,E,F) 
G = 137.0 
 
PRINT *, A, B, C, D, E, F, G 
END 

In this last case, only SUBROUTINEs S3 and S4 could be inlined. Repeated runs with -inline_depth=1 can be used to inline additional levels of routines.

7.4.5 Manual Inlining Example

Manual inlining and IPA allow you greater control over the routines that are inlined/analyzed at CALL sites. They use directives ( !*$* inline and !*$* ipa ) that are placed into the source code. The directives are normally ignored by KAP, but are enabled when an inlining or IPA switch, respectively, is given on the command line.

The following example is based on the Recursive Inlining example from Section 7.4.4. It was run with default switches, except for -inline_manual .

The directives used have different scopes. Routine S1 is inlined everywhere it is used, routine S3 is inlined only in the main program, and routine S4 is inlined only in routine S1 , which is then inlined elsewhere.

With the default value for -scalaropt , forward substitution places the assigned values into the PRINT statement and dead-code elimination deletes the now-unneeded inlined assignments:


!*$* INLINE GLOBAL (S1) 
!*$* INLINE ROUTINE (S3) 
PROGRAM EXDTST 
REAL  A,B,C,D,E,F,G 
 
CALL S1 (A,B) 
CALL S2 (C,D,E,F) 
CALL S3 (G) 
 
PRINT *,A,B,C,D,E,F,G 
END 
 
SUBROUTINE S1 (W,X) 
REAL W,X 
W=1.0 
!*$* INLINE 
CALL S4(X) 
RETURN 
END 
 
SUBROUTINE S2 (Q,R,S,T) 
REAL Q,R,S,T 
Q = 2.0 
CALL S1 (R,S) 
CALL S3 (T) 
RETURN 
END 
 
SUBROUTINE S3 (U) 
REAL U 
U = 137.0 
RETURN 
END 
 
SUBROUTINE S4 (V) 
REAL V 
V = 2.7 
RETURN 
END 

Becomes:


KAP/Tru64_U_F90 4.4 k340504 20010517                 01-Sep-2001 09:31:22 
 
!*$* INLINE GLOBAL ( S1 ) 
!*$* INLINE ROUTINE ( S3 ) 
 
PROGRAM EXDTST 
REAL A, B, C, D, E, F, G 
SAVE G, F, E, D, C, B, A 
EXTERNAL S4 
CALL S2 (C,D,E,F) 
 
PRINT *, 1., 2.7, C, D, E, F, 137. 
END 
 
 
KAP/Tru64_U_F90 4.4 k340504 20010517                 01-Sep-2001 09:31:22 
 
 
SUBROUTINE S1 (W, X) 
REAL W, X 
W = 1. 
 
!*$* INLINE 
 
X = 2.7 
RETURN 
END 
 
 
KAP/Tru64_U_F90 4.4 k340504 20010517                 01-Sep-2001 09:31:22 
 
 
SUBROUTINE S2 (Q, R, S, T) 
REAL Q, R, S, T 
EXTERNAL S4 
Q = 2. 
R = 1. 
S = 2.7 
CALL S3 (T) 
RETURN 
END 

Subroutines S3 and S4 are unchanged.

7.4.6 Notes on Inlining and IPA

A routine is inlined only if it passes all the criteria ( -inline=, -inline_looplevel, -inline_depth ).

The !*$* [no]inline and !*$* [no]ipa directives, when enabled, override the inlining/IPA command-line switches.

A !*$* inline global directive without a name list tells KAP to inline every routine it can, regardless of the -inline , -inline_depth , and -inline_looplevel settings. A !*$* noinline global directive tells KAP not to inline anything, regardless of the -inline, -inline_depth, and -inline_looplevel settings. The !*$* inline and !*$* ipa directives are disabled by default; they are enabled when any inlining or IPA command-line switch is specified.

When a library is specified with -inline_from_libraries , routines may be taken from that library for inlining into the source code. No attempt is made to inline routines from the source file into routines from the library. For example, if the main program calls routine BB, which is in the library, and BB calls routine DD, which is in the source file, then BB can be inlined into the main program, but KAP will not attempt to inline DD into the text from library routine BB.
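The BB/DD case can be pictured with the following sketch. It is a hypothetical layout: in practice BB would be in the file used to build the library, while the main program (named LIBDEM here for illustration) and DD would be in the source file given to KAP. The routine bodies are invented, and the three units are shown together only so that the sketch compiles on its own.

```fortran
!     Hypothetical layout: BB stands for a library routine; LIBDEM
!     and DD stand for routines in the source file given to KAP.
      PROGRAM LIBDEM
      REAL V
!     BB is in the library, so this CALL can be inlined.
      CALL BB (V)
      PRINT *, V
      END

      SUBROUTINE BB (V)
      REAL V
!     DD is in the source file, so this CALL remains a CALL in the
!     text of BB that is inlined into the main program.
      CALL DD (V)
      V = V + 1.0
      RETURN
      END

      SUBROUTINE DD (V)
      REAL V
      V = 2.0
      RETURN
      END
```

If DD should also be expanded at that point, one option suggested by Section 7.1.2 is to include DD when the library is created, so that both routines come from the library.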

A library created with -inline_create will work for inlining or IPA, because it is just partially reduced source code. However, a library created with -ipa_create may not appear in an -inline_from_libraries= list. Attempting to do so is flagged with a warning message.

Inlining and IPA are slow, memory-intensive activities. Specifying large values for -inline_looplevel and -inline_depth (inline all available routines everywhere they are used) for a large set of inlinable routines for a large source file can absorb significant system resources. For most programs, specifying small values for -inline_looplevel and -inline_depth and/or a small number of routines with -inline= can provide most of the benefits of inlining. The same applies for the IPA switches.

7.5 Conditions Inhibiting Inlining/IPA

This section lists conditions that inhibit the inlining of subroutines and functions, whether from a library or source file. Many constructs that prevent inlining will also stop or restrict IPA.

See Section 7.4.6 for other notes on the use of the -inline and -ipa command-line switches and directives.

Conditions that inhibit inlining:

