Preface |
Chapter 1 |
1
|
What Is KAP?
|
Chapter 2 |
2
|
How to Run KAP
|
2.1
|
General KAP Information
|
2.2
|
Installing Compaq KAP
|
2.3
|
Compiling a Program Using the kf90 Driver
|
2.3.1
|
Passing Default KAP Switch Settings to kf90
|
2.4
|
Passing KAP Switches to kf90
|
2.4.1
|
Passing Compaq Fortran Compiler Switches to kf90
|
2.4.2
|
Additional Information About Using the kf90 Driver
|
2.5
|
Compiling a Program Containing C Preprocessor Directives Using kf90
|
2.6
|
Optimized Programs
|
2.7
|
KAP Command-Line Switches Determined by Compiler Switches
|
2.8
|
Compiling a Program Using kapf90
|
2.9
|
Compiling a Program Containing C Preprocessor Directives Using kapf90
|
2.10
|
Using KAP Syntax
|
2.11
|
Using File Naming Conventions
|
2.12
|
Guidelines for Optimizing With KAP
|
2.12.1
|
Optimizing Small Programs with KAP
|
2.12.2
|
Optimizing Large Programs with KAP
|
2.12.3
|
General Optimization Tips
|
2.13
|
Improving and Customizing KAP Performance
|
2.14
|
Using Additional Performance Improvement Techniques
|
2.15
|
Correcting KAP Problems
|
Chapter 3 |
3
|
KAP Parallel Processing
|
3.1
|
Overview
|
3.2
|
Parallel Processing Methods
|
3.2.1
|
Automatic Detection Method
|
3.2.2
|
Directed Method
|
3.2.3
|
Combination Method
|
3.2.4
|
Environment Variables to Set When -psyntax=openmp Is Selected
|
3.3
|
Interaction of Parallel Processing Controls
|
3.4
|
Automatic Parallelization Using the kf90 Driver
|
3.4.1
|
Changing Source Programs
|
3.4.2
|
Giving Command-Line Switches
|
3.4.3
|
Directing the Compilation and Linking Process
|
3.5
|
Directed Parallelization Using the kf90 Driver and OpenMP Directives
|
3.5.1
|
Changing Source Programs
|
3.5.2
|
Giving Command-Line Switches
|
3.5.3
|
Directing the Compilation and Linking Process
|
3.6
|
Combined Automatic and Directed Parallelization Using the kf90 Driver
|
3.6.1
|
Changing Source Programs
|
3.6.2
|
Giving Command-Line Switches
|
3.6.3
|
Directing the Compilation and Linking Process
|
3.7
|
Compiling a Program for Parallel Execution Using kapf90
|
3.8
|
Running a Parallelized Program
|
3.9
|
Parallel Programming Tips
|
Chapter 4 |
4
|
KAP Command-Line Switches
|
4.1
|
Overview
|
4.2
|
Switches for the kf90 Driver
|
4.2.1
|
-fext, (-fext=f)
|
4.2.2
|
-f90, (-f90=/usr/bin/f90)
|
4.2.3
|
-f90kap, (-f90kap=/usr/bin/kapf90)
|
4.2.4
|
-fkapargs
|
4.2.5
|
-S, (off)
|
4.2.6
|
-tmpdir, (-tmpdir=/tmp/)
|
4.2.7
|
-tune, (-tune=<current system architecture>)
|
4.2.8
|
-verbose, -v, (-nov)
|
4.3
|
General Optimization Switches for kapf90
|
4.3.1
|
-interchange, -nointerchange, (-interchange)
|
4.3.2
|
-namepartitioning, -namepart, (-nonamepart)
|
4.3.3
|
-optimize, -o, (-optimize=5)
|
4.3.4
|
-recursion, -rec, -norec, (-norecursion)
|
4.3.5
|
-roundoff, -r, (-r=3)
|
4.3.6
|
-scalaropt, -so, (-scalaropt=3)
|
4.3.7
|
-skip, -sk, -nsk, (-noskip)
|
4.3.8
|
-tune, (-tune=<current system architecture>)
|
4.4
|
Parallel Processing Switches for kapf90
|
4.4.1
|
-chunk, (-chunk=1)
|
4.4.2
|
-concurrent, -conc, -noconc, (-noconcurrent)
|
4.4.3
|
-minconcurrent, -mc, (-mc=1000)
|
4.4.4
|
-parallelio, -nopio, -pio, (-noparallelio)
|
4.4.5
|
-pdefault, (-pdefault=safe)
|
4.4.6
|
-psyntax, (-psyntax=openmp)
|
4.4.7
|
-scheduling, -sched, (-sched=e)
|
4.5
|
Fortran Dialect Switches for kapf90
|
4.5.1
|
-align_common, (-align_common=8)
|
4.5.2
|
-align_struct, (-align_struct=4)
|
4.5.3
|
-assume, -a, (-assume=cel), -noassume, -na
|
4.5.4
|
-datasave, -ds, -nodatasave, -nds, (-datasave)
|
4.5.5
|
-dlines, -dl, -ndl, (-nodlines)
|
4.5.6
|
-escape, (-noescape)
|
4.5.7
|
-freeformat, -ff, -nff, (-nofreeformat)
|
4.5.8
|
-integer, -int, (-int=4)
|
4.5.9
|
-intlog, (-intlog)
|
4.5.10
|
-kind, (-kind=4)
|
4.5.11
|
-logical, -log, (-log=4)
|
4.5.12
|
-onetrip, -1, (-n1), -noonetrip
|
4.5.13
|
-real, -rl, (-rl=4)
|
4.5.14
|
-save, -sv, (-sv=manual_adjust)
|
4.5.15
|
-scan, (-scan=72)
|
4.5.16
|
-syntax, -sy, (off)
|
4.5.17
|
-type, -ty, -nty, (-notype)
|
4.6
|
Inlining and Interprocedural Analysis Switches for kapf90
|
4.6.1
|
-inline, -inl, -noinline, -ninl, (off), -ipa, -noipa, -nipa, (off)
|
4.6.2
|
-inline_and_copy, -inlc, (off)
|
4.6.3
|
-inline_create, -incr, (off), -ipa_create, -ipacr, (off)
|
4.6.4
|
-inline_depth, -ind, (-ind=2), -ipa_depth, -ipad, (-ipad=2)
|
4.6.5
|
-inline_from_files, -inff, (current source file)
|
4.6.6
|
-inline_from_libraries, -infl, (off)
|
4.6.7
|
-ipa_from_files, -ipaff, (current source file)
|
4.6.8
|
-ipa_from_libraries, -ipafl, (off)
|
4.6.9
|
-inline_looplevel, -inll, (-inll=2), -ipa_looplevel, -ipall, (-ipall=2)
|
4.6.10
|
-inline_manual, -inm, (off), -ipa_manual, -ipam, (off)
|
4.6.11
|
-inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)
|
4.7
|
Advanced Optimization Switches for kapf90
|
4.7.1
|
-aggressive, -ag, -nag, (-noaggressive)
|
4.7.2
|
-arclimit, -arclm, (-arclimit=5000)
|
4.7.3
|
-cacheline, -chl, (-chl=64,64)
|
4.7.4
|
-cache_prefetch_line_count, -cplc, (-cplc=0)
|
4.7.5
|
-cachesize, -chs, (-chs=32,0)
|
4.7.6
|
-dpregisters, -dpr, (-dpr=32)
|
4.7.7
|
-each_invariant_if_growth, -eiifg, (-eiifg=20)
|
4.7.8
|
-fpregisters, -fpr, (-fpr=32)
|
4.7.9
|
-fuse, -nfuse, (-nofuse)
|
4.7.10
|
-fuselevel, (-fuselevel=0)
|
4.7.11
|
-generateh, -genh
|
4.7.12
|
-hdir, -hd, (-hdir=<current directory>)
|
4.7.13
|
-heaplimit, -heap, (-heaplimit=100)
|
4.7.14
|
-hoist_loop_invariants, -hli, (-hli=1)
|
4.7.15
|
-interleave, -intl, (-interleave)
|
4.7.16
|
-library_calls, -lc, (off)
|
4.7.17
|
-limit, -lm, (-lm=10)
|
4.7.18
|
-machine, -ma, -nomachine, -noma, (-ma=s)
|
4.7.19
|
-max_invariant_if_growth, -miifg, (-miifg=500)
|
4.7.20
|
-routine, -rt, (off)
|
4.7.21
|
-setassociativity, -sasc, (-sasc=1,1)
|
4.7.22
|
-srlcd, -nsrlcd, (-nosrlcd)
|
4.7.23
|
-tablesize, -ts, (-ts=24000000)
|
4.7.24
|
-unroll, -ur, (-ur=4), -unroll2, -ur2, (-ur2=160), -unroll3, -ur3, (-ur3=1)
|
4.7.25
|
-useh
|
4.8
|
Directive Recognition Switches for kapf90
|
4.8.1
|
-directives, -dr, -nodirectives, -ndr, (-directives=akpv)
|
4.8.2
|
-ignoreoptions, -ig, -nig, (-noignoreoptions)
|
4.9
|
Input-Output Switches for kapf90
|
4.9.1
|
-cmp, (<file>.cmp.f90), (<file>.cmp.f), -nocmp, -ncmp
|
4.9.2
|
-include, -inc, (off)
|
4.9.3
|
-list, -l, -nl, (-list=<file>.out)
|
4.10
|
Listing Switches for kapf90
|
4.10.1
|
-cmpoptions, -cp, -ncp, (-nocmpoptions)
|
4.10.2
|
-lines, -ln, (-ln=55)
|
4.10.3
|
-listingwidth, -lw, (-lw=132)
|
4.10.4
|
-listoptions, -lo, (-lo=klo)
|
4.10.5
|
-suppress, -su, (off)
|
4.11
|
!*$*options
|
Chapter 5 |
5
|
KAP Directives
|
5.1
|
Overview
|
5.2
|
Usage and Syntax of Directives
|
5.3
|
General Optimization Directives
|
5.3.1
|
!*$* arclimit (0--5000)
|
5.3.2
|
!*$* beginblock <directive block> !*$* endblock
|
5.3.3
|
!*$* each_invariant_if_growth (0--5000)
|
5.3.4
|
!*$* limit (>= 0)
|
5.3.5
|
!*$* max_invariant_if_growth (0--50000)
|
5.3.6
|
!*$* optimize (0--5)
|
5.3.7
|
!*$* roundoff (0--3)
|
5.3.8
|
!*$* scalar optimize (0--3)
|
5.3.9
|
!*$* unroll(<#it>[,<integer>])
|
5.4
|
Parallel Processing Directives for Automatic Parallelization
|
5.4.1
|
!*$* [no]concurrentize
|
5.4.2
|
!*$* minconcurrent (0--999999)
|
5.5
|
Inlining and IPA
|
5.5.1
|
!*$* [no]inline [here|routine|global] [(name [,name...])]
|
5.5.2
|
!*$* [no]ipa [here|routine|global] [(name [,name...])]
|
5.6
|
Assertions Directive
|
5.6.1
|
!*$* [no]assertions
|
5.7
|
Memory Management Directives
|
5.7.1
|
!*$* padding (var-list)
|
5.7.2
|
!*$* storage order (var-list)
|
Chapter 6 |
6
|
KAP Assertions
|
6.1
|
Overview
|
6.2
|
Descriptions of KAP Assertions
|
6.2.1
|
!*$* assert [no]argument aliasing
|
6.2.2
|
!*$* assert [no]bounds violations
|
6.2.3
|
!*$* assert [no]equivalence hazard
|
6.2.4
|
!*$* assert [no]last value needed
|
6.2.5
|
!*$* assert permutation
|
6.2.6
|
!*$* assert no recurrence
|
6.2.7
|
!*$* assert relation ( <name> .XX. <variable/constant>)
|
6.2.8
|
!*$* assert no sync
|
6.2.9
|
!*$* assert [no] temporaries for constant arguments
|
6.3
|
Parallel Processing Assertions that Guide Automatic Parallelization
|
6.3.1
|
!*$* assert concurrent call
|
6.3.2
|
!*$* assert do (concurrent)
|
6.3.3
|
!*$* assert do (concurrent call)
|
6.3.4
|
!*$* assert do (serial)
|
6.3.5
|
!*$* assert do prefer (concurrent)
|
6.3.6
|
!*$* assert do prefer (serial)
|
Chapter 7 |
7
|
Inlining and IPA
|
7.1
|
Inlining and IPA Command-Line Switches
|
7.1.1
|
inline_from/ipa_from Switches
|
7.1.2
|
Library Creation
|
7.1.3
|
Naming Specific Routines
|
7.1.4
|
DO Loop Level
|
7.1.5
|
Recursive Inlining
|
7.1.6
|
Manual Control
|
7.2
|
Inlining and IPA Directives
|
7.3
|
Listing File Support
|
7.3.1
|
-listoptions=c
|
7.4
|
Inlining/IPA Examples
|
7.4.1
|
Inlining Example --- Same Source File
|
7.4.2
|
Inlining Example with a Library
|
7.4.3
|
IPA Example
|
7.4.4
|
Recursive Inlining Examples
|
7.4.5
|
Manual Inlining Example
|
7.4.6
|
Notes on Inlining and IPA
|
7.5
|
Conditions Inhibiting Inlining/IPA
|
Chapter 8 |
8
|
Transformations
|
8.1
|
Memory Management
|
8.1.1
|
Command-Line Switches
|
8.1.2
|
Memory Management Tactics
|
8.2
|
Serial Optimizations
|
8.2.1
|
Dead-Code Elimination
|
8.2.2
|
Induction Variable Recognition
|
8.2.3
|
Global Forward Substitution
|
8.2.4
|
Loop Peeling
|
8.2.5
|
Lifetime Analysis
|
8.2.6
|
Invariant-IF Restructuring
|
8.2.7
|
Reciprocal Substitution
|
8.3
|
Scalar (Dusty-Deck) IF Transformations
|
8.3.1
|
IF to Block IF
|
8.3.2
|
IF to DO Loop
|
8.3.3
|
Semantic IF Merging
|
8.3.4
|
Zero-Trip IF Removal
|
8.4
|
Loop Unrolling
|
8.5
|
Loop Rerolling
|
Chapter 9 |
9
|
KAP Listing File
|
9.1
|
Listing Switches
|
9.1.1
|
Original Program Listing (O)
|
9.1.2
|
Calling Tree (C)
|
9.1.3
|
KAP Switches (K)
|
9.1.4
|
Loop Table (L)
|
9.1.5
|
Name (N)
|
9.1.6
|
Compilation Performance Statistics (P)
|
9.1.7
|
Summary Table (S)
|
9.1.8
|
Transformed Program Listing (T)
|
9.2
|
Listing Information
|
9.2.1
|
Line Numbers
|
9.2.2
|
DO Loop Markings
|
9.2.3
|
INCLUDE File Markings
|
9.2.4
|
Footnotes
|
9.2.5
|
Syntax Error/Warning Messages
|
9.2.6
|
Questions Generated by KAP
|
9.2.7
|
Action Summary
|
9.3
|
Loop Table Messages
|
9.4
|
KAP Listing Messages
|
Appendix A |
Appendix A
|
Compaq Fortran Extensions Supported by KAP Fortran/OpenMP
|
Appendix B |
Appendix B
|
Data Dependence Analysis
|
B.1
|
Data Dependence Definitions
|
B.2
|
Varieties of Data Dependence
|
B.3
|
Input and Output Sets
|
B.4
|
Data Dependence Relations
|
B.5
|
Data Dependence Direction Vectors
|
B.6
|
Loop-Carried Dependence
|
B.7
|
Data Dependence Examples
|
Appendix C |
Appendix C
|
OpenMP Examples
|
C.1
|
DO: A Simple Difference Operator
|
C.2
|
DO: Two Difference Operators
|
C.3
|
DO: Reduce Fork/Join Overhead
|
C.4
|
SECTIONS: Two Difference Operators
|
C.5
|
SINGLE: Updating a Shared Scalar
|
C.6
|
SECTIONS: Updating a Shared Scalar
|
C.7
|
DO: Updating a Shared Scalar
|
C.8
|
PARALLEL DO: A Simple Difference Operator
|
C.9
|
PARALLEL SECTIONS: Two Difference Operators
|
C.10
|
Simple Reduction
|
C.11
|
TASKCOMMON: Private Common
|
C.12
|
THREADPRIVATE: Private Common and Master Thread
|
C.13
|
INSTANCE PARALLEL: As a Private Common
|
C.14
|
INSTANCE PARALLEL: As a Shared and then a Private Common
|
C.15
|
Avoiding External Routines: Reduction
|
C.16
|
Avoiding External Routines: Temporary Storage
|
C.17
|
FIRSTPRIVATE: Copying in Initialization Values
|
C.18
|
THREADPRIVATE: Copying in Initialization Values
|
C.19
|
INSTANCE PARALLEL: Copying in Initialization Values
|
Appendix D |
Appendix D
|
PCF Directives
|
D.1
|
PARALLEL REGION Directive
|
D.2
|
PARALLEL DO Directive
|
D.3
|
DO Loop Example with PCF Directives
|
D.4
|
Program Example with PCF Directives
|
D.5
|
CRITICAL SECTION Directive
|
D.6
|
ONE PROCESSOR SECTION Directive
|
D.7
|
Comparison of KAP PCF and Cray Autotasking Directives
|
Appendix E |
Appendix E
|
KAP and Incorrect Programs
|
Appendix F |
Appendix F
|
Listing File Messages
|
F.1
|
Classes of Messages
|
F.2
|
Messages
|
F.2.1
|
Data Dependence (DD)
|
F.2.2
|
Error (E)
|
F.2.3
|
Extension (EX)
|
F.2.4
|
Inlining/IPA (INL)
|
F.2.5
|
Informational (INF)
|
F.2.6
|
Inserted (I)
|
F.2.7
|
Loop Reordering (LR)
|
F.2.8
|
Warning (MIS)
|
F.2.9
|
Option Error (OW)
|
F.2.10
|
Not Optimized (NO)
|
F.2.11
|
Output Translation (OT)
|
F.2.12
|
Output Trans Fails (OTF)
|
F.2.13
|
Program Too Large (NO)
|
F.2.14
|
Question (Q)
|
F.2.15
|
Scalar Optimization (SO)
|
F.2.16
|
Standardized (STD)
|
F.2.17
|
Translator Error (TE)
|
F.2.18
|
Vector Enhanced (VE)
|
F.2.19
|
Warning (W)
|
Index |
Tables |
2-1 |
kf90 Assumed Source Format Based on Switch Settings and File Extensions |
2-2 |
User Actions for Specific Goals |
3-1 |
OpenMP Directives Correlated to Cray CMIC Parallel Directives |
4-1 |
Command-Line Switches for the kf90 Driver |
4-2 |
Command-Line Switches for the kapf90 Translator |
5-1 |
KAP Directives |
6-1 |
KAP Assertions |
A-1 |
Compaq Fortran Extensions Supported by KAP |
D-1 |
KAP PCF and Cray Autotasking Directives |