IBM Flag disclosure

XLC/XLF options:
-O3 - optimization level 3 turned on
-Q - Turn inlining on
-Q=xxx - Inline functions < xxx lines
-qarch=ppc - sets architecture to PowerPC
-qarch=power2 - sets architecture to Power2
-qarch=pwrx - sets architecture to Power2
-qtune=pwr2 - tunes instruction selection, scheduling, and other implementation-dependent performance enhancements for Power2
-qpdf1/pdf2 - profile directed feedback optimization
-qmaxmem=-1 - No limit to how much memory to use during compilation
-qhsflt, -qfloat=hsflt - prevents rounding of single-precision expressions and replaces floating-point division with multiplication by the reciprocal of the divisor
-qrndsngl - rounds the result of each single-precision operation to single-precision, rather than waiting until the full expression is evaluated
-qstrict - ensures that optimizations of -O3 do not alter the semantics of the program
-qalias=noaryovrlp - Program does not contain array assignments of overlapping or storage-associated arrays; can produce significant performance improvements for array language.
-qhot - performs high order loop transformations
-qhot=arraypad=248 - Performs additional loop optimization and pads array dimensions to prevent cache misses.
-qipa=noobject:level=2:partition=large - Specifies the size of program sections that are analyzed together. Larger partitions produce better analysis but require more storage.
-qintlog - allows for mixing integer and logical data entities in expressions and statements
-qipa - turns on interprocedural analysis
-tbtable=none - Don't generate traceback information
-qnosave - sets default storage class of local variables to automatic
-qdatalocal - assume all data items are local
-1 - Executes DO loops at least once, if reached.
-qdpc - increases the precision of real constants, for maximum accuracy when assigning real constants to DOUBLE PRECISION variables.
-qlog4 - Logical expressions that have a LOGICAL result are of type LOGICAL(4).
-qassert=addr - Variables are disjoint from pointers unless their address is taken.
-qassert=allp - Pointers are never aliased.
-qcompact - Reduce code size where possible, at the expense of execution speed. Code size is reduced by inhibiting optimizations that replicate or expand code inline.
-qansialias - Use type-based aliasing during optimization.
-qinlglue - Generate fast external linkage by inlining the code (pointer glue code) necessary at calls via a function pointer and calls to external procedures.
-qunroll - Allow the optimizer to unroll loops.

Linker Options:
-lmass - link mathematical acceleration subsystem library
-bnso -bI:/lib/syscalls.exp - Statically bind executables
-lhmu -lhm -lhu - link fast malloc libraries
/usr/ccs/lib/bmalloc.o: A high performance implementation of the Berkeley malloc package.

KAP Preprocessor Options:
-Pk -Wp - turns on the KAP preprocessor
-ag=a - pads common blocks and memory local to the subroutine to avoid cache line collisions.
-ag=b - kapf can adjust the leading dimensions of arrays in COMMON away from a power of 2 if the arrays are not used as actual arguments to any user procedure calls.
-r=2 - sets roundoff level to 2
-ur2=xxx - sets a maximum weight (estimate of work) for each unrolled iteration. (Work is estimated by counting operands and operators in a loop.)
-inl - enables inlining
-ur=xxx - maximum number of iterations of a loop to unroll
-lm=5 - Limit amount of loop nesting.
-fuse - enables loop fusion, a conventional compiler optimization that transforms two adjacent loops into a single loop.
-f - Leave pre-processed source file around

Vast Preprocessor Options:
-Pv -Wp - turns on the Vast preprocessor
-me - informs the preprocessor to enable alignment, inter-array padding and array redimensioning.
-o - Leave pre-processed source file around
-ew - is the same as -ea478
-ea2478 - (-ea allows associative transformations.)
          (-e2 specifies that no data dependencies exist in loops containing pointer-based variables.)
          (-e4 generates calls to optimized BLAS library routines.)
          (-e7 automatically expands called routines inline.)
          (-e8 searches input file first for expandable routines.)

fdpr is a utility to optimize existing binaries.
For fdpr (Feedback Directed Program Restructuring):
-R2, -R3 - Specifies the level of optimization

---
Christopher Chan-Nui   | I intend to live forever -
channui@austin.ibm.com | so far, so good