Note: The GNU Compiler Collection provides a wide array of compiler options, described in detail and readily available at https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html#Option-Index and https://gcc.gnu.org/onlinedocs/gfortran/. This SPEC CPU flags file contains excerpts from and brief summaries of portions of that documentation.
SPEC's modifications are:
Copyright 2006-2026 Standard Performance Evaluation Corporation
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being "Funding Free Software", the Front-Cover Texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in your SPEC CPU kit at $SPEC/Docs/licenses/FDL.v1.3.txt and on the web at https://www.spec.org/cpu2026/Docs/licenses/FDL.v1.3.txt. A copy of "Funding Free Software" is on your SPEC CPU kit at $SPEC/Docs/licenses/FundingFreeSW.txt and on the web at https://www.spec.org/cpu2026/Docs/licenses/FundingFreeSW.txt.
(a) The FSF's Front-Cover Text is:
A GNU Manual
(b) The FSF's Back-Cover Text is:
You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development.
Do not rely on language constraints to derive bounds for the number of iterations of a loop.
Warn, rather than generating a fatal error, when calls to external procedures have mismatches vs. the procedure definition. See also https://gcc.gnu.org/gcc-10/porting_to.html.
This flag is implied by -std=legacy.
Allow the compiler to perform optimizations that may introduce new data races on stores, without proving that the variable cannot be concurrently accessed by other threads. Does not affect optimization of local data. It is safe to use this option if it is known that global data will not be accessed by multiple threads.
Examples of optimizations enabled by -fallow-store-data-races include hoisting or if-conversions that may cause a value that was already in memory to be re-written with that same value. Such re-writing is safe in a single threaded context but may be unsafe in a multi-threaded context. On some processors, enabling if-conversions is required for vectorization.
Use -fallow-store-data-races to enable or -fno-allow-store-data-races to disable. The default is -fno-allow-store-data-races unless -Ofast is used, which implies -fallow-store-data-races.
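The if-conversion case described above can be sketched at the source level as follows. This is an illustration only, not GCC's actual output; the global `shared_value` and both function names are hypothetical.

```c
/* Hypothetical global that other threads might also access. */
int shared_value = 0;

/* As written: the store happens only when flag is nonzero. */
void update_conditional(int flag, int v)
{
    if (flag)
        shared_value = v;
}

/* Conceptual result of if-conversion: the store always happens.
   When flag is 0 the old value is read and written back unchanged.
   That is harmless in a single thread, but it is a new store that
   could race with a concurrent writer in another thread. */
void update_if_converted(int flag, int v)
{
    int old = shared_value;
    shared_value = flag ? v : old;
}
```

In a single-threaded program both functions leave `shared_value` in the same state, which is why the transformation is safe there.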
Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations.
Assume that a loop with an exit will eventually take the exit and not loop indefinitely. This allows the compiler to remove loops that otherwise have no side-effects, not considering eventual endless looping as such.
Tells GCC to use the GNU semantics for "inline" functions, that is, the behavior prior to the C99 standard.
-finline-functions-called-once, which is implied by -O1, considers all "static" functions called once for inlining into their caller even if they are not marked "inline". If a call to a given function is integrated, then the function is not output as assembler code in its own right.
-fno-inline-functions-called-once inhibits this optimization.
Perform loop interchange.
This flag can improve cache performance on loop nests and allow further loop optimizations to take place, such as vectorization.
-floop-interchange is enabled by -O3.
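As a rough source-level picture of loop interchange, the two functions below compute the same sum with the loop order swapped. The array dimensions and function names are hypothetical, and whether GCC actually interchanges a given nest depends on its own cost model.

```c
#define ROWS 4
#define COLS 3

/* Inner loop walks down a column: consecutive accesses are COLS
   elements apart in memory, because C stores arrays row by row. */
long sum_column_first(int a[ROWS][COLS])
{
    long total = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}

/* Interchanged nest: inner loop walks along a row, so consecutive
   accesses touch adjacent memory. */
long sum_row_first(int a[ROWS][COLS])
{
    long total = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}
```

The second form is the one that tends to be friendlier to caches and to later vectorization.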
-floop-unroll-and-jam applies unroll and jam transformations on feasible loops. In a loop nest this unrolls the outer loop by some factor and fuses the resulting multiple inner loops.
This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
-fno-loop-unroll-and-jam disables this optimization.
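A hand-written sketch of unroll and jam with an unroll factor of 2 may help. The names and sizes are hypothetical, the outer trip count is assumed to be even, and the code GCC generates will differ in detail.

```c
#define N 4   /* outer trip count; assumed even for this factor-2 sketch */
#define M 3   /* inner trip count */

/* Original nest: one row sum per outer iteration. */
void row_sums(int a[N][M], int out[N])
{
    for (int i = 0; i < N; i++) {
        out[i] = 0;
        for (int j = 0; j < M; j++)
            out[i] += a[i][j];
    }
}

/* Outer loop unrolled by 2, and the two resulting inner loops
   fused ("jammed") into a single inner loop. */
void row_sums_unroll_jam(int a[N][M], int out[N])
{
    for (int i = 0; i < N; i += 2) {
        out[i] = 0;
        out[i + 1] = 0;
        for (int j = 0; j < M; j++) {
            out[i]     += a[i][j];
            out[i + 1] += a[i + 1][j];
        }
    }
}
```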
Enables link time optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify the partitioning algorithm used by the link-time optimizer. The value is one of: 1to1, to specify a partitioning mirroring the original source files; balanced, to specify partitioning into equally sized chunks (whenever possible); max, to create a new partition for every symbol where possible; one, to use exactly one partition; or none, to disable partitioning and streaming completely and execute the link-time optimization step directly from the WPA phase. The default value is balanced. While 1to1 can be used as a workaround for various code ordering issues, the max partitioning is intended for internal testing only.
Do not set "errno" after calling math functions that are executed with a single instruction, e.g., "sqrt".
Omit the frame pointer in functions that don't need one. This avoids the instructions to save, set up and restore the frame pointer; on many targets it also makes an extra register available.
-fomit-frame-pointer is the default at -O1 and higher.
Enable handling of OpenMP directives and generate parallel code.
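For illustration, a minimal loop carrying an OpenMP directive might look like the following. The function name is hypothetical. When the OpenMP flag is not given, the pragma is ignored and the loop simply runs serially.

```c
/* Sum n integers. With OpenMP enabled, the iterations are divided
   among threads and the per-thread partial sums are combined by the
   reduction clause; otherwise this is an ordinary serial loop. */
long sum_array(const int *v, int n)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```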
Enables prefetching of arrays used in loops.
Instruments code to collect information for profile-driven feedback. Information is collected regarding both code paths and data values.
Update profile information using atomic operations (if supported by the platform). This option is useful when compiling multi-threaded programs: it prevents profile corruption by emitting thread-safe code.
With -fprofile-use, all portions of programs not executed during the training run are aggressively optimized for size rather than speed. In some cases it is not practical to train all possible hot paths in the program. (For example, the program may contain functions specific to some given hardware, and training may not cover all hardware configurations the program is run on.) With -fprofile-partial-training, profile feedback is ignored for all functions not executed during the training run, so they are optimized as if they were compiled without profile feedback. This leads to better performance when the training run is not representative, but also to significantly larger generated code.
Applies information from a profile run in order to improve optimization. Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop unrolling.
Disable optimizations for floating-point arithmetic that ignore the signedness of zero.
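One small case where the sign of zero matters is adding positive zero. Under IEEE 754 arithmetic with the default rounding mode, -0.0 + 0.0 yields +0.0, so replacing `x + 0.0` with `x` is valid only if the sign of zero may be ignored. The function name below is hypothetical.

```c
#include <math.h>   /* signbit */

/* When signed zeros are honored, this cannot be simplified to
   "return x": for x == -0.0 the sum is +0.0, not -0.0. */
double add_positive_zero(double x)
{
    return x + 0.0;
}
```

Calling `signbit()` on the result for an argument of -0.0 shows whether the sign of zero was preserved by the compiler's transformations.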
Enabled: Put all local arrays, even those of unknown size, onto stack memory.
The -fno- form disables the behavior.
The language standards set aliasing requirements: programmers are expected to follow conventions so that the compiler can keep track of memory. If a program violates the requirements (for example, using pointer arithmetic), it may crash or (worse) silently produce wrong answers.
Unfortunately, the aliasing requirements from the standards are not always well understood.
Sometimes, the aliasing requirements are understood and nevertheless intentionally violated.
The -fno-strict-aliasing switch instructs the optimizer that it must not assume that the aliasing requirements from the standard are met by the current program. Note that this is an optimization switch, not a portability switch.
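As an illustration of the kind of code involved, the sketch below reads the bits of a float in two ways. The function names are hypothetical, and the example assumes that float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Violates the aliasing requirements: a float object is read through
   a uint32_t lvalue. It may appear to work, but the optimizer is
   entitled to assume this never happens unless it has been told not
   to rely on the aliasing rules. */
uint32_t float_bits_cast(float f)
{
    return *(uint32_t *)&f;
}

/* Conforming alternative: copy the bytes instead of reinterpreting
   the pointer. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written in the second style does not need -fno-strict-aliasing.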
Turn off the group of GCC optimizations invoked via -ftree-vectorize and related flags, as described at https://gcc.gnu.org/projects/tree-ssa/vectorization.html.
You can turn off loop vectorization with -fno-tree-loop-vectorize. Note that this is an optimization switch, not a portability switch. If it is needed, then in base you must use it consistently. See: https://www.spec.org/cpu2026/Docs/runrules.html#BaseFlags and https://www.spec.org/cpu2026/Docs/runrules.html#MustValidate.
Attempts to decompose loops in order to run them on multiple processors.
Allow certain common subexpressions to be uncombined, which may sometimes benefit other optimizations.
Tells the optimizer to unroll all loops.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.
Produce debugging information.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Link with libjemalloc, a fast, arena-based memory allocator.
Compiles for a 32-bit (ILP32) data model.
Compiles for a 64-bit (LP64) data model.
Generate code for ilp32 (int, long, pointer 32-bit) or lp64 (int 32-bit, longs and pointers 64-bit). With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
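The data model in effect for a given compilation can be checked from the sizes of the basic types. The helper names below are hypothetical, and the sketch assumes 8-bit bytes.

```c
#include <stddef.h>

/* Classify a data model from the sizes (in bytes) of int, long and
   pointers, following the definitions given above. */
const char *data_model_name(size_t int_bytes, size_t long_bytes,
                            size_t ptr_bytes)
{
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4)
        return "ILP32";
    if (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8)
        return "LP64";
    return "other";
}

/* Data model of the compilation that built this function. */
const char *current_data_model(void)
{
    return data_model_name(sizeof(int), sizeof(long), sizeof(void *));
}
```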
On x86 systems, allows use of instructions that require the listed architecture.
On Arm systems, specifies the name of the target architecture and, optionally, one or more feature modifiers. This option has the form -march=arch{+[no]feature}.
Generate code for processors that include the AVX extensions.
On aarch64 systems, -mcpu sets the kind of instructions that can be used (as if by -march) and how to tune for performance (as if by -mtune).
On x86 systems, -mcpu is a deprecated synonym for -mtune.
Generate code to take advantage of fused multiply-add.
-mrecip
This option enables use of "RCPSS" and "RSQRTSS" instructions (and
their vectorized variants "RCPPS" and "RSQRTPS") with an additional
Newton-Raphson step to increase precision instead of "DIVSS" and
"SQRTSS" (and their vectorized variants) for single-precision
floating-point arguments. These instructions are generated only when
-funsafe-math-optimizations is enabled together with
-ffinite-math-only and -fno-trapping-math.
-mrecip=opt
This option controls which reciprocal estimate instructions may be
used. opt is a comma-separated list of options, which may be
preceded by a ! to invert the option:
all
Enable all estimate instructions.
default
Enable the default instructions, equivalent to -mrecip.
none
Disable all estimate instructions, equivalent to -mno-recip.
div
Enable the approximation for scalar division.
vec-div
Enable the approximation for vectorized division.
sqrt
Enable the approximation for scalar square root.
vec-sqrt
Enable the approximation for vectorized square root.
So, for example, -mrecip=all,!sqrt enables all of the reciprocal
approximations, except for square root.
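The Newton-Raphson step mentioned for -mrecip can be sketched with the textbook iteration for a reciprocal, x1 = x0 * (2 - a * x0). This is not the instruction sequence GCC emits; the function name is hypothetical, and a hand-picked starting value stands in for the hardware estimate.

```c
/* One Newton-Raphson refinement of an estimate x0 of 1/a.
   Each step roughly doubles the number of correct bits. */
float refine_reciprocal(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}
```

For example, starting from the rough estimate 0.33 for 1/3, one step gives a value much closer to 1/3 than the starting estimate.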
Allows use of instructions that require the SIMD units of the indicated type.
Tunes code based on the timing characteristics of the listed processor.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
Increases optimization levels: the higher the number, the more optimization is done. Higher levels of optimization may
require additional compilation time, in the hopes of reducing execution time. At -O, basic optimizations are performed,
such as constant merging and elimination of dead code. At -O2, additional optimizations are added, such as common
subexpression elimination and strict aliasing. At -O3, even more optimizations are performed, such as function inlining and
vectorization.
Many more details are available.
Same as -O1
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
Specifies maximal overall growth of the compilation unit caused by interprocedural constant propagation. For example, parameter value 10 limits unit growth to 1.1 times the original size.
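The arithmetic behind the two unit-growth examples above is the same: the parameter is a percentage added to the original size. A minimal sketch, with a hypothetical function name:

```c
/* Convert a unit-growth parameter, given in percent, into the
   maximum size of the unit as a multiple of its original size. */
double growth_limit_factor(int growth_percent)
{
    return 1.0 + growth_percent / 100.0;
}
```

So a parameter value of 20 corresponds to a limit of 1.2 times the original size, and a value of 10 to 1.1 times.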
Maximum depth of recursive cloning for a self-recursive function.
IPA-CP calculates its own score of cloning profitability heuristics and performs those cloning opportunities with scores that exceed ipa-cp-eval-threshold.
The maximum number of instructions biased by probabilities of their execution that a loop may have to be unrolled. If a loop is unrolled, this parameter also determines how many times the loop code is unrolled.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
The maximum number of instructions that a loop may have to be unrolled. If a loop is unrolled, this parameter also determines how many times the loop code is unrolled.
-fPIE causes generation of position-independent code that can be linked only into executables.
-fno-PIE, which is the default, disables the generation of position-independent code.
-pie produces a dynamically linked position independent executable. SPEC CPU Makefiles are not set up to support use of this option.
-no-pie, which is the default, does not generate dynamically linked position independent executables.
Enable support for POSIX Threads. Note that C++ programs using std::thread may require this flag.
On systems that support dynamic linking, this overrides -pie and prevents linking with the shared libraries. On other systems, this option has no effect.
Link the C++ library statically.
Sets the language standard to the specified version, for example c18, c++17, f2008.
Treat the specified symbol as undefined, to force linking of library modules to define it. You can use -u multiple times with different symbols to force loading of additional library modules. For example, "-u malloc -ljemalloc" can guarantee linking the malloc() defined in the jemalloc library. Without it, a change in the order of the linking options may cause jemalloc not to be linked.
Allow multiple definitions of the same symbols instead of failing with an error. The symbols in jemalloc may conflict with symbols in glibc when using static linking, and this flag will cause the linker to continue instead of failing.
Remove unused functions from the generated executable.
Note that this is an optimization switch, not a portability switch. If it is needed, then in base you must use it consistently. See: https://www.spec.org/cpu2026/Docs/runrules.html#BaseFlags and https://www.spec.org/cpu2026/Docs/runrules.html#MustValidate.
Add the specified directory to the runtime library search path used when linking an ELF executable with shared objects.
Add the linker flag that requests a large stack. This flag is likely to be important only to one or two of the floating point speed benchmarks. In accordance with the rules for Base, it is set for all of fpspeed in base. See: https://www.spec.org/cpu2026/Docs/runrules.html#BaseFlags.
Set the requested page size for the program to one of the available sizes for your system - for example 2M, 4M, 1G.
Allows links to proceed even if there are multiple definitions of some symbols.
Ensure that there are no surprises if the benchmarks are run in an environment where file system metadata uses 64 bits.
Use big-endian representation for unformatted files. This is important when reading data files that were originally generated in big-endian format.
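The byte-order issue itself can be illustrated in a few lines. The flag applies to Fortran unformatted I/O; the C sketch below only shows what interpreting bytes as big-endian means and does not model gfortran's record format. The function name is hypothetical.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant
   byte first (big-endian), regardless of the host's byte order. */
uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

A little-endian host that read the same four bytes directly as a native integer would obtain a different value, which is why the conversion matters for data files written on big-endian systems.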
Disables a range of optimizations that provide faster, though sometimes less precise, mathematical operations.
A SPEC CPU config file might use this flag in combination with -Ofast, to specify that all the optimizations of -Ofast are desired, with the exception of -ffast-math.
You may need to use this flag in order to get certain benchmarks to validate. If it is needed, the normal rules about portability flags apply.
Do not allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.
A SPEC CPU config file might use this flag in combination with -Ofast, to specify that all the optimizations of -Ofast are desired, with the exception of -ffinite-math-only.
You may need to use this flag in order to get certain benchmarks to validate. If it is needed, the normal rules about portability flags apply.
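One common way a program can depend on NaN handling is the self-comparison test sketched below (the function name is hypothetical). When the compiler is allowed to assume that NaNs never occur, it may fold the comparison to a constant, which is one reason a benchmark might fail to validate.

```c
/* Under IEEE 754 rules, NaN is the only value that compares unequal
   to itself. A compiler permitted to assume that arguments are never
   NaN may reduce this to "return 0". */
int is_nan(double x)
{
    return x != x;
}
```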
Let the type "char" be signed, like "signed char".
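The effect of the signedness of plain "char" can be seen when a char holding the bit pattern 0xFF is widened to int. The function name below is hypothetical; the conversion of 0xFF to a signed char is implementation-defined, and the comment describes the behavior assumed for GCC.

```c
#include <limits.h>

/* Returns -1 when plain char is signed (GCC wraps the value) and
   255 when plain char is unsigned. */
int char_ff_as_int(void)
{
    char c = (char)0xFF;
    return c;
}
```

Programs that use char values as array indices or compare them against constants above 127 are the ones most likely to be sensitive to this choice.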
Do not transform names of entities specified in the Fortran source file by appending underscores to them.
The switch -funsafe-math-optimizations allows the compiler to make certain(*) aggressive assumptions, such as disregarding the programmer's intended order of operations. The run rules allow such re-ordering; see https://www.spec.org/cpu2026/Docs/runrules.html#reordering. The rules also point out that you must get answers that pass SPEC's validation requirements. In some cases, that will mean that some optimizations must be turned off.
-fno-unsafe-math-optimizations turns off these(*) optimizations. You may need to use this flag in order to get certain benchmarks to validate. If it is needed, the normal rules about portability flags apply.
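Why the order of operations can matter is easy to show, because floating-point addition is not associative. The function names below are hypothetical, and the sketch assumes IEEE 754 double arithmetic with the default rounding mode.

```c
/* Evaluate in the order the programmer wrote: (a + b) + c. */
double sum_left_to_right(double a, double b, double c)
{
    return (a + b) + c;
}

/* The same sum after re-association: a + (b + c). */
double sum_reassociated(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first function returns 1.0 while the second returns 0.0, because 1.0 is lost when it is added to -1e20. Differences of this kind are what may cause a benchmark to miss its validation tolerance.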
Let the type "char" be unsigned, like "unsigned char".
Invokes the GNU C compiler.
Invokes the GNU Fortran compiler.
Invokes the GNU C++ compiler.
This option causes all intrinsic procedures (including the GNU-specific extensions) to be accepted.
Place uninitialized global variables in a common block. This allows the linker to resolve all tentative definitions of the same variable in different compilation units to the same object. See also https://gcc.gnu.org/gcc-10/porting_to.html.
Allows source code in traditional (fixed-column) Fortran layout.
Downgrade some diagnostics about nonconformant code from errors to warnings, which may allow some non-conforming code to compile. Note that SPEC would be very unlikely to grant use of this option as a PORTABILITY flag; instead, SPEC would expect it to be applied to all benchmarks of a given language in base.
Please see https://www.spec.org/cpu2026/Docs/runrules.html#portability and https://www.spec.org/cpu2026/Docs/runrules.html#BaseFlags.
On some systems, this flag is required to link with std::basic_istream and related functions.
Use pipes rather than temporary files for communication between the various stages of compilation.
Enables warnings.
Inhibit all warning messages.
Write a linker map to the named file.
This option controls warnings when a declaration does not specify a type. This warning is enabled by default, as an error, in C99 and later dialects of C, and also by -Wall.
Do not warn about functions defined with a return type that defaults to "int" or which return something other than what they were declared to.
SPECrate runs might use one of these methods to bind processes to specific processors, depending on the config file.
Linux systems: the numactl command is commonly used. Here is a brief guide to understanding the specific command which will be found in the config file:
macOS systems: processes are not bound.
No special commands are needed for feedback-directed optimization, other than the compiler profile flags.
One or more of the following may have been used in the run. If so, it will be listed in the notes sections. Here is a brief guide to understanding them:
LD_LIBRARY_PATH=<directories> (set via config file preENV)
LD_LIBRARY_PATH controls the search order for libraries. Often, it can be defaulted. Sometimes, it is
explicitly set (as documented in the notes in the submission) in order to ensure that the correct versions of
libraries are picked up.
OMP_STACKSIZE=N (set via config file preENV)
Set the stack size for subordinate OpenMP threads.
ulimit -s N
ulimit -s unlimited
'ulimit' is a Unix command, entered prior to the run. It sets the stack size for the main process and its children,
either to N kbytes or to no limit.
MALLOC_CONF=thp:always,metadata_thp:always (set via config file preENV)
MALLOC_CONF controls jemalloc behavior.
The "thp" option controls whether heap allocations made by jemalloc use transparent huge pages, if THP is supported by the operating system. The "always" setting enables transparent huge pages for all user memory mappings with MADV_HUGEPAGE; "never" ensures no transparent huge pages with MADV_NOHUGEPAGE; the default setting "default" makes no changes.
The "metadata_thp" option controls whether to allow jemalloc to use transparent huge pages (THP) for internal metadata. The "always" setting allows such usage. The "auto" setting uses no THP initially, but may begin to do so when metadata usage reaches a certain level. The default is "disabled".