881.neutron_s
SPEC CPU®2026 Benchmark Description

Benchmark Name

881.neutron_s

Benchmark Program General Category

Physics simulation of neutron transport in nuclear reactors

Benchmark Authors

John Tramm <jtramm[at]anl [dot] gov>

881.neutron was submitted to the SPEC CPU v8 Benchmark Search Program by John Tramm.

Benchmark Description

Neutron is the union of XSBench and RSBench, two mini-apps that solve a problem with neutron transport in nuclear reactor physics. They tackle the same high level problem, albeit in two different ways. A high level summary is that XSBench retains data in memory and is memory intensive whereas RSBench computes the data, is less memory intensive but more computationally intensive. Both are relevant for modern nuclear physics simulations. A description and comparison of both algorithms is below.

XSBench

The XSBench mini-application models the most computationally intensive part of a typical Monte Carlo (MC) neutron transport nuclear reactor core transport simulation, the calculation of macroscopic neutron cross sections from point-wise data. This is a kernel which accounts for up to 85% of the total runtime of production MC applications like OpenMC. XSBench retains the essential performance-related computational conditions and tasks of fully featured reactor core MC neutron transport codes, yet at a fraction of the programming complexity of the full application. XSBench provides a much simpler and far more transparent platform for testing the algorithm on different architectures, making alterations to the code, and collecting hardware runtime performance data.

XSBench has been used for over a decade in the HPC field both to study the MC algorithm and to also serve as a memory intensive benchmark application for evaluating the performance of new system architectures. The original publication on XSBench has been cited 184 times as a result of its use as a benchmark in a wide variety of computer science compiler development and hardware design research.

RSBench

RSBench accomplishes the same computational task as XSBench, but uses a totally different way of representing cross section data. XSBench uses very large tabulated point-wise data tables that need to be searched through to assemble macroscopic cross sections. RSBench, on the other hand, utilizes a newer "multipole" representation that stores the quantum mechanical resonance data directly, leading to a several orders of magnitude of data compression at the cost of the need to mathematically evaluate cross section data on-the-fly. In effect, the methods employed in RSBench result in significant memory footprint and bandwidth savings at the cost of greatly increased usage of floating point arithmetic. While XSBench is typically a purely memory bound algorithm, the methods employed in RSBench can sometimes result in the algorithm becoming compute limited instead of bandwidth limited, which may be highly advantageous on some systems.

Compare and Contrast

XSBench and RSBench solve the same problem, but use two different methods. One example of the underlying physical data looks like this for Uranium-238:

Incident Neutron Energy

This is a very complicated function due to the resonances, which need to be accurately stored. The pointwise representation in XSBench is the simplest possible approach, where a table is stored with a very fine energy discretization and the cross section is stored at each energy point. In the above example, something like 100k energy points are stored. A cross section lookup in XSbench is therefore composed of sampling a random energy point, and then assembling data from many different nuclide pointwise tables like this. As each table is tabulated differently in energy, XSBench uses some different methods to accelerate the searching operations (e.g., unionized grid vs. logarithmic hash grid).

In RSBench, something more akin to a "curve fitting" approach is used to store the data; it is known as the windowed multipole representation. The resonances can each be described by quantum mechanics nicely, so we use the underlying physics equations to represent the resonances, combined with a background contribution from resonances farther away from a series of windows. This type of physics-based curve fit is much more efficient and/or accurate than using a more generic polynomial expansion. Thus, RSBench does not store pointwise data, but instead stores a few "residues" that describe each of the resonances (there are about 1000 resonances pictured above). A lookup then uses those residues to figure out what the cross section should be at that energy point, which in practice involves some fairly expensive FLOP operations (in particular, the Faddeeva function must be evaluated).

So good way to put it is "XSBench = more memory footprint, more memory lookups, less FLOP work" vs. "RSBench = less memory footprint, less memory lookups, more FLOP work."

The multipole method in RSBench is generally a newer approach. However, there are some complexities such that even when using multipole data, the multipole data is not available in certain regions of energy space such that a simulation will still have to do pointwise lookups a significant fraction of the time. So, both methods are important. OpenMC (a full MC app, which runs on ALCF Aurora at Argonne) uses both methods. Across the field, probably pointwise data is more commonly used as it's simpler to implement and has been around longer, but multipole is also seen as a more forward looking method as it is more likely to be able to scale on newer systems that have a system balance tiled more towards FLOP rates.

Input Description

XSBench

XSBench's input is completely described using cmdline parameters. We explain each parameter below in detail.

Usage: ./neutron xsbench [options]
Options include:
 -m [simulation method]   Simulation method (history, event)
 -t [threads]             Number of OpenMP threads to run
 -s [size]                Size of H-M Benchmark to run (small, large, XL, XXL)
 -g [gridpoints]          Number of gridpoints per nuclide (overrides -s defaults)
 -G [grid type]           Grid search type (unionized, nuclide, hash). Defaults to unionized.
 -p [particles]           Number of particle histories
 -l [lookups]             History Based: Number of Cross-section (XS) lookups per particle. Event Based: Total number of XS lookups.
 -h [hash bins]           Number of hash bins (only relevant when used with "-G hash")
 -b [binary mode]         Read or write all data structures to file. If reading, this will skip initialization phase. (read, write)
 -k [kernel ID]           Specifies which kernel to run. 0 is baseline, 1, 2, etc are optimized variants. (0 is default.)

-t [threads]
Sets the number of OpenMP threads to run. The runcpu program will automatically set the OpenMP environment variables to specify how many threads XSBench should use. So, this parameter is just for debugging.

-m [simulation method]
Sets the simulation method, either "history" or "event". These options represent the history based or event based algorithms respectively. The default is the history based method. These two methods represent different methods of parallelizing the Monte Carlo transport method. In the history based method, the central mode of parallelism is expressed over particles, which each require some number of macroscopic cross sections to be executed in series and in a dependent order. The event based method expresses its parallelism over a large pool of independent macroscopic cross section lookups that can be executed in any order without dependence. The key difference between the two methods is the dependence/independence of the macroscopic cross section loop. See the "Transport Simulation Styles" section for more information.

-g [gridpoints]
Sets the number of gridpoints per nuclide. By default, this value is set to 11,303. This corresponds to the average number of actual gridpoints per nuclide in the H-M Large model as run by OpenMC with the actual ACE ENDF cross-section data. Note that this option will override the number of default gridpoints as set by the '-s' option.

-G [grid type]
Sets the grid search type (unionized, nuclide, hash). Defaults to unionized. The unionized grid is what is typically used in Monte Carlo codes, as it offers the fastest speed. However, the increase in speed comes with a significant increase in memory usage as a union of all the separate nuclide grids must be formed and stored in memory. The "nuclide" mode uses only the basic nuclide grid data, with no unionization. This is slower as a binary search must be performed on every nuclide for each macroscopic XS lookup, rather than only once when using the unionized grid. Finally, the "hash" mode is a newer algorithm now used by many full Monte Carlo codes which offers speed nearly equivalent to the unionized energy grid method, but with only a small fraction of the memory overhead. See the "Cross Section Lookup Methods" section for more details.

-s [size]
Sets the size of the Hoogenboom-Martin reactor model. There are four options: 'small', 'large', 'XL', and 'XXL'. By default, the 'large' option is selected. The H-M size corresponds to the number of nuclides present in the fuel region. The small version has 34 fuel nuclides, whereas the large version has 321 fuel nuclides. This significantly slows down the runtime of the program as the data structures are much larger, and more lookups are required whenever a lookup occurs in a fuel material. Note that the program defaults to "Large" if no specification is made. The additional size options, "XL" and "XXL", do not directly correspond to any particular physical model. They are similar to the H-M "large" option, except the number of gridpoints per nuclide has been increased greatly. This creates an extremely large energy grid data structure (XL: 120GB, XXL: 252GB), which is unlikely to fit on a single node, but is useful for experimentation purposes on novel architectures.

-p [particles]
Sets the number of particle histories to simulate. By default, this value is set to 500,000. Users may want to increase this value if they wish to extend the runtime of XSBench, perhaps to produce more reliable performance counter data - as extending the run will decrease the percentage of runtime spent on initialization. Real MC simulations in a full application may use up to several billion particles per generation, so there is great flexibility in this variable.

-l [lookups]
Sets the number of cross-section (XS) lookups to perform per particle. By default, this value is set to 34, which represents the average number of XS lookups per particle over the course of its lifetime in a light water reactor problem. Users should only alter this value if they are trying to capture the behavior of a different type of reactor (e.g., one with a fast spectrum), where the number of lookups per history may be different.

-h [hash bins]
Sets the number of hash bins (only relevant when using the hash lookup algorithm, as selected with "-G hash"). Default is 10,000.

-b [binary mode]
This optional mode can read or write the simulation data structures to disk. Options are ("read" or "write"). This may be useful if it is necessary to minimize the initialization phase of the program, which has a non-trivial runtime. The generated file is named "XS_data.dat" and will be located in the current working directory. The same file name and location will be used when reading. Note that as the file is binary, it may not be portable between compilers and computer systems. NOTE: When running in the "read" mode, you must be running with an identical program configuration as when the file was generated. E.g., if the file was generated with the "-G nuclide" argument, subsequent runs reading from that file must use the same configuration flags. For SPEC CPU, we attempted to use this mode to reduce runtime, but the side effect was that time shifted into reading and writing the data file, which is worse for a CPU benchmark, so we chose not to employ this method.

-k [kernel]
For some of the XSBench code-bases (e.g., openmp-threading and cuda) there are several optimized variants of the main kernel. All source bases run basically the same "baseline" kernel as default. Optimized kernels can be selected at runtime with this argument. Default is "0" for the baseline, other variants are numbered 1, 2, ... etc. People interested in implementing their own optimized variants are encouraged to use this interface for convenience rather than writing over the main kernel. The baseline kernel is defined at the top of the "Simulation.c" source file, with the other variants being defined towards the end of the file after a large comment block delineation. The optimized variants are related to different ways of sorting the sampled values such that there is less thread divergence and much better cache re-usage when executing the lookup kernel on contiguous sorted elements. More details can be found in the "Optimized Kernels" section.

RSBench

RSBench's input is completely described using cmdline parameters. We explain each parameter below in detail.

    Usage: ./neutron rsbench [options]
    Options include:
     -t [threads]            Number of OpenMP threads to run
     -m [simulation method]  Simulation method (history, event)
     -s [size]               Size of H-M Benchmark to run (small, large)
     -l [lookups]            Number of Cross-section (XS) lookups per particle history
     -p [particles]          Number of particle histories
     -P [poles]              Average Number of Poles per Nuclide
     -W [poles]              Average Number of Windows per Nuclide
     -d                      Disables Temperature Dependence (Doppler Broadening)
     -k [kernel]             Enable an alternative computation kernel

-t [threads]
Sets the number of OpenMP threads to run. The runcpu program will automatically set the OpenMP environment variables to specify how many threads RSBench should use. So, this parameter is just for debugging.

-m [simulation method]
Sets the simulation method, either "history" or "event". These options represent the history based or event based algorithms respectively. The default is the history based method. These two methods represent different methods of parallelizing the Monte Carlo transport method. In the history based method, the central mode of parallelism is expressed over particles, which each require some number of macroscopic cross sections to be executed in series and in a dependent order. The event based method expresses its parallelism over a large pool of independent macroscopic cross section lookups that can be executed in any order without dependence. They key difference between the two methods is the dependence/independence of the macroscopic cross section loop. See the "Transport Simulation Styles" section for more information.

-p [particles]
Sets the number of particle histories to simulate. By default, this value is set to 500,000. Users may want to increase this value if they wish to extend the runtime of RSBench, perhaps to produce more reliable performance counter data - as extending the run will decrease the percentage of runtime spent on initialization. Real MC simulations in a full application may use up to several billion particles per generation, so there is great flexibility in this variable.

-P [poles]
Sets the average number of poles per nuclide.

-w [windows]
Sets the number of windows per nuclide.

-d
This flag disables Doppler broadening, in effect making it single temperature 0K simulation. Doppler broadening is enabled by default. While Doppler broadening is likely to be used in most cases in a full application like OpenMC, in may be useful to disable in some cases so as to compare to traditional single temperature lookup algorithms.

-k [kernel]
For some of the RSBench code-bases (e.g., openmp-threading and cuda) there are several optimized variants of the main kernel. All source bases run basically the same "baseline" kernel as default. Optimized kernels can be selected at runtime with this argument. Default is "0" for the baseline, other variants are numbered 1, 2, ... etc. People interested in implementing their own optimized variants are encouraged to use this interface for convenience rather than writing over the main kernel. The baseline kernel is defined at the top of the "Simulation.c" source file, with the other variants being defined towards the end of the file after a large comment block delineation. The optimized variants are related to different ways of sorting the sampled values such that there is less thread divergence and much better cache re-usage when executing the lookup kernel on contiguous sorted elements. More details can be found in the "Optimized Kernels" section.

Output Description

The XSBench and RSBench applications print out results along with a verification step. For the benchmark to pass, this output must match exactly to the expected output. More info on the verification is given below.

Verification Support

Both XSBench and RSBench generate a hash of the results at the end of the simulation, and display it once the code has completed executing. This hash can then be verified against hashes that other versions or configurations of the code generate. For instance, running XSBench with 4 threads vs 8 threads (on a machine that supports that configuration) will generate the same hash value. Changing the model / run parameters is expected to generate a totally different hash number (i.e., increasing the number of particles, number of gridpoints, etc, will result in different hashes). Changing the simulation mode (history or event) will generate different hashes. However, changing the type of lookup performed (e.g., nuclide, unionized, or hash) should result in the same hash being generated. Since the benchmark cmdlines are fixed, this facilitates verification across systems.

In order to additionally help improve verification and debugging, SPEC samples the maximum values for particle transport and prints them to stdout. The sampling may be a single value or a summation of values depending on the algorithm being used. The sample size varies by the input size. The sampling ensures intermediate results are correct without having to print all values which can be very large.

Programming Language

Threading Model

The benchmark uses OpenMP threading.

Known Portability Issues

None. Program is only C code, and the CUDA options have been stripped away for SPEC CPU.

Sources and Licensing

XSBench sources are available at github.com/ANL-CESAR/XSBench. The SPEC CPU code started with the snapshot at git hash 6e2beb7 from that repository.

RSBench sources are available at github.com/ANL-CESAR/RSBench. The SPEC CPU code started with the snapshot at git hash 7848d0e from that repository.

Both XSBench and RSBench are distributed under the MIT license.

References

More info on the algorithms and theory can be found locally: theory.html
Detailed theory paper: XSBench_Theory.pdf
XSBench paper: XSBench - The Development and Verification of a Performance Abstraction for Monte Carlo Reactor Analysis, J.R.Tramm, A.R.Siegel, T.Islam, M.Schulz, PHYSOR 2014 - The Role of Reactor Physics toward a Sustainable Future, Kyoto, Japan, 2014. P5064-0114.pdf
RSBench paper: Performance Analysis of a Reduced Data Movement Algorithm for Neutron Cross Section Data in Monte Carlo Simulations, J.R.Tramm, A.R.Siegel, B.Forget, C.Josey, EASC 2014 - Solving Software Challenges for Exascale, Stockholm, Sweden, 2014. doi.org/10.1007/978-3-319-15976-8_3
The Uranium-238 energy plot taken from Brookhaven National Laboratory's SIGMA database. More information is available at 238U JvsE and 238U Gamma/J.

881.neutron_s SPEC CPU®2026 Benchmark Description

Benchmark Name

Benchmark Program General Category

Benchmark Authors

Benchmark Description

XSBench

RSBench

Compare and Contrast

Input Description

XSBench

RSBench

Output Description

Verification Support

Programming Language

Threading Model

Known Portability Issues

Sources and Licensing

References

881.neutron_s
SPEC CPU®2026 Benchmark Description