803.sph_exa_s
SPEC CPU®2026 Benchmark Description

Benchmark Name

803.sph_exa_s (SPH-EXA mini-app)

Benchmark Program General Category

Astrophysics - Smoothed particle hydrodynamics

Benchmark Authors

Authors listed in alphabetic order:

Benchmark Description

The SPH-EXA mini-app implements the smoothed particle hydrodynamics (SPH) technique, a meshless Lagrangian method commonly used for performing hydrodynamical and computational fluid dynamics simulations.

The SPH technique discretizes a fluid into a set of interpolation points (SPH particles) whose distribution follows the mass density of the fluid; their evolution relies on a weighted interpolation over nearby neighboring particles. SPH simulations with detailed physics calculations are computationally demanding applications. The SPH-EXA mini-app is derived from three parent SPH codes, two used in astrophysics (SPHYNX and ChaNGa) and one used in computational fluid dynamics (SPH-flow).
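As an illustration of the weighted interpolation described above, the following is a minimal sketch of an SPH density summation. It uses the standard 3D cubic spline kernel as a stand-in; the source does not state which kernel the mini-app uses, and the names Particle, cubicSplineKernel and density are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle record for this sketch (not the mini-app's data layout).
struct Particle {
    double x, y, z; // position
    double m;       // mass
    double h;       // smoothing length
};

// Standard 3D cubic spline kernel with support radius 2h.
// Assumption: chosen for illustration only; the mini-app may use another kernel.
double cubicSplineKernel(double r, double h) {
    assert(h > 0.0);
    const double pi = 3.14159265358979323846;
    const double q = r / h;
    const double sigma = 1.0 / (pi * h * h * h); // 3D normalization factor
    if (q < 1.0) {
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    }
    if (q < 2.0) {
        const double t = 2.0 - q;
        return sigma * 0.25 * t * t * t;
    }
    return 0.0; // outside the kernel support
}

// Density at particle i: rho_i = sum_j m_j * W(|r_i - r_j|, h_i).
// The sum includes j == i (the self-contribution). It loops over all particles;
// a real code would loop only over the neighbors found by the tree walk.
double density(const std::vector<Particle>& p, std::size_t i) {
    assert(i < p.size());
    double rho = 0.0;
    for (const Particle& pj : p) {
        const double dx = p[i].x - pj.x;
        const double dy = p[i].y - pj.y;
        const double dz = p[i].z - pj.z;
        const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
        rho += pj.m * cubicSplineKernel(r, p[i].h);
    }
    return rho;
}
```

Because the kernel has compact support, particles farther than 2h contribute nothing, which is why only close neighbors matter in practice.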

The mini-app includes the basic steps of any SPH calculation. First, a tree is built from the particles' positions and masses and walked to identify the neighbors that will be used for the remainder of the global time-step (or iteration). The subsequent steps evaluate the particles' density, acceleration, and rate of change of internal energy, along with all physical modules relevant to the studied scenario. Finally, a new physically relevant and numerically stable time-step is found, and the properties of the particles are updated accordingly.
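The neighbor identification step can be sketched as follows. This is a brute-force search over all particles, used here only as a simple stand-in for the tree build and walk that the mini-app performs; Point and findNeighbors are hypothetical names, and the search radius is an assumed parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical position record for this sketch.
struct Point {
    double x, y, z;
};

// Returns the indices of all points within `radius` of point i, excluding i itself.
// Brute force, O(N) per query; a tree walk returns the same set at lower cost.
std::vector<std::size_t> findNeighbors(const std::vector<Point>& pts,
                                       std::size_t i, double radius) {
    assert(i < pts.size() && radius >= 0.0);
    std::vector<std::size_t> result;
    const double r2 = radius * radius; // compare squared distances, no sqrt needed
    for (std::size_t j = 0; j < pts.size(); ++j) {
        if (j == i) continue;
        const double dx = pts[i].x - pts[j].x;
        const double dy = pts[i].y - pts[j].y;
        const double dz = pts[i].z - pts[j].z;
        if (dx * dx + dy * dy + dz * dz <= r2) {
            result.push_back(j);
        }
    }
    return result;
}
```

The resulting neighbor list would then be reused by the density, acceleration and energy steps for the rest of the iteration.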

The SPH-EXA mini-app is a modern C++ header-only code (except for main.cpp) with no external software dependencies. Parallelism is currently expressed via OpenMP.
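To illustrate the kind of OpenMP loop parallelism mentioned above, here is a minimal sketch of a reduction over particles. It sums kinetic plus internal energy, assuming u is the internal energy per unit mass; the Body struct and totalEnergy function are hypothetical and are not taken from the mini-app's sources.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-particle record for this sketch.
struct Body {
    double m;          // mass
    double vx, vy, vz; // velocity components
    double u;          // internal energy per unit mass (assumption)
};

// Sums m * (0.5 * |v|^2 + u) over all particles.
// The pragma is ignored when compiled without OpenMP support,
// in which case the loop simply runs serially.
double totalEnergy(const std::vector<Body>& b) {
    double e = 0.0;
    const long n = static_cast<long>(b.size());
    #pragma omp parallel for reduction(+ : e)
    for (long i = 0; i < n; ++i) {
        assert(b[i].m >= 0.0); // masses are expected to be non-negative
        const double v2 = b[i].vx * b[i].vx + b[i].vy * b[i].vy + b[i].vz * b[i].vz;
        e += b[i].m * (0.5 * v2 + b[i].u);
    }
    return e;
}
```

With a floating-point reduction the summation order can differ between thread counts, so results may vary in the last digits from run to run.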

This mini-app can simulate a three-dimensional rotating square patch, a demanding scenario for SPH simulations due to the presence of negative pressures, which stimulate the emergence of unphysical tensile instabilities that destroy the particle system, unless corrective repulsive forces are included.

Input Description

-n <num> - number of particles along each edge of the cube (-n 100 means that the application will run with 100 * 100 * 100 particles)

-s <num> - number of time-steps (iterations)

-w <num> - how often the output file is written (-w 50 means that the output file will be dumped every 50 iterations)

For testing, automatic generation of the input conditions for all particles has been added to the source code.

In the present setup, the code simulates the evolution of a three-dimensional rotating square patch of fluid. The additional information needed for this is set in the initialization phase (SqPatch::init() and constants in SqPatch).

The number of particles in the cube and the number of time-steps vary by workload:

Workload    Particles              Time-steps    State dump interval
test        1,000,000 (-n 100)     0             1
train       125,000 (-n 50)        7             5
refspeed    9,261,000 (-n 210)     24            7
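The arithmetic behind the table can be sketched as follows. The particle count is the cube of the -n value. The dump schedule shown here (iteration 0, every multiple of the interval, and the final iteration) is an inference from the dump file names listed with the tolerances in this document, not a documented rule; particleCount and dumpIterations are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Total particles for a cube with n particles along each edge.
std::uint64_t particleCount(std::uint64_t n) {
    return n * n * n;
}

// Iterations at which a state dump is expected.
// Assumption: dumps occur at iteration 0, at every multiple of `interval`,
// and at the final iteration `steps`.
std::vector<int> dumpIterations(int steps, int interval) {
    assert(steps >= 0 && interval > 0);
    std::vector<int> out;
    for (int it = 0; it <= steps; it += interval) {
        out.push_back(it);
    }
    if (out.back() != steps) {
        out.push_back(steps); // final state is dumped even if off-interval
    }
    return out;
}
```

For example, under this assumption the train workload (7 time-steps, interval 5) yields dumps at iterations 0, 5 and 7, matching the files dump0.csv, dump5.csv and dump7.csv.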

Output Description

The code performs the simulation, and at the end it saves the total energy of the system. It is sufficient to check that this value does not change across simulations (for a fixed number of time-steps) to make sure that the code has executed correctly; for added assurance, the intermediate states of the calculation are also checked every n time-steps, where n varies by workload as shown in the table above.

In addition to the intermediate state of the calculation, the state of every 128th particle in the cube is also dumped and checked. Each dump is a separate CSV output file which contains positions (x, y, z), velocities (vx, vy, vz), smoothing length (h), density (ro), internal energy (u), pressure (p), speed of sound (c) and gradient of pressure (gradPx, gradPy, gradPz). Only every 128th particle is printed to keep output files to a reasonable size, and because a large variation in a single point would not have a large effect on the overall calculation.
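The sampling described above can be sketched as follows. The field order follows the list in the text, but the exact file layout, the starting index (assumed to be 0), the absence of a header row and the number formatting are assumptions; DumpRow, sampledIndices and dumpSample are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-particle record holding the dumped quantities.
struct DumpRow {
    double x, y, z;                // position
    double vx, vy, vz;             // velocity
    double h, ro, u, p, c;         // smoothing length, density, energy, pressure, sound speed
    double gradPx, gradPy, gradPz; // pressure gradient
};

// Indices 0, stride, 2*stride, ... below count.
std::vector<std::size_t> sampledIndices(std::size_t count, std::size_t stride = 128) {
    assert(stride > 0);
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < count; i += stride) {
        idx.push_back(i);
    }
    return idx;
}

// Formats one CSV line per sampled particle (default stream precision).
std::string dumpSample(const std::vector<DumpRow>& rows, std::size_t stride = 128) {
    std::ostringstream os;
    for (std::size_t i : sampledIndices(rows.size(), stride)) {
        const DumpRow& r = rows[i];
        os << r.x << ',' << r.y << ',' << r.z << ','
           << r.vx << ',' << r.vy << ',' << r.vz << ','
           << r.h << ',' << r.ro << ',' << r.u << ',' << r.p << ',' << r.c << ','
           << r.gradPx << ',' << r.gradPy << ',' << r.gradPz << '\n';
    }
    return os.str();
}
```

With a stride of 128, a 1,000,000-particle run would produce 7,813 lines per dump under these assumptions, rather than one million.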

Each output file has a mix of very large and very small values, and all of them are checked. The tolerances used vary by file. Absolute tolerances are set to cover acceptable deltas in the very small values, while relative tolerances are used to cover acceptable deltas in the large values.

The tolerances used by the runcpu program are set in the Spec/object.pm file and are shown here:

Workload    Output File      Relative tolerance    Absolute tolerance
test        constants.csv    0.506%                None
            sph_exa.out      0.506%                None
            dump0.csv        0.223%                0.0002
train       constants.csv    1.032%                None
            sph_exa.out      1.032%                None
            dump0.csv        0.00241%              0.00007
            dump5.csv        0.175%                0.0003
            dump7.csv        0.176%                0.0003
refspeed    constants.csv    4.3%                  None
            sph_exa.out      4.3%                  None
            dump0.csv        None                  0.0004
            dump7.csv        0.7%                  0.04
            dump14.csv       0.2%                  0.04
            dump21.csv       0.6%                  0.05
            dump24.csv       0.3%                  0.05

Values were selected based on observed differences on various platforms relative to a reference run done on an x86 Linux system with GCC 14.2.0 and no optimization (-O0).
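The way absolute and relative tolerances complement each other can be sketched as follows. This is not the SPEC validation tool's code. It assumes that a value passes if it satisfies either tolerance that is set, and that with no tolerance set the values must match exactly; withinTolerance and kNoTolerance are hypothetical names, and percentages from the table are passed as fractions (0.175% becomes 0.00175).

```cpp
#include <cassert>
#include <cmath>

// Sentinel meaning "None" in the tolerance table (hypothetical convention).
const double kNoTolerance = -1.0;

// Returns true if `val` is acceptably close to the reference value `ref`.
// Assumption: passing EITHER the absolute or the relative check is enough.
bool withinTolerance(double ref, double val, double reltol, double abstol) {
    assert(!std::isnan(ref) && !std::isnan(val));
    const double diff = std::fabs(val - ref);
    const bool hasRel = reltol >= 0.0;
    const bool hasAbs = abstol >= 0.0;
    if (!hasRel && !hasAbs) {
        return diff == 0.0; // no tolerance set: require an exact match
    }
    if (hasAbs && diff <= abstol) {
        return true; // covers very small values
    }
    if (hasRel && diff <= reltol * std::fabs(ref)) {
        return true; // covers large values
    }
    return false;
}
```

Using the train dump5.csv tolerances (0.175% relative, 0.0003 absolute), a reference of 1000 against 1001.5 passes through the relative check, while a reference of 0.00001 against 0.0002 passes through the absolute check even though its relative error is large.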

Differences from 532.sph_exa_t and 632.sph_exa_s in SPEChpc™ 2021

The source code that implements the algorithm is largely unchanged. The CPU 2026 version removes support for OpenACC and OpenMP target offload.

The workloads are most similar to 532.sph_exa_t. The test workload is exactly the same, while the train cube size and iteration count are larger. The refspeed workload (corresponding to "ref" in 532.sph_exa_t) has the same cube size but runs for fewer iterations.

The primary difference is in output validation; the SPEChpc 2021 versions validate only the total energy output after the simulation completes. 803.sph_exa_s validates much more of the intermediate state as explained above.

Programming Language

C++

Threading Model

OpenMP

Known Portability Issues

None

Sources and Licensing

The benchmark is licensed under the MIT license. The sources SPEC started with are from around commit 7604c824 in the SPH-EXA GitHub repo, but have been modified. The benchmark contains some changes from upstream after this point, but not all of them.

References

Copyright © 2026 Standard Performance Evaluation Corporation (SPEC®)