803.sph_exa_s (SPH-EXA mini-app)
Astrophysics - Smoothed particle hydrodynamics
Authors listed in alphabetic order:
The SPH-EXA mini-app implements the smoothed particle hydrodynamics (SPH) technique, a meshless Lagrangian method commonly used for performing hydrodynamical and computational fluid dynamics simulations.
The SPH technique discretizes a fluid into a set of interpolation points (SPH particles) whose distribution follows the mass density of the fluid; their evolution relies on a weighted interpolation over nearby neighboring particles. SPH simulations with detailed physics calculations are computationally demanding. The SPH-EXA mini-app is derived from three parent SPH codes used in astrophysics (SPHYNX and ChaNGa) and computational fluid dynamics (SPH-flow).
A number of basic steps of any SPH calculation are included in the mini-app. First, a tree is built from the particles' positions and masses and walked to identify the neighbors that will be used for the remainder of the global time-step (or iteration). The remaining steps include the evaluation of the particles' density, acceleration, rate of change of internal energy, and all physical modules relevant to the studied scenario. Next, a new physically relevant and numerically stable time-step is found, and the properties of the particles are updated accordingly.
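The density evaluation step can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the benchmark source: the `Particle` struct and both function names are hypothetical, the kernel is the textbook 3D cubic spline (the mini-app's actual kernel is not stated here), and the neighbor search is a brute-force loop over all pairs rather than a tree walk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle record: position, mass and smoothing length.
struct Particle {
    double x, y, z;
    double m;
    double h;
};

// Textbook 3D cubic spline kernel with support radius 2h.
// Assumption: used for illustration only; not necessarily the mini-app's kernel.
double cubicSplineKernel(double r, double h) {
    assert(h > 0.0);
    const double pi = 3.14159265358979323846;
    const double q = r / h;
    const double norm = 1.0 / (pi * h * h * h);
    if (q < 1.0) {
        return norm * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    }
    if (q < 2.0) {
        const double t = 2.0 - q;
        return norm * 0.25 * t * t * t;
    }
    return 0.0;  // outside the kernel support
}

// Density by direct summation: rho_i = sum_j m_j * W(|r_i - r_j|, h_i).
// Brute-force O(N^2) pair loop standing in for the tree-based neighbor search.
std::vector<double> computeDensity(const std::vector<Particle>& particles) {
    std::vector<double> rho(particles.size(), 0.0);
    for (std::size_t i = 0; i < particles.size(); ++i) {
        const Particle& a = particles[i];
        for (std::size_t j = 0; j < particles.size(); ++j) {
            const Particle& b = particles[j];
            const double dx = a.x - b.x;
            const double dy = a.y - b.y;
            const double dz = a.z - b.z;
            const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (r < 2.0 * a.h) {  // b is a neighbor of a (includes a itself)
                rho[i] += b.m * cubicSplineKernel(r, a.h);
            }
        }
    }
    return rho;
}
```

A tree replaces the inner loop with a search that visits only nearby particles, which is what makes large particle counts tractable.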
The SPH-EXA mini-app is a modern C++ header-only code (except for main.cpp) with no external software dependencies. Parallelism is currently expressed via OpenMP.
This mini-app can simulate a three-dimensional rotating square patch, a demanding scenario for SPH simulations due to the presence of negative pressures, which stimulate the emergence of unphysical tensile instabilities that destroy the particle system, unless corrective repulsive forces are included.
-n <num> - number of particles along each edge of the cube (-n 100 means that the application will run with 100 * 100 * 100 particles)
-s <num> - number of time-steps (iterations)
-w <num> - how often the output file is written (-w 50 means that an output file is dumped every 50 iterations)
For testing, automatic generation of the input conditions for all particles has been added to the source code.
In the present setup the code simulates the evolution of a three-dimensional rotating square patch of fluid. To do this, additional information is set in the initialization phase (SqPatch::init() and constants in SqPatch):
The number of particles in the cube and the number of time-steps vary by workload:
| Workload | Particles | Time-steps | State dump interval |
|---|---|---|---|
| test | 1,000,000 (-n 100) | 0 | 1 |
| train | 125,000 (-n 50) | 7 | 5 |
| refspeed | 9,261,000 (-n 210) | 24 | 7 |
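The particle counts in the table follow from the -n value, since the cube has n particles along each edge. A minimal sketch of that arithmetic, where the function name `totalParticles` is hypothetical and not taken from the benchmark source:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: total particle count for a cube with n particles per edge.
std::uint64_t totalParticles(std::uint64_t n) {
    assert(n > 0);     // a cube needs at least one particle per edge
    return n * n * n;  // n particles along each of the three axes
}
```

For example, `totalParticles(210)` gives the 9,261,000 particles of the refspeed workload.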
The code performs the simulation and, at the end, saves the total energy of the system. It is sufficient to check that this value does not change across simulations (for a fixed number of time-steps) to make sure that the code has executed correctly, but for added assurance the intermediate states of the calculation are also checked every n time-steps, where n varies by workload as shown in the table above.
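The dump file names in the tolerance table (for example dump0.csv, dump5.csv and dump7.csv for train) suggest that state is written at iteration 0, at every multiple of the dump interval, and once more after the final time-step. The sketch below encodes that reading; it is an inference from the file names, not a description of the benchmark's code, and the function name `dumpIterations` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: iterations at which a state dump would be written,
// assuming dumps at 0, at every multiple of `interval`, and at the last step.
std::vector<int> dumpIterations(int steps, int interval) {
    assert(steps >= 0 && interval > 0);
    std::vector<int> out;
    for (int it = 0; it <= steps; it += interval) {
        out.push_back(it);  // iteration 0 and every multiple of the interval
    }
    if (out.back() != steps) {
        out.push_back(steps);  // final state, if not already included
    }
    return out;
}
```

With the refspeed settings (24 time-steps, interval 7) this yields 0, 7, 14, 21 and 24, matching the refspeed dump files listed in the tolerance table.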
In addition to the intermediate state of the calculation, the state of every 128th particle in the cube is also dumped and checked. This is written as a separate CSV output file per dumped time-step, containing positions (x, y, z), velocities (vx, vy, vz), smoothing length (h), density (ro), internal energy (u), pressure (p), speed of sound (c) and pressure gradient (gradPx, gradPy, gradPz). Only every 128th particle is printed to keep output files to a reasonable size, and because a large variation in a single point would not have a large effect on the overall calculation.
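Sampling every 128th particle into a CSV file can be sketched as follows. The `ParticleState` struct and the `dumpSampled` function are hypothetical, and the column order, the absence of a header row, and the choice to start sampling at index 0 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical per-particle record holding the fields named in the text.
struct ParticleState {
    double x, y, z;
    double vx, vy, vz;
    double h, ro, u, p, c;
    double gradPx, gradPy, gradPz;
};

// Writes one CSV row for every `stride`-th particle and returns the row count.
// Assumption: sampling starts at index 0 and no header row is written.
std::size_t dumpSampled(const std::vector<ParticleState>& particles,
                        std::ostream& os, std::size_t stride = 128) {
    assert(stride > 0);
    std::size_t rows = 0;
    for (std::size_t i = 0; i < particles.size(); i += stride) {
        const ParticleState& s = particles[i];
        os << s.x << ',' << s.y << ',' << s.z << ','
           << s.vx << ',' << s.vy << ',' << s.vz << ','
           << s.h << ',' << s.ro << ',' << s.u << ','
           << s.p << ',' << s.c << ','
           << s.gradPx << ',' << s.gradPy << ',' << s.gradPz << '\n';
        ++rows;
    }
    return rows;
}
```

Under these assumptions, a dump of N particles contains roughly N / 128 rows, which keeps each file small even for the largest workload.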
Each output file has a mix of very large and very small values, and all of them are checked. The tolerances used vary by file. Absolute tolerances are set to cover acceptable deltas in the very small values, while relative tolerances are used to cover acceptable deltas in the large values.
The tolerances used by the runcpu program are set in the Spec/object.pm file and are shown here:
| Workload | Output File | Relative tolerance | Absolute tolerance |
|---|---|---|---|
| test | constants.csv | 0.506% | None |
| test | sph_exa.out | 0.506% | None |
| test | dump0.csv | 0.223% | 0.0002 |
| train | constants.csv | 1.032% | None |
| train | sph_exa.out | 1.032% | None |
| train | dump0.csv | 0.00241% | 0.00007 |
| train | dump5.csv | 0.175% | 0.0003 |
| train | dump7.csv | 0.176% | 0.0003 |
| refspeed | constants.csv | 4.3% | None |
| refspeed | sph_exa.out | 4.3% | None |
| refspeed | dump0.csv | None | 0.0004 |
| refspeed | dump7.csv | 0.7% | 0.04 |
| refspeed | dump14.csv | 0.2% | 0.04 |
| refspeed | dump21.csv | 0.6% | 0.05 |
| refspeed | dump24.csv | 0.3% | 0.05 |
Values were selected based on observed differences on various platforms relative to a reference run done on an x86 Linux system with GCC 14.2.0 and no optimization (-O0).
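The interplay of the two tolerances can be sketched as a comparison that accepts a value when either tolerance is satisfied: the absolute tolerance catches differences in very small values, and the relative tolerance catches differences in large ones. This is an illustrative assumption about how the two are combined, not a description of the runcpu validation code; the function name `withinTolerance` is hypothetical, and the sketch needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical check: accept `value` if it is within the absolute tolerance
// OR within the relative tolerance of `reference`.
// A tolerance listed as "None" in the table is passed as std::nullopt.
bool withinTolerance(double reference, double value,
                     std::optional<double> relTol,
                     std::optional<double> absTol) {
    assert(!relTol || *relTol >= 0.0);
    assert(!absTol || *absTol >= 0.0);
    const double delta = std::abs(value - reference);
    if (absTol && delta <= *absTol) {
        return true;  // small values: covered by the absolute tolerance
    }
    if (relTol && delta <= *relTol * std::abs(reference)) {
        return true;  // large values: covered by the relative tolerance
    }
    return delta == 0.0;  // otherwise only an exact match passes
}
```

Using the train dump5.csv row (relative 0.175%, absolute 0.0003), `withinTolerance(1000.0, 1001.0, 0.00175, 0.0003)` passes on the relative tolerance, while `withinTolerance(1e-5, 2e-4, 0.00175, 0.0003)` passes on the absolute one.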
The source code that implements the algorithm is largely unchanged. The CPU 2026 version removes support for OpenACC and OpenMP target offload.
The workloads are most similar to 532.sph_exa_t. The test workload is exactly the same, while the train cube size and iteration count are larger. The refspeed workload (corresponding to "ref" in 532.sph_exa_t) has the same cube size but runs for fewer iterations.
The primary difference is in output validation; the SPEChpc 2021 versions validate only the total energy output after the simulation completes. 803.sph_exa_s validates much more of the intermediate state as explained above.
C++
-DSPEC_OPENMP
None
The benchmark is licensed under the MIT license. The sources SPEC started with are from around commit 7604c824 in the SPH-EXA GitHub repo, but have been modified. The benchmark contains some changes from upstream after this point, but not all of them.
Copyright © 2026 Standard Performance Evaluation Corporation (SPEC®)