706.stockfish_r
SPEC CPU®2026 Benchmark Description

Benchmark Name

706.stockfish_r

Benchmark Program General Category

Artificial Intelligence (alpha-beta tree search with statistical heuristics, pattern recognition and low-precision neural network inference)

Benchmark Authors

Tord Romstad, Marco Costalba, Joona Kiiski, Gary Linscott and others - see the file 706.stockfish_r/Docs/AUTHORS

706.stockfish_r was submitted to the SPEC CPU v8 Benchmark Search Program by Gian-Carlo Pascutto <gcp [at] sjeng [dot] org> who packaged and modified the program for use by SPEC.

Benchmark Description

706.stockfish_r is based on the Stockfish chess program. Stockfish itself is based on the original Glaurung chess program by Tord Romstad, enhanced with thousands of contributions from computer chess enthusiasts.

The program attempts to find the best move via a combination of alpha-beta tree searching, advanced move ordering, positional evaluation, heuristic forward pruning, and tree extensions and reductions. In practice, it explores the tree of variations resulting from a given position to a given base depth, extending interesting variations while reducing or entirely discarding doubtful or irrelevant ones. From this tree the optimal line of play for both players (the "principal variation") is determined, as well as a score reflecting the balance of power between the two sides.

For evaluating the positions, it makes a quick estimate of the imbalance in the position. If one side appears to be very strongly winning, it will use a classical hand-crafted evaluation ("HCE") with pattern matching. If the position is close, it will use a neural network based evaluation. The neural network architecture used ("NNUE") is unusual in that it follows a specific construction allowing a huge upper layer to be efficiently updated incrementally, leaving only some smaller lower layers that must be fully recomputed. The majority of the computation is done using low-precision fixed-point arithmetic, providing further speedups. Because the neural network can be evaluated so quickly, the search engine can still maintain high search rates of several million nodes per second, which provide high tactical strength. The benchmark offers both NNUE and Classical evaluation workloads, running on a variety of initial game positions.

The SPECrate (706.stockfish_r) and SPECspeed (806.stockfish_s) versions of the benchmark differ in their search depth and in the amount of memory that they are allowed to use, as described below. Having more memory for the SPECspeed version allows information to be stored about more visited positions, such as the positional evaluation, the best move, and any prior search bounds, which speeds up the program when it revisits a position. Because having more information about a position might cause the search to be extended, a larger table will not necessarily reduce the time needed to reach a given depth, but the resulting program will, on average, be stronger.

Input Description

The control file

The benchmark is started with a series of arguments that are found in the control file. For example, here are the arguments for the two (short, not timed) test workloads:

  bench 16 1 15 spec_test.fen depth nnue
  bench 16 1 15 spec_test.fen depth classical

The arguments mean:

  1. bench causes the program to run a series of commands from a list.
  2. Target memory size for the Hashtable, in MB:
       16 MB for the test workload shown above.
     1600 MB for the train workload.
     1600 MB for the refrate workload (706.stockfish_r)
    48000 MB for the refspeed workload (extra input, not part of benchmark)
  3. Number of threads: Always 1 for the SPEC CPU workloads as discussed below.
  4. Limit: in this example, 15.
  5. Name of a text file containing the list of chess positions to consider, given in standard Forsyth-Edwards Notation (FEN).
  6. Identifier for what type of Limit: For SPEC CPU, the type is always "depth"; for other possibilities, see the source module benchmark.cpp.
  7. Evaluation algorithm: nnue to allow application of the neural network code, or classical to invoke the classical evaluation code.

Comparing Evaluation Methods

Some testers may wish to compare the performance of the classical evaluation method vs. the neural network method. If that is of interest, examine the structure of the control file carefully. For example, data/refrate/input/control has:

bench 1600 1 26 spec_ref_pos_1to6.fen  depth classical
bench 1600 1 26 spec_ref_pos_1to6.fen  depth nnue
bench 1600 1 26 spec_ref_pos_7to11.fen depth nnue

which means that the SPECrate version (706.stockfish_r) will be invoked three times:

  1. Evaluate positions 1 through 6 with the classical method.
  2. Evaluate positions 1 through 6 with the neural network method.
  3. Evaluate positions 7 through 11 with the neural network method.

If you search your result log file for the phrase "Workload elapsed time", the times for the above are reported individually; or you can cd to the run directory and use the printkids.pl utility.

The benchmark performs more work using the neural network method than the classical method because, at the time SPEC developed the benchmark, the nnue method was faster, and it was therefore considered desirable to give it additional problems to solve.

Positions

The workloads use text files with lists of chess positions to be evaluated. For the refrate benchmark workloads, the positions are taken from famous games recognizable to chess aficionados. The refspeed (alternate) workloads use these same games, but with greater depth and memory usage. The games are listed below:

2rq1rk1/4bppp/pn1pb3/Np2p1Pn/4P3/2N1BP2/PPPQ3P/1K1R1B1R w - - 0 15

A sharp and topical position in the Najdorf Sicilian that was contested (multiple times) in the 2021 Sinquefield Cup, among them a game between Fabiano Caruana and Maxime Vachier-Lagrave. Fabiano played the lesser-known a3.

rnb1k2r/1p1n1pp1/p3p2p/2bq4/2BNN3/2P1Q1B1/6PP/3RK2R b Kkq - 0 18

A new idea in the Poisoned Pawn Najdorf Sicilian that was uncorked by Fabiano Caruana in order to beat Maxime Vachier-Lagrave in the Candidates Tournament earlier in 2021. Precise play is required by black now after Bc4. Maxime didn't find (all of) it.

rn1q1rk1/pp3ppp/2p2n2/4pb2/PbNP4/2N2P2/1P2PKPP/R1BQ1B1R w - - 0 10

Ding Liren vs Fabiano Caruana in the 2020 Candidates Tournament (that was interrupted and resumed later). Fabiano played a novelty with e5 but Ding successfully defended and won.

r2qr1k1/pppn1pbp/2np2p1/8/2P1P3/2N1BB2/PP1Q1PPP/3R1RK1 b - - 0 13

Radoslaw Wojtaszek against Fabiano Caruana. Caruana played yet another astounding novelty with Bxc3 which surprised many commentators and fans, but went on to demonstrate the soundness of the idea by winning.

r5k1/p5b1/1p2R2p/6p1/4p3/P1P1P1P1/P5P1/3B1K2 b - - 0 29

An opposite bishops endgame that one would think black should definitely draw, but Magnus Carlsen went on to outplay Peter Svidler in typical style here.

5k2/8/5pK1/3B1P1P/3n4/8/3b4/8 b - - 0 67

Magnus Carlsen doesn't believe in fortresses. In this game from the World Championship in London, 2018, he blundered with Kg6, after which Fabiano Caruana had a win. (He didn't find it.)

4r1k1/5p1p/p1pb1nq1/1P4p1/3P4/1BPb1PP1/1P1N1Q1P/R1B3K1 b - - 0 23

We're back in 2004 now, with Vladimir Kramnik playing some home preparation he analyzed with 2004 era software on 2004 era hardware (and presumably not too much patience). 2004 World Championship challenger Peter Leko finds the refutation over the board.

8/8/p2k1p2/1p1p3p/1P1P3p/P3NPP1/5K2/1b6 w - - 0 47

Deep Blue is beating Garry Kasparov and plays axb5 here. Kasparov cannot believe the computer doesn't "fall" for the materialistic Qb6 giving him counter-play and suspects foul play by the Deep Blue team.

r1r1q1k1/6p1/p2b1p1p/1p1PpP2/PPp5/2P4P/R1B2QP1/R5K1 w - - 0 36

Karpov versus Kasparov, 1984 World Championship. Kasparov blunders and overlooks a winning continuation for white deep in the endgame. Karpov found it and sent Kasparov to the brink of defeat (but never over it).

q3nrk1/4bppp/3p4/r3nPP1/4P2P/NpQ1B3/1P4B1/1K1R3R b - - 0 24

Sergey Karjakin versus Viswanathan Anand, in the Wijk aan Zee tournament in The Netherlands, 2006. Again a sharp Najdorf. Anand comes up with the astounding Nc7.

1k1r1br1/1bq2p1p/p1n1p3/5pB1/PppPP3/5N1P/1PB2PP1/2RQR1K1 b - - 3 20

After outplaying opponent Ding Liren in the second game of the 2023 World Chess Championship match, Ian Nepomniachtchi sacrifices the exchange to get a dominating position and the match lead.

Output Description

The output consists, per position, of the output from the tree searching module as it progresses and investigates the search tree deeper and deeper. The format is defined by the Universal Chess Interface specification.

Typically, it will list the current nominal search depth, the maximum depth reached by the search, the current estimate of the worth of the position in centipawns, the number of positions searched, and how full the hash table used to remember positions has become.

The output is validated against a SPEC-supplied set of expected outputs. Note that the expected output differs between the rate and speed versions.

Programming Language

C++

Threading Model

Although C++ threads (std::thread) are used, only one thread is active at any given time. The parallelization algorithms used by Stockfish (and Deep Sjeng, and Leela Zero, and every top-of-the-line game tree searching program) are not deterministic and cannot reasonably be made deterministic. The order of operations in multi-threaded runs varies, which yields a different internal state, and the program then takes a different search path. This effect sometimes surprises end users of parallel versions, who get a different "best" move from the program from run to run. Unfortunately, the divergence isn't bounded. The only option for a deterministic benchmark is to run the program single-threaded.

Known Portability Issues

GNU/Linux systems implement C++ std::thread using POSIX Threads. Although some systems automatically include the needed support, this is not universal. Surprises have been seen when changing OS versions, or libraries, or compilers; or when FDO is added; or when combining C and C++ modules. Typically, it is safest to add -pthread to all compile and link lines for all SPEC CPU benchmarks that use std::thread. Please see the $SPEC/config directory for Example config files that demonstrate how to conveniently do so.

706.stockfish_r does not directly call POSIX threads, with one exception: on some platforms, threads are created with a stack size of 512 KB, which is too low and may lead to crashes. If the flag SPEC_ADJUST_STACK is defined, thread creation is done using lower-level pthreads calls so that the stack can be increased. See github.com/official-stockfish/Stockfish/issues/2027, 2033, and 2035, and the comments in src/thread_posix.h.

Sources and Licensing

The SPEC benchmark was branched from the Stockfish 15 release. Support for endgame databases, embedding the neural network, and tuning the evaluation was removed; all time-dependent output was removed; and platform-, compiler-, or architecture-specific micro-optimizations were disabled, removed, or replaced by portable C++ code.

Stockfish is licensed under the GNU GPL version 3.

The neural network file was taken from the Stockfish Neural Network Repository at tests.stockfishchess.org/nns, which distributes and shares the files through a CC0 public domain dedication (local copy).

Copyright © 2026 Standard Performance Evaluation Corporation (SPEC®)