801.xz_s
Data compression
Lasse Collin <lasse.collin [at] tukaani [dot] org> is the author of XZ Utils.
Igor Pavlov wrote key portions of the compression algorithm, according to the references.
801.xz_s is based directly on Lasse Collin's XZ Utils 5.8.1, with these differences: it performs no file I/O other than reading the input; it does all compression and decompression entirely in memory; and it prefers generic portable routines over platform-specific ones. As usual for SPEC CPU®, the intent is to measure the compute-intensive portion of a real application while minimizing I/O, thereby focusing on the performance of the CPU, memory, and compiler.
Inputs for 801.xz_s are XZ-compressed files containing the data that will be compressed during the test. The reference (timed) workloads use three components: a tar archive of HTML documentation and some supporting images; a database of ClamAV malware signatures; and an input file with combined text and image data. All three have highly compressible sections and incompressible sections.
Parameters for each test are taken from the command line. In order, they are:
Command lines are constructed by the runcpu program from the contents
of the control file. Adding a new workload is simple:
it requires only a file of data to be compressed and an entry for that file in
the control file.
Each input set is first decompressed, and the SHA-512 sum of the decompressed data is verified against the one specified on the command line. The input is then duplicated (or truncated) until its size matches what was requested on the command line, and it is compressed using the XZ preset ("compression level") requested on the command line. If compressed size checking is enabled, the result of the size check is written to the output. (Compressed data size may vary slightly depending on the number of threads used for compression.) The compressed data is then decompressed, and its SHA-512 sum is calculated and compared to the one generated during the initial load. Doing the comparison this way reduces the verification-related memory access for the benchmark, as well as its memory footprint.
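The sequence above can be sketched in a few lines of Python using the standard `lzma` and `hashlib` modules. This is only an illustration of the flow, not the benchmark's code: the function names `fit_to_size` and `round_trip` are hypothetical, the sketch is single-threaded, and it assumes that the digest used for the final comparison is taken over the resized buffer.

```python
import hashlib
import lzma


def fit_to_size(data: bytes, target_size: int) -> bytes:
    """Repeat or truncate data so the result is exactly target_size bytes."""
    if not data:
        raise ValueError("input must not be empty")
    repeats = -(-target_size // len(data))  # ceiling division
    return (data * repeats)[:target_size]


def round_trip(xz_input: bytes, expected_sha512: str,
               target_size: int, preset: int) -> int:
    """Load, resize, compress, decompress, and verify; return compressed size."""
    # Initial load: decompress and check against the digest supplied by the caller.
    original = lzma.decompress(xz_input)
    if hashlib.sha512(original).hexdigest() != expected_sha512:
        raise ValueError("input SHA-512 mismatch")

    # Duplicate or truncate to the requested size, then remember its digest
    # (assumption: this is the digest the final check compares against).
    buffer = fit_to_size(original, target_size)
    buffer_digest = hashlib.sha512(buffer).hexdigest()

    # Compress at the requested XZ preset.
    compressed = lzma.compress(buffer, format=lzma.FORMAT_XZ, preset=preset)

    # Decompress again and compare digests instead of comparing buffers
    # byte for byte.
    restored_digest = hashlib.sha512(lzma.decompress(compressed)).hexdigest()
    if restored_digest != buffer_digest:
        raise ValueError("round-trip SHA-512 mismatch")
    return len(compressed)
```

Comparing two 64-byte digests means the verification step does not need to walk the original buffer and the decompressed buffer side by side, which is in the spirit of the reduced memory access described above.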
About memory usage: The second parameter selects the size of a buffer that will be the input to the compression phase. The total virtual memory used by the benchmark will be larger. It may vary depending on compiler options, operating system, and possibly other factors.
The refspeed workload is invoked with these
parameters:
| Input file | Buffer (MiB) | Minimum size (B) | Maximum size (B) | Compression level |
|---|---|---|---|---|
| cpu2006docs.tar.xz | 15000 | 2,321,296,421 | 2,523,148,284 | 4 |
| cld.tar.xz | 2700 | 959,312,906 | 1,042,731,420 | 8 |
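
To make the units in the table concrete, the following Python sketch records the two refspeed rows as data, converts the buffer size from MiB to bytes, and checks a compressed size against the minimum and maximum. The `Workload` type and the helper names are hypothetical, and treating the bounds as inclusive is an assumption rather than something the benchmark documentation states.

```python
from typing import NamedTuple


class Workload(NamedTuple):
    input_file: str
    buffer_mib: int   # size of the buffer to be compressed, in MiB
    min_size: int     # minimum accepted compressed size, in bytes
    max_size: int     # maximum accepted compressed size, in bytes
    preset: int       # XZ preset ("compression level")


# Values copied from the refspeed table above.
REFSPEED = [
    Workload("cpu2006docs.tar.xz", 15000, 2_321_296_421, 2_523_148_284, 4),
    Workload("cld.tar.xz", 2700, 959_312_906, 1_042_731_420, 8),
]


def buffer_bytes(w: Workload) -> int:
    """Convert the buffer size from MiB to bytes."""
    return w.buffer_mib * 1024 * 1024


def size_in_bounds(w: Workload, compressed_size: int) -> bool:
    """Check a compressed size against the bounds (assumed inclusive)."""
    return w.min_size <= compressed_size <= w.max_size


for w in REFSPEED:
    print(w.input_file, buffer_bytes(w), "bytes to compress at preset", w.preset)
```

A size window rather than a single expected value fits the earlier note that compressed size may vary slightly with the number of threads.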
The output files provide a brief outline of what the benchmark is doing as it runs. Output sizes for each compression and decompression are printed to facilitate validation, and the results of decompression are compared with the input data to ensure that they match.
There is no longer any non-threaded workload.
557.xz_r was based on XZ 5.0.5, and 657.xz_s used the separate
pxz driver for multithreaded execution. 801.xz_s is based
on XZ 5.8.1, which has built-in multithreading support. SPEC added a shim
to use C++ std::thread instead of native pthreads or Windows
threads.
Memory consumption of the refspeed workload has increased to around 32 GiB. The size of the largest amount of data being compressed has increased from 6.5 GiB to 14.5 GiB. The compression levels remained the same, as did the input data.
The benchmark uses C++ std::thread.
GNU/Linux systems implement C++ std::thread using POSIX
Threads. Although some systems automatically include the needed support,
this is not universal. Surprises have been seen when changing OS
versions, or libraries, or compilers; or when FDO is
added; or when combining C and C++ modules. Typically, it is safest to
add -pthread to all compile and link lines for all SPEC CPU
benchmarks that use std::thread. Please see the
$SPEC/config directory for Example config files that
demonstrate how to conveniently do so.
The benchmark is based on XZ Utils 5.8.1, which is licensed under the BSD Zero Clause License.
The inttypes.h header file by Alexander Chemeris is used on Windows builds under the BSD license.
The SHA-512 routines used are in the public domain.
The input.combined.xz input file used for some workloads
contains documentation from SPEC CPU2006 and an image file which are
both covered by the SPEC CPU license, and Perl source code licensed
under the Artistic License. The
cld.tar.xz input file contains ClamAV virus database signatures which are
licensed under the GNU General Public License
version 2.
Copyright © 2026 Standard Performance Evaluation Corporation (SPEC®)