838.diamond_s
SPEC CPU®2026 Benchmark Description

Benchmark Name

838.diamond_s

Benchmark Program General Category

Metagenomics and protein sequencing

Benchmark Authors

Benjamin Buchfink <buchfink[at]gmail [dot] com>, github.com/bbuchfink

838.diamond_s was submitted to the SPEC CPU v8 Benchmark Search Program by Benjamin Buchfink.

Benchmark Description

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data. DIAMOND is considered a high-performance replacement for BLAST, the Basic Local Alignment Search Tool from the National Institutes of Health.

The key features are:

Input Description

DIAMOND's primary file inputs are FASTA files. FASTA is a text-based scientific data format used to store nucleic acid sequences (such as DNA sequences) or protein sequences. A single file may contain multiple sequences, so the format is sometimes referred to as the FASTA database format. FASTA files can be viewed and analyzed using any DNA analysis software. These files have a suffix of .fa or .fasta, and may be gzipped, since DIAMOND can read gzipped files on the fly using the zlib library. FASTA files can also be compressed into .dmnd files, which are binary archives specific to the DIAMOND program.
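As an illustration of the FASTA layout described above, the following minimal sketch reads records from a plain or gzipped file. It is not DIAMOND's own reader. It assumes the common convention that each record starts with a header line beginning with ">" and that the sequence may be wrapped across several lines. The function names and the sample records are hypothetical.

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a text-mode file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path):
    """Open a FASTA file in text mode; a .gz suffix selects gzip (assumption)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Hypothetical two-record protein example, held in memory for illustration.
example = io.StringIO(">seq1 first protein\nMKT\nAYI\n>seq2\nGGS\n")
records = list(read_fasta(example))
```

For a file on disk, `read_fasta(open_fasta(path))` would iterate over the records in the same way.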

The command lines specify a database (-d) and a query (-q). At a high level, the program searches the database for the query sequences. Switches control the sensitivity of the search (--ultra-sensitive, --sensitive, --mid-sensitive, --fast, etc.). These switches can be listed by running "./diamond help".
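The sketch below shows how such a command line could be assembled from the switches named in this description. It only builds an argument list and does not run the program. The `blastp` workflow name and the file names are assumptions made for illustration; the benchmark's actual command lines are not reproduced here. The -p option is the thread-count option described under Threading Model.

```python
# Sensitivity switches taken from the description above.
SENSITIVITY_SWITCHES = {
    "--fast",
    "--mid-sensitive",
    "--sensitive",
    "--ultra-sensitive",
}


def build_diamond_args(database, query, sensitivity="--sensitive", threads=1):
    """Return an argument list for a hypothetical DIAMOND protein search."""
    if sensitivity not in SENSITIVITY_SWITCHES:
        raise ValueError("unknown sensitivity switch: " + sensitivity)
    return [
        "./diamond", "blastp",   # workflow name is an assumption
        "-d", database,          # database to search
        "-q", query,             # query sequences
        sensitivity,             # search sensitivity
        "-p", str(threads),      # number of threads
    ]


# Hypothetical file names, for illustration only.
args = build_diamond_args("reference.dmnd", "queries.fa", "--fast", threads=4)
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems if the list is later passed to a process launcher.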

More documentation and tutorials are available at github.com/bbuchfink/diamond/wiki.

The FASTA files used to construct the SPEC CPU benchmarks were downloaded from the sources below. A savvy user can download alternative protein sequences and databases to craft their own command lines.

Output Description

The input parameters also describe the output format. In SPEC CPU, we use the following string, which is decoded below.

--outfmt 6 qseqid sseqid slen mismatch gapopen qstart qend sstart send
      
token      Description
qseqid     Query Sequence - id
sseqid     Subject Sequence - id
slen       Subject Sequence length
mismatch   Number of mismatches
gapopen    Number of gap openings
qstart     Start of alignment in query
qend       End of alignment in query
sstart     Start of alignment in subject
send       End of alignment in subject
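
To make the column list above concrete, here is a minimal sketch that splits one output line into named fields. It assumes the fields are tab-separated, as in BLAST-style tabular output, and that every field other than the two identifiers is an integer. The sample line and the names `Hit` and `parse_hit` are hypothetical.

```python
from collections import namedtuple

# Column order taken from the --outfmt string above.
FIELDS = ("qseqid", "sseqid", "slen", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send")
# Every column except the two identifiers is treated as an integer (assumption).
INT_FIELDS = set(FIELDS) - {"qseqid", "sseqid"}

Hit = namedtuple("Hit", FIELDS)


def parse_hit(line):
    """Parse one tab-separated output line into a Hit record."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    values = [int(value) if name in INT_FIELDS else value
              for name, value in zip(FIELDS, parts)]
    return Hit(*values)


# Hypothetical output line, for illustration only.
hit = parse_hit("query_1\tsubject_9\t350\t12\t1\t5\t120\t30\t146\n")
```

With this record, `hit.qend - hit.qstart + 1` gives the span of the alignment on the query.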

The benchmark output is a list of pairwise alignments in the columnar format described above, one alignment per line. We verify the run by comparing this output against the expected output. Floating-point sensitivity has occasionally changed the order of these alignments, causing the lines to be printed in a different order than expected. The runcpu program mitigates this issue before the verification step by sorting the output alphabetically at the end of the run, using the sort.pl script found in 838.diamond_s/data/all/input.
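
The idea behind sorting before verification can be sketched as follows. This is not a translation of sort.pl, whose exact behavior is not described here. It is a minimal illustration, assuming that two outputs should be treated as equal when they contain the same lines in any order. The function names and sample lines are hypothetical.

```python
def normalize(text):
    """Return the non-empty lines of text in sorted order."""
    return sorted(line for line in text.splitlines() if line)


def same_alignments(actual, expected):
    """Compare two outputs while ignoring the order of their lines."""
    return normalize(actual) == normalize(expected)


# Hypothetical outputs: the same two alignments, printed in different orders.
expected = "q1\ts1\t350\t12\t1\t5\t120\t30\t146\nq2\ts7\t90\t0\t0\t1\t90\t1\t90\n"
actual = "q2\ts7\t90\t0\t0\t1\t90\t1\t90\nq1\ts1\t350\t12\t1\t5\t120\t30\t146\n"
match = same_alignments(actual, expected)
```

Sorting both sides makes the comparison independent of line order while still detecting any missing, extra, or altered alignment.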

Programming Language

C++, C

Threading Model

The benchmark uses std::thread with a thread pool. The number of threads can be set by either the SPEC CPU 2026 config file or the runcpu command; the requested number is passed to the program using the -p option on its command line.

Known Portability Issues

GNU/Linux systems implement C++ std::thread using POSIX Threads. Although some systems automatically include the needed support, this is not universal. Surprises have been seen when changing OS versions, or libraries, or compilers; or when FDO is added; or when combining C and C++ modules. Typically, it is safest to add -pthread to all compile and link lines for all SPEC CPU benchmarks that use std::thread. Please see the $SPEC/config directory for Example config files that demonstrate how to conveniently do so.

Sources and Licensing

DIAMOND is available at github.com/bbuchfink/diamond. The version used in the SPEC CPU benchmark began with commit hash 21a32fc on March 11, 2023.

DIAMOND is distributed under the GPL-3, as seen in Diamond.license.txt. The /lib/blast and /lib/alp directories were dedicated to the public domain by the National Center for Biotechnology Information (NCBI). Other included library sources are licensed under compatible terms:

SPEC added a version of the Mersenne-Twister PRNG that is licensed by its authors (Makoto Matsumoto, Takuji Nishimura, and Mutsuo Saito) under a BSD license.

spec_random_distributions.h is sampled from the LLVM project, which is distributed under the Apache License v2.0 with LLVM Exceptions.

The genomic input databases are licensed freely and available for commercial use:

References

Copyright © 2026 Standard Performance Evaluation Corporation (SPEC®)