April 27, 2000
This document specifies how the benchmarks in the SPEC JVM Client98 suite are to be run for measuring and publicly reporting performance results. These rules are intended to ensure that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all publicly disclosed results must adhere to these Run and Reporting Rules.
SPEC intends that this benchmark suite be applicable to standalone Java client computers, either with disk (e.g., PC, workstation) or without disk (e.g., network computer), executing programs in an ordinary Java platform environment. This suite is intended to measure the performance of Java clients, not that of Java servers or embedded systems, even though the benchmarks will run in those environments. The benchmarks measure the speed of execution by the Java Virtual Machine of Java byte codes, which is fundamental to overall Java performance in many application environments.
In addition to basic byte code execution, this benchmark suite requires graphics, networking, and I/O, and these functions will influence benchmark performance, in some cases significantly so. However, SPEC intends that these functions will not ordinarily dominate benchmark performance, and therefore these benchmarks should not be taken as representative of application environments which are dominated by these functions.
These benchmarks require a version 1.1 Java Virtual Machine, or later. They are applicable both to systems using 64-bit intermediate floating point values, and to those using 80-bit values.
These benchmarks do not provide a comparison of Java performance to that of C or C++. SPEC recognizes that a number of environments have been created with the intent to provide improved performance by performing some work statically prior to program execution instead of dynamically during program execution. These benchmarks are not intended to compare the dynamic Java platform against such statically compiled programs.
The general philosophy behind the rules for running the SPEC JVM Client98 benchmark is to ensure that an independent party can reproduce the reported results. The SPEC benchmark tools as provided must be used to run the benchmarks.
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the list below, SPEC wants to increase the awareness of implementors and end users of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking. To ensure that results are relevant to end users, SPEC expects that the hardware and software implementations used for running the SPEC benchmarks adhere to the following conventions:
Hardware and software used to run the SPEC JVM Client98 benchmarks must provide a suitable environment for running typical Java programs.
Optimizations must execute correctly for a class of programs, where the class of programs must be larger than a single SPEC benchmark or SPEC benchmark suite. This also applies to "assertion flags" that may be used.
No runtime features which are normally enabled for execution of ordinary Java programs (e.g., array bounds checking, dynamic class loading) may be disabled for the purposes of the benchmark. A SecurityManager must be enabled which, at a minimum, performs byte code verification (e.g., is able to throw VerifyError) on all loaded class files (see the sketch following this list).
A tested system must be able to handle classes and assertions at run time that are outside the scope of what the optimizer saw at compile time.
Optimizations must improve performance for more programs than a single SPEC benchmark or SPEC benchmark suite.
The vendor encourages general use of the implementation.
The implementation is generally available, documented and supported by the providing vendor.
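As an informal illustration only (not part of the SPEC tools), the following minimal Java sketch shows one way a tester might confirm that a SecurityManager is installed, as required above. The class name is an assumption; whether byte code verification is actually active can only be demonstrated by the JVM rejecting an invalid class file with a VerifyError, which this sketch does not attempt.

    public class CheckSecurityManager {
        public static void main(String[] args) {
            // System.getSecurityManager() returns null if no SecurityManager
            // has been installed in this JVM.
            SecurityManager sm = System.getSecurityManager();
            if (sm == null) {
                System.err.println("No SecurityManager installed;"
                        + " this environment would not be reportable");
            } else {
                System.out.println("SecurityManager present: "
                        + sm.getClass().getName());
            }
        }
    }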
In the case where it appears that the above guidelines have not been followed, SPEC may investigate such a claim and request that the offending optimization (e.g. a SPEC-benchmark specific pattern matching) be backed off and the results resubmitted. Or, SPEC may request that the vendor correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation) before submitting results based on the optimization.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPEC JVM Client98 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the suite and will rename the metrics. In the event that a workload is removed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, a republication may necessitate retesting and may require support from the original test sponsor.
Relevant standards cited in these run rules are current as of the date of publication. Changes or updates to these referenced documents, or other reasons, may necessitate amendment of these run rules. The current run rules will be available at the SPEC web site at http://www.spec.org.
Tested systems must provide an environment suitable for running typical Java Version 1.1 programs and must be generally available for that purpose. Any tested system must include an implementation of the Java (tm) Virtual Machine as described by the following references, or as amended by SPEC for later Java versions:
Java Virtual Machine Specification (ISBN: 020163452X)
The system must also include an implementation of those packages and their classes that are referenced by this suite as described within the following references:
Java Class Libraries: Vol I Java.Io, Java.Lang, Java.Math, Java.Net, Java.Security, Java.Text, Java.Util (The Java Series , Vol 1) (ISBN: 0201310023) and
Java Class Libraries: Second Edition Vol II Java.Applet, Java.Awt, Java.Beans (ISBN: 0201310031)
The SPEC JVM Client98 benchmark suite is based on source code that conforms to the Java Language Specification:
Java Language Specification (ISBN: 0201634511) (Including Appendix D: Changes for Java 1.1)
The SPEC JVM Client98 benchmark suite runs as an applet on a client system (Network Computer, Personal Computer, or Workstation). The SPEC JVM Client98 software is installed onto the file system of a web server. The benchmark home page, index.html, is loaded as a URL by a web browser on the client, and provides instructions and documentation for running the benchmarks, as well as a link to the page containing the SPEC JVM Client98 benchmark applet.
Reportable results must be run with the SPEC tools as an applet from a web server, either running on the same machine as the benchmarks or running on another machine. Benchmark classes must be loaded from that web server; e.g., ensure that the CLASSPATH environment variable, or equivalent, is not set in such a way that benchmark classes would be loaded from a local filesystem.
SPEC requires the use of a single file system on a web server system to contain the directory tree for the suite being run. SPEC allows any type of file system (disk-based, memory-based, NFS, DFS, etc.) to be used. The type of file system must be disclosed in reported results.
The classes of the benchmark suite are provided as individual class files rather than being collected in a JAR archive. The benchmark must be run using these individual class files, not a JAR or other class file archive. SPEC recognizes that JAR archives are commonly used to speed execution particularly of Java applets loaded over the network. However, by not using JAR we ensure an apples-to-apples comparison without potential second order performance impacts from increased memory utilization.
There are a number of parameters that control the operation of the SPEC benchmark tools; they may be set by property files, command line arguments, HTML applet parameters, and GUI controls. The use of all these controls is explained in the benchmark documentation. The properties in the file "props/spec" may not be changed from the values as provided by SPEC. The properties in the file "props/user" may be set to any desired value (a sketch of how the constrained settings might be checked follows the list below). In particular a reportable result must:
Select benchmark size 100
Select benchmark group "All"
Run benchmarks in an "autorun" sequence
Leave benchmark harness cache turned off. E.g., spec.initial.cache=false
Set spec.initial.autodelay to no more than 1000 ms. (This limitation is enforced by the benchmark tools.)
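As an illustration only, the following minimal Java sketch (not part of the SPEC tools) shows one way the two constrained user properties named above might be spot-checked before starting a run. The class name, the default values, and the use of java.util.Properties to read the file are assumptions; the SPEC harness itself performs the authoritative checks and reporting.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class CheckUserProps {
        public static void main(String[] args) throws IOException {
            Properties user = new Properties();
            FileInputStream in = new FileInputStream("props/user");
            try {
                user.load(in);
            } finally {
                in.close();
            }

            // The harness cache must remain off for a reportable result.
            String cache = user.getProperty("spec.initial.cache", "false");

            // spec.initial.autodelay may be set, but to no more than 1000 ms.
            int autodelay = Integer.parseInt(
                    user.getProperty("spec.initial.autodelay", "0"));

            if (!"false".equals(cache) || autodelay > 1000) {
                System.err.println("These settings would not produce a reportable result.");
            } else {
                System.out.println("props/user settings look reportable.");
            }
        }
    }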
All benchmark settings must be reported. The benchmark tools provide for such reporting automatically.
Tuning may be accomplished through JVM command line flags, system configuration settings, web browser configuration settings, initialization files, environment variables, or other means. Tuning options may affect a JIT compiler, any other part of the JVM, or an underlying operating system, if any. The following rules apply to tuning for the SPEC JVM Client98 benchmark suite.
SPEC does not restrict the number of tuning options that may be used. However, all options used must be reported.
No source file, variable, class, or method name may be used within a tuning option.
Options which substitute pre-computed (e.g. library-based) methods for methods defined in the benchmark on the basis of the method's name are not allowed. However, classes of the Java API may be executed either as Java byte codes or as native code, as determined by the Java Virtual Machine under test.
The optimization options used are expected to be safe, and it is expected that vendors would endorse the general use of these options by customers who seek to achieve good application performance.
All benchmark execution, including the validation steps, contributing to a particular result page must occur continuously, executing under the SPEC GUI tool harness.
The same JVM and the same set of options must be used for all benchmarks within the benchmark suite. All options must be applied in the same order for all benchmarks. Note that this will normally be the case for benchmarks run, as required, from the SPEC tools.
Feedback directed optimization is allowed on the problem size 100 input data set during the timed execution interval as part of a normal JIT or adaptive compilation process. No feedback directed optimization is allowed which requires additional steps, user interaction, or is outside of the timed execution interval. Note that in an autorun sequence, smaller untimed executions of each benchmark are interspersed between the timed executions.
Assertion options may NOT be used. An assertion option is one that supplies semantic information that the JVM did not derive from the class files of the benchmark. With an assertion option, the programmer asserts that the program has certain "nice" properties that allow the JVM to apply more aggressive optimization techniques (for example, that certain classes with non-final methods are never subclassed, or that certain arrays are never over-indexed). The problem is that there can be legal, standard-conforming programs where such a property does not hold. These programs could crash or give incorrect results if an assertion option is used. This is the reason why such options are sometimes also called "unsafe options". Note that a standard-conforming JVM should not accept any such assertion options. (A hypothetical illustration of this hazard follows this list.)
Class (byte code) verification must be enabled.
Pre-compiled classes may not be used in a measurement run of the benchmark. For instance, benchmark class files may not be compiled statically prior to executing the SPEC tool harness. Also, classes compiled by a JIT compiler prior to starting a benchmark run may not be used; this could be the case if you ran some or all of the benchmarks and then started an autorun sequence. The SPEC tool harness will normally prevent this from occurring.
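The following hypothetical Java fragment (not a SPEC benchmark class; all names are illustrative) shows why an asserted property such as "classes with non-final methods are never subclassed" is unsafe: the program below is perfectly legal Java, yet it violates that assertion.

    class Shape {
        // Non-final method: a JVM may only bind calls to it statically if it
        // can prove, from the classes actually loaded, that no subclass
        // overrides it.
        int sides() { return 0; }
    }

    class Triangle extends Shape {
        int sides() { return 3; }
    }

    public class AssertionHazard {
        public static void main(String[] args) {
            Shape s = new Triangle();
            // Correct output is 3.  If an assertion option told the JVM that
            // Shape is never subclassed, the call could be wrongly bound to
            // Shape.sides() and print 0 instead.
            System.out.println(s.sides());
        }
    }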
Results will be reported in three categories of physical memory sizes. These categories will be printed on the reporting page, and the SPEC web site will group results by these categories. "Physical" memory size is taken to include any total memory size set in software for the OS to use for all purposes; such a software setting must be the same as, or more conservative than, physically removing memory from the machine. The categories of memory size reported are:
0 through 48 MB
from 48 MB through 256 MB
greater than 256 MB
Proper use of the SPEC benchmark tools and reporting page generation program will take care of the production of a correctly formatted reporting page. The test sponsor is responsible for the content of all information, and for reviewing the page to ensure that all information is correct. If there is any problem with the SPEC tools in preparing a correct reporting page, the test sponsor can contact SPEC to possibly obtain a bug fix.
The elapsed time in seconds of each of the benchmarks for the system under test is divided into the reference time to give a SPEC ratio indicating the speed of the tested machine compared to times provided by SPEC for the reference machine. The composite metric is calculated as a geometric mean of the individual ratios. All runs of each benchmark when using the SPEC tools are required to have validated correctly.
SPEC derived the reference times by executing the benchmarks on a reference machine. These reference times are now fixed, and are used by the SPEC reporting software for all calculations of SPEC ratios. The reference machine is:
The metrics are calculated as the geometric means of the SPEC ratios. All benchmark programs are weighted equally with the exception of _200_check whose execution time is not reported nor used in metric calculation. The benchmark tools run each benchmark a number of times according to criteria set by the benchmarker. The SPECjvm98 metric is calculated from the best ratios, and the SPECjvm_base98 metric is calculated from the worst ratios.
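As an informal illustration of the calculation described above (not the SPEC reporting software; the class, method names, and sample values are assumptions), a SPEC ratio is the reference time divided by the measured elapsed time, and the composite metric is the equally weighted geometric mean of those ratios, excluding _200_check:

    public class MetricSketch {
        // SPEC ratio: reference time divided by the measured elapsed time.
        static double specRatio(double referenceSeconds, double elapsedSeconds) {
            return referenceSeconds / elapsedSeconds;
        }

        // Geometric mean of equally weighted ratios (one per timed benchmark).
        static double geometricMean(double[] ratios) {
            double logSum = 0.0;
            for (int i = 0; i < ratios.length; i++) {
                logSum += Math.log(ratios[i]);
            }
            return Math.exp(logSum / ratios.length);
        }

        public static void main(String[] args) {
            // Hypothetical best ratios for the timed benchmarks in the suite.
            double[] best = { 1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.4 };
            // SPECjvm98 would be the geometric mean of the best ratios;
            // SPECjvm_base98, the geometric mean of the worst ratios.
            System.out.println("Composite = " + geometricMean(best));
        }
    }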
The table of results shows SPEC ratios for each of the benchmarks in the suite, and an overall geometric mean. Each benchmark is executed at least twice, and both the worst and best of all runs are reported here.
A bar chart depicts the SPEC ratios of each benchmark, worst and best, and also the geometric means of the worst ratios and of the best ratios.
Hardware Vendor - Primary vendor of the hardware system tested
Vendor URL
Model
Processor
MHz
Number of Processors
Memory (MB) The memory size is highlighted on the reporting page, and the system is classified according to three SPEC-defined ranges of memory size. See "Memory" above.
Primary cache
Secondary cache
Other cache
Memory configuration - Any additional information about the memory which significantly influences performance. E.g., type of memory, interleaving, programmed wait states.
Disks
Non-volatile storage - Any such storage other than disk, e.g. flash RAM.
Other hardware
Network
Client h/w available - See note on general availability dates below.
Software Vendor - Primary vendor of the software system tested
Vendor URL
JVM Version - Vendor and version of the Java virtual machine
JVM available - See note on general availability dates below.
JVM Memory (MB) - Maximum heap allocation
OS Version
Client OS available - See note on general availability dates below.
Filesystem
System state
Other software
Other client s/w available - See note on general availability dates below.
Hardware Vendor - Primary vendor of the hardware system tested
Model
Processor
MHz
Number of Processors
Memory (MB)
Disks
Other hardware
Server h/w available - See note on general availability dates below.
OS Version
Server OS available - See note on general availability dates below.
Web server version
Web server available - See note on general availability dates below.
Filesystem
Other software
Other server s/w available - See note on general availability dates below.
Tested by - organization which performed the test and is reporting the results
SPEC license - SPEC license number of that organization
Test location - Site where test was performed
Test date - date the test was performed, month and year
The dates of general customer availability must be listed for the major components: hardware, HTTP server, and operating system, month and year. All the system, hardware and software features are required to be available within 3 months of the date of publication.
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be 5% lower than that reported for the pre-release system, then the sponsor is requested to report a corrected test result.
This section is used to document any other conditions which are necessary for an independent party to reproduce the measured test results. E.g.:
Any additional information required to reproduce the results that does not fit in the space allocated above must be listed here.
System state: single or multi-user
System tuning parameters other than default
Process tuning parameters other than default
Background load, if any
Additional information such as JVM options may be listed
Critical customer-identifiable firmware or option versions such as network and disk controllers
If the configuration is large and complex, added information should be supplied either by a separate drawing of the configuration or by a detailed written description which is adequate to describe the system to a person who did not originally configure it.
If a test result is invalid for any reason this section of the reporting page lists the reason(s) why it is not valid. An invalid result page will also have "INVALID RESULT" stamped across the page background, and no geometric mean is calculated or reported.
A table lists details about each execution of each benchmark:
Benchmark name
Reference time
Run sequence number
Run elapsed time in seconds
SPEC ratio
JVM heap in use at start of run
JVM heap in use at end of run
Total JVM heap size at start of run
Total JVM heap size at end of run
A bar graph depicts the memory in use and total memory at the end of each individual benchmark run. (A sketch of how such heap figures can be obtained follows.)
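As an illustration only (not part of the SPEC tools; the class and method names are assumptions), the heap figures in the table above correspond to values a Java virtual machine exposes through java.lang.Runtime:

    public class HeapSnapshot {
        // Total heap currently reserved by the JVM, in bytes.
        static long totalHeap() {
            return Runtime.getRuntime().totalMemory();
        }

        // Heap currently in use: total heap minus the free portion.
        static long heapInUse() {
            Runtime rt = Runtime.getRuntime();
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) {
            System.out.println("Heap in use (bytes): " + heapInUse());
            System.out.println("Total heap size (bytes): " + totalHeap());
        }
    }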
SPEC requires a full disclosure of results and configuration details sufficient to reproduce the results. A full disclosure report of all information described in these rules must be available on request within two weeks of any public disclosure of SPEC JVM Client98 results. Acceptable means of providing disclosures include:
Publication of the benchmark result on the SPEC web site. Publication of test results through SPEC is encouraged but not required, and may provide a convenient means of making available the necessary full disclosure reports.
Publication of the full disclosure information on the licensee's web site. The URL of the disclosure must either be listed in any publication of the benchmark result or be provided on request to anyone who asks.
Publication of the full disclosure information by a magazine or other third party web site. The URL of the disclosure must either be listed in any publication of the benchmark result or be provided on request to anyone who asks.
Sending the full disclosure information on request to anyone who asks.
A full disclosure of results will include:
The components of the results page, as described in these reporting rules.
The benchmarker's property files and any supplemental property files needed to generate the results.
A "flags" definition disclosure, explaining the meaning of any tuning options used.
All results disclosures to be submitted to the SPEC web site should be sent to SPEC via email.
If for some reason, the test sponsor cannot run the benchmarks as specified in these rules, the test sponsor can seek SPEC approval for performance-neutral alternatives. No publication of SPEC metrics may be made without such approval, except for research use as described below.
Other modes of running the benchmarks will be provided for research purposes. Any publication of performance information derived using the SPEC JVM Client98 benchmarks must credit SPEC as the source of the benchmarks. Any publication which is not compliant with all of the run rules must not represent, directly or indirectly, that the result is a SPECjvm98 metric. It must explicitly state that the result is not comparable with a SPECjvm98 metric.
In any such use, only the elapsed time of individual benchmarks may be reported. SPEC ratios and geometric means may not be calculated. Elapsed times on different systems and test conditions may be compared with one another. Test conditions must be described sufficiently to allow a third party to reproduce the results. No derivative metrics may be devised based on these benchmarks.
SPECjvm98 metrics may be estimated. All estimates must be clearly identified as such. Licensees are encouraged to give a rationale or methodology for any estimates, and to publish actual SPECjvm98 metrics as soon as possible.
Source code for some of the benchmarks is provided so that people can better understand what the benchmarks are doing, and to facilitate academic research. However, no compilation of Java source code to class files is required to run the benchmarks, and no such compilation is allowed for reported results. The SPEC tools to run the benchmarks are also provided as Java class files. No compilation of the SPEC tools is required or allowed.
Initial publication with benchmark release.
Subcommittee votes on the general availability window run rule change, to go into effect on 04/27/2000.
Availability dates for the products measured are changing from a 6-month window to a 3-month window. Previously, the run rules said:
The dates of general customer availability must be listed for the major components: hardware, HTTP server, and operating system, month and year. All the system, hardware and software features are required to be available within 6 months of the date of test.
as of 04/27/2000 they will say:
The dates of general customer availability must be listed for the major components: hardware, HTTP server, and operating system, month and year. All the system, hardware and software features are required to be available within 3 months of the date of publication.
The reason for this is that SPEC is attempting to make these rules more consistent over a larger set of benchmarks.