SPECvirt_sc2010 Release 1.0 Run and Reporting Rules

Version 1.02

1.0 Introduction
    1.1 Philosophy
    1.2 Fair Use of SPECvirt_sc2010 Results
    1.3 Research and Academic Usage
    1.4 Caveat
    1.5 Definitions
2.0 Running the SPECvirt_sc2010 Benchmark
    2.1 Environment
        2.1.1 Testbed Configuration
        2.1.2 System Under Test (SUT)
        2.1.3 Power
    2.2 Workload VMs
    2.3 Measurement
        2.3.1 Quality of Service
        2.3.2 Benchmark Parameters
        2.3.3 Running SPECvirt_sc2010 Workloads
        2.3.4 Power Measurement
        2.3.5 Client Polling Requirements
3.0 Reporting Results
    3.1 Metrics And Reference Format
    3.2 Testbed Configuration
        3.2.1 SUT Hardware
        3.2.1.1 SUT Stable Storage
        3.2.2 SUT Software
        3.2.3 Network Configuration
        3.2.4 Clients
        3.2.5 General Availability Dates
        3.2.6 Rules on the Use of Open Source Applications
        3.2.7 Test Sponsor
        3.2.8 Notes
4.0 Submission Requirements for SPECvirt_sc2010
    4.1 SUT Configuration Collection
    4.2 Guest Configuration Collection
    4.3 Client Configuration Collection
    4.4 Configuration Collection Archive Format
5.0 The SPECvirt_sc2010 Benchmark Kit
Appendix A. Run Rules References


1.0 Introduction

SPECvirt_sc2010 is the first-generation SPEC benchmark for evaluating the performance of datacenter servers used for virtualized server consolidation. The benchmark also provides options for measuring and reporting power and performance/power metrics. This document specifies how SPECvirt_sc2010 is to be run for measuring and publicly reporting performance and power results. These rules abide by the norms laid down by the SPEC Virtualization Subcommittee and approved by the SPEC Open Systems Steering Committee. Adherence to them ensures that results generated with this suite are meaningful, comparable to other results, and repeatable, with sufficient documentation covering the factors pertinent to duplicating the results.

Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.

1.1 Philosophy

The general philosophy behind the rules of SPECvirt_sc2010 is to ensure that an independent party can reproduce the reported results.

The following attributes are expected:

Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for Systems Under Test (SUTs) and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:

SPEC requires that any public use of results from this benchmark follow the SPEC Fair Use Rule and those specific to this benchmark (see the Fair Use section below).  In the case where it appears that these guidelines have not been adhered to, SPEC may investigate and request that the published material be corrected.

1.2 Fair Use of SPECvirt_sc2010 Results

Consistency and fairness are guiding principles for SPEC. To help assure that these principles are met, any organization or individual who makes public use of SPEC benchmark results must do so in accordance with the SPEC Fair Use Rule, as posted at http://www.spec.org/fairuse.html. Fair-use clauses specific to SPECvirt_sc2010 are covered in http://www.spec.org/fairuse.html#SPECvirt_sc2010.

1.3 Research and Academic Usage

Please consult the SPEC Fair Use Rule on Research and Academic Usage at http://www.spec.org/fairuse.html#Academic.

1.4 Caveat

SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECvirt_sc2010 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to this document and will rename the metrics if the results are no longer comparable.

Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC Web site at http://www.spec.org/virt_sc2010. SPEC will notify members and licensees whenever it makes changes to the suite.

1.5 Definitions

In a virtualized environment, the definitions of commonly-used terms can have multiple or different meanings. To avoid ambiguity, this section attempts to define terms that are used throughout this document:

For further definition or explanation of workload-specific terms, refer to the respective documents of the original benchmarks.

2.0 Running the SPECvirt_sc2010 Benchmark

2.1 Environment

2.1.1 Testbed Configuration

These requirements apply to all hardware and software components used in producing the benchmark result, including the System under Test (SUT), network, and clients.


2.1.2 System Under Test (SUT)

For a run to be valid, the following attributes must hold true:


2.1.3 Power

This section outlines the environmental and electrical requirements related to power measurement while running the benchmark. Because power measurement is optional, this section applies only to results in the Performance/Power categories.

To produce a compliant result for either Performance/Power of the Total System Under Test (SPECvirt_sc2010_PPW) or Performance/Power of the Server only (SPECvirt_sc2010_ServerPPW), the following requirements must be met in addition to the environmental and electrical requirements described in this section.

Line Voltage Source

The preferred Line Voltage source used for measurements is the main AC power as provided by local utility companies. Power generated from other sources often has unwanted harmonics that many power analyzers cannot measure correctly, which would produce inaccurate results.

The usage of an uninterruptible power source (UPS) as the line voltage source is allowed, but the voltage output must be a pure sine-wave. For placement of the UPS, see Power Analyzer Setup below. This usage must be specified in the Notes section of the report.

Systems that are designed to be able to run normal operations without an external source of power cannot be used to produce valid results. Some examples of disallowed systems are notebook computers, hand-held computers/communication devices and servers that are designed to frequently operate on integrated batteries without external power.

Systems with batteries intended to preserve operations during a temporary lapse of external power, or to maintain data integrity during an orderly shutdown when power is lost, can be used to produce valid benchmark results. For SUT components that have an integrated battery, the battery must be fully charged at the end of the measurement interval, or proof must be provided that it is charged at least to the level of charge at the beginning of the interval.

Note that integrated batteries that are intended to maintain such things as durable cache in a storage controller can be assumed to remain fully charged. The above paragraph is intended to address “system” batteries that can provide primary power for the SUT.

DC line voltage sources are currently not supported.

For situations in which the appropriate voltages are not provided by local utility companies (e.g. measuring a server in the United States that is configured for European markets, or measuring a server in a location where the local utility line voltage does not meet the required characteristics), an AC power source may be used, and the power source must be specified in the notes section of the disclosure report. In such situations, the following requirements must be met, and the relevant measurements or power source specifications must be disclosed in the general notes section of the disclosure report:

Environmental Conditions

SPEC requires that power measurements be taken in an environment representative of the majority of usage environments. The intent is to discourage extreme environments that may artificially impact power consumption or performance of the server.

The following environmental conditions must be met:

Power Analyzer Setup

The power analyzer must be located between the AC Line Voltage Source and the SUT. No other active components are allowed between the AC Line Voltage Source and the SUT. If the SUT consists of several discrete parts (server and storage), separate power analyzers may be required.

Power analyzer configuration settings that are set by the SPEC PTDaemon must not be manually overridden.

Power Analyzer Specifications

To ensure comparability and repeatability of power measurements, SPEC requires the following attributes for the power measurement device used during the benchmark. Please note that a power analyzer may meet these requirements when used in some power ranges but not in others, due to the dynamic nature of power analyzer Accuracy and Crest Factor.

For example:

An analyzer with a vendor-specified uncertainty of +/- 0.5% of reading +/- 4 digits, used in a test with a maximum wattage value of 200W, would have an "overall" uncertainty of ((0.5% * 200W) + 0.4W) = 1.4W, and 1.4W / 200W = 0.7% at 200W. (Here the 4 digits contribute 0.4W, corresponding to a display resolution of 0.1W.)

An analyzer with a wattage range of 20-400W and a vendor-specified uncertainty of +/- 0.25% of range +/- 4 digits, used in a test with a maximum wattage value of 200W, would have an "overall" uncertainty of ((0.25% * 400W) + 0.4W) = 1.4W, and 1.4W / 200W = 0.7% at 200W.
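The arithmetic in the two examples above can be reproduced with a short Python sketch. The function name and parameters are hypothetical, and the sketch assumes that one display digit corresponds to 0.1W (so that 4 digits contribute 0.4W, as in the examples); the actual resolution depends on the analyzer and the range in use.

```python
def overall_uncertainty(max_watts: float, percent: float, basis_watts: float,
                        digits: int = 4, resolution_watts: float = 0.1):
    """Return (absolute uncertainty in W, relative uncertainty at max_watts).

    Hypothetical helper. basis_watts is the reading for "% of reading" specs,
    or the upper bound of the range for "% of range" specs. Assumes one
    display digit equals resolution_watts (0.1 W here, so 4 digits = 0.4 W).
    """
    absolute = percent / 100.0 * basis_watts + digits * resolution_watts
    return absolute, absolute / max_watts


if __name__ == "__main__":
    # +/- 0.5% of reading +/- 4 digits, 200 W maximum
    print(overall_uncertainty(200.0, 0.5, basis_watts=200.0))
    # +/- 0.25% of a 20-400 W range +/- 4 digits, 200 W maximum
    print(overall_uncertainty(200.0, 0.25, basis_watts=400.0))
```

Both configurations land at 0.7% because the larger basis in the "% of range" case (400W) offsets its smaller percentage.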

Temperature Sensor Specifications

Temperature must be measured no more than 50mm in front of (upwind of) the main airflow inlet of the server.
To ensure comparability and repeatability of temperature measurements, SPEC requires the following attributes for the temperature measurement device used during the benchmark:

Supported and Compliant Devices

See the Device List for a list of currently supported (by the benchmark software) and compliant (in specifications) power analyzers and temperature sensors.

2.2 Workload VMs

A tile is a single unit of work composed of six distinct virtual machines that together support all the component workloads. Additional tiles are used to scale the benchmark. The last tile may be configured as a "fractional tile", which means that a "load scale factor" of less than 1.0 is applied to all of the VMs within that tile. If used, the load scale factor must be between 0.1 and 0.9 in 0.1 increments (e.g. 0.25 would not be allowed). Each VM is required to be a distinct entity; for example, you cannot run the application server and the database on the same VM. The following block diagram shows the tile architecture and the virtual machine/hypervisor/driver relationships:


Figure 1. Tile block diagram
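A benchmarker's setup script might validate the fractional-tile load scale factor before a run. This is a hypothetical Python sketch, not part of the SPEC harness; it compares in tenths with a small tolerance so that values such as 0.3, which are not exact in binary floating point, are still accepted.

```python
import math


def is_valid_load_scale_factor(factor: float) -> bool:
    """Hypothetical check for the fractional-tile rule: 0.1, 0.2, ..., 0.9 only."""
    tenths = round(factor * 10)
    # Must be a whole number of tenths, between 1 and 9 inclusive.
    return 1 <= tenths <= 9 and math.isclose(factor * 10, tenths, abs_tol=1e-9)


if __name__ == "__main__":
    for f in (0.1, 0.3, 0.9, 0.25, 1.0):
        print(f, is_valid_load_scale_factor(f))
```

A value of 1.0 is rejected because a tile at full load is not a fractional tile.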

Note that there are more virtual machines than client drivers; this is because the Infrastructure Server and Database Server VMs do not interact directly with the client.  Specifically, the Web Server VM must access parts of its fileset and the backend simulator (BeSim) via inter-VM communication to the Infrastructure Server.  Similarly, the Application Server VM accesses the Database Server VM via inter-VM communication.

The operating systems may vary between virtual machines within a tile. However, each workload-specific VM (guest OS type and application software) must be identical across all tiles, including fractional tiles. Examples of parameters that must remain identical include:

·       Guest OS distribution, version, and patch levels

·       Application software version and patch levels

·       Guest OS and application software tunings

·       VM resource parameters from the guest OS perspective (i.e. # CPUs, memory, networking/storage configuration)

The intent is that workload-specific VMs across tiles are "clones", with only the modifications necessary to identify them as different entities (e.g. host name and network address).

Mail Server VM

As Internet email is defined by its protocol definitions, the mail server requires adherence to the relevant protocol standards:

  RFC 2060 : Internet Message Access Protocol - Version 4rev1 (IMAP4rev1)

The IMAP4 protocol implies the following:

  RFC   791 : Internet Protocol (IPv4)
  RFC   792 : Internet Control Message Protocol (ICMP)
  RFC   793 : Transmission Control Protocol (TCP)
  RFC   950 : Internet Standard Subnetting Procedure
  RFC 1122 : Requirements for Internet Hosts - Communication Layers

Internet standards are evolving standards. Adherence to related RFCs (e.g. RFC 1191 Path MTU Discovery) is also acceptable, provided the implementation remains interoperable with other implementations.

Application Server VM

The J2EE server must provide a runtime environment that meets the requirements of the Java 2 Platform, Enterprise Edition (J2EE), Version 1.3 or later specifications during the benchmark run.

A major new version (i.e. 1.0, 2.0, etc.) of a J2EE server must have passed the J2EE Compatibility Test Suite (CTS) by the product's general availability date.

A J2EE Server that has passed the J2EE Compatibility Test Suite (CTS) satisfies the J2EE compliance requirements for this benchmark regardless of the underlying hardware and other software used to run the benchmark on a specific configuration, provided the runtime configuration options result in behavior consistent with the J2EE specification. For example, using an option that violates J2EE argument-passing semantics by enabling a pass-by-reference optimization would not meet the J2EE compliance requirement.

Comment: The intent of this requirement is to ensure that the J2EE server is a complete implementation satisfying all requirements of the J2EE specification and to prevent any advantage gained by a server that implements only an incomplete or incompatible subset of the J2EE specification.

SPECvirt_sc2010 requires that each Application Server VM execute its own locally installed emulator application (emulator.EAR). This differs from the original SPECjAppServer2004 workload definition.

Database Server VM

All tables must have the properly scaled number of rows, as defined by the database population requirements in the "Application and Database Server Benchmark" section of the SPECvirt_sc2010 Design Overview.

Additional database objects or DDL modifications made to the reference schema scripts in the schema/sql directory in the SPECjAppServer2004 Kit must be disclosed along with the specific reason for the modifications. The base tables and indexes in the reference scripts cannot be replaced or deleted. Views are not allowed. The data types of fields can be modified provided they are semantically equivalent to the standard types specified in the scripts.

Comment: Replacing CHAR with VARCHAR would be considered semantically equivalent. Changing the size of a field (for example: increasing the size of a char field from 8 to 10) would not be considered semantically equivalent. Replacing CHAR with INTEGER (for example: zip code) would not be considered semantically equivalent.

Modifications that a customer may make for compatibility with a particular database server are allowed. Changes may also be necessary to allow the benchmark to run without the database becoming a bottleneck, subject to approval by SPEC. Examples of such changes include:

Comment: Schema scripts provided by the vendors in the schema/<vendor> directories are for convenience only. They do not constitute the reference or baseline scripts, which are in the schema/sql directory. Deviations from the scripts in the schema/sql directory must still be disclosed in the submission file, even if the vendor-provided scripts were used directly.

In any committed state the primary key values must be unique within each table. For example, in the case of a horizontally partitioned table, primary key values of rows across all partitions must be unique.

The databases must be populated using the supplied load programs or restored from a database copy in a correctly populated state that was populated using the supplied load programs prior to the start of each benchmark run.

Modifications to the load programs are permitted for porting purposes. All such modifications made must be disclosed in the Submission File.

Web Server VM 

As the WWW is defined by its interoperative protocol definitions, the Web server requires adherence to the relevant protocol standards. It is expected that the Web server is HTTP 1.1 compliant. The benchmark environment shall be governed by the following standards:

For further explanation of these protocols, the following might be helpful:

The current text of all IETF RFCs may be obtained from: http://ietf.org/rfc.html

Any software product that is marketed as adhering to a standard must have passed the relevant test suites used to ensure compliance with that standard.

For a run to be valid, the following attributes must hold true:

Infrastructure VM

The Infrastructure VM has the same requirements as the Web Server VM in its role as a web back-end (BeSim) for the web workload.
It also hosts the download files for the Web server, using a file system protocol for remote file sharing (for example, NFS or CIFS).

Idle Server VM

For a run to be valid, each Idle Server VM must have at least 512 MB of memory allocated. The operating system of the Idle Server VM must be of the same type and version as that of at least one other VM in the tile. The Idle Server VM does not need to contain the other VMs' workload-specific application software stacks. The intent of these requirements is to prevent vendors from artificially stripping down or tuning the Idle Server VM to take advantage of its limited functionality.

2.3 Measurement

2.3.1 Quality of Service

The SPECvirt_sc2010 individual workload metrics represent the aggregate throughput that a server can support while meeting quality of service (QoS) and validation requirements.  In the benchmark run, one or more tiles are run simultaneously. The load generated is based on page requests, database transactions, and IMAP operations as defined in the SPECvirt_sc2010 Design Overview.

The QoS requirements are specific to each workload. These include:

The load generated is based on page requests, transition between pages and the static images accessed within each page.

The QoS requirements are defined in terms of two parameters, Time_Good and Time_Tolerable. QoS requirements are page-based; Time_Good and Time_Tolerable are defined as 3 seconds and 5 seconds, respectively. For each page, 95% of the page requests (including all the embedded files within that page) are expected to be returned within Time_Good, and 99% within Time_Tolerable. Very large static files (i.e. Support downloads) use specific byte rates as their QoS requirements.

The validation requirement is that fewer than 1% of the requests for any given page, and fewer than 0.5% of all page requests in a given test iteration, fail validation.
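The page-based timing and validation thresholds above can be expressed as a single check. This Python sketch is an illustration, not the harness's implementation: the PageRequest record is hypothetical, each response time is assumed to already include the page's embedded files, "within" is read as less than or equal to, and Support downloads (which use byte-rate QoS instead) are assumed to be excluded from the input.

```python
from dataclasses import dataclass

TIME_GOOD = 3.0        # seconds
TIME_TOLERABLE = 5.0   # seconds


@dataclass
class PageRequest:
    page: str              # page name (hypothetical record layout)
    response_time: float   # seconds, assumed to include embedded files
    valid: bool            # True if the response passed validation


def web_qos_ok(requests: list[PageRequest]) -> bool:
    if not requests:
        return False
    by_page: dict[str, list[PageRequest]] = {}
    for req in requests:
        by_page.setdefault(req.page, []).append(req)

    for reqs in by_page.values():
        n = len(reqs)
        good = sum(r.response_time <= TIME_GOOD for r in reqs)
        tolerable = sum(r.response_time <= TIME_TOLERABLE for r in reqs)
        failed = sum(not r.valid for r in reqs)
        # Per page: >= 95% within Time_Good, >= 99% within Time_Tolerable,
        # and fewer than 1% failing validation.
        if good < 0.95 * n or tolerable < 0.99 * n or failed >= 0.01 * n:
            return False

    # Across all pages: fewer than 0.5% failing validation.
    total_failed = sum(not r.valid for r in requests)
    return total_failed < 0.005 * len(requests)
```

Because both validation limits are strict ("fewer than"), a single failure among 200 requests (exactly 0.5%) already fails the overall check.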

All user sessions in this benchmark must run at the "high-speed Internet" speed of 100 kilobytes/sec.

In addition, the URL retrievals (or operations) performed must also meet the following quality criteria:


For each IMAP operation type, 95% of all transactions must complete within five seconds. Additionally, for each IMAP operation type, no more than 1.5% of transactions may fail, where a failure is a transaction that returns unexpected content or times out. The total failure count across all operation types must be no more than 1% of the count of all operations.
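The three mail-server limits above (per-type timing, per-type failures, and total failures) can be checked together. This is a hypothetical Python sketch; the (operation type, elapsed seconds, failed) tuple layout is an assumption, and it counts a failed transaction toward the timing check as well, which is also an assumption.

```python
from collections import defaultdict


def imap_qos_ok(ops: list[tuple[str, float, bool]]) -> bool:
    """ops holds one (operation type, elapsed seconds, failed) tuple per
    IMAP transaction; a failure is unexpected content or a time-out."""
    if not ops:
        return False
    count: dict[str, int] = defaultdict(int)
    on_time: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for op_type, elapsed, failed in ops:
        count[op_type] += 1
        on_time[op_type] += int(elapsed <= 5.0)
        failures[op_type] += int(failed)

    for op_type, n in count.items():
        # Per type: >= 95% within five seconds, <= 1.5% failures.
        if on_time[op_type] < 0.95 * n or failures[op_type] > 0.015 * n:
            return False
    # Across all types: total failures <= 1% of all operations.
    return sum(failures.values()) <= 0.01 * len(ops)
```

Note that the total limit (1%) is tighter than the per-type limit (1.5%), so a run can pass every per-type check and still fail overall.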

The client polls the Idle Server periodically to ensure that the VM is running and responsive. To meet the Idle Server QoS requirement, 99.5% of all polling requests must be responded to within one second.

Driver Requirements for the Dealer Domain

Business Transaction Mix Requirements

Business Transactions are selected by the Driver based on the mix shown in the following table. The actual mix achieved in the benchmark must be within 5% of the targeted mix for each type of Business Transaction. For example, browse transactions can vary between 47.5% and 52.5% of the total mix. The Driver checks and reports on whether the mix requirement was met.

Business Transaction Mix Requirements

Business Transaction Type    Percent Mix
Purchase                     25%
Manage                       25%
Browse                       50%
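The "within 5%" tolerance is relative to each target share, which is how Browse's 50% target yields the 47.5% to 52.5% band. A minimal Python sketch of that check follows; the function name is hypothetical, and the Driver performs the real check itself.

```python
TARGET_MIX = {"Purchase": 0.25, "Manage": 0.25, "Browse": 0.50}


def mix_ok(counts: dict[str, int], tolerance: float = 0.05) -> bool:
    """True if each type's share is within +/- 5% (relative) of its target."""
    total = sum(counts.values())
    if total == 0:
        return False
    for tx_type, target in TARGET_MIX.items():
        share = counts.get(tx_type, 0) / total
        if not target * (1 - tolerance) <= share <= target * (1 + tolerance):
            return False
    return True


if __name__ == "__main__":
    print(mix_ok({"Purchase": 2500, "Manage": 2500, "Browse": 5000}))
    print(mix_ok({"Purchase": 2300, "Manage": 2600, "Browse": 5100}))
```

For Purchase and Manage the allowed band is therefore 23.75% to 26.25%, so the second mix fails on Purchase (23%).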

Response Time Requirements

The Driver measures and records the Response Time of the different types of Business Transactions. Only successfully completed Business Transactions in the Measurement Interval are included. At least 90% of the Business Transactions of each type must have a Response Time of less than the constraint specified in the table below. The average Response Time of each Business Transaction type must not exceed the 90% Response Time by more than 0.1 seconds. This requirement ensures that all users will see reasonable response times. For example, if the 90% Response Time of purchase transactions is 1 second, then the average cannot be greater than 1.1 seconds. The Driver checks and reports on whether the response time requirements were met.

Response Time Requirements

Business Transaction Type    90% RT (in seconds)
Purchase                     2
Manage                       2
Browse                       2
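The two response time rules (the 90% bound and the average-versus-90%-Response-Time bound) can be sketched in Python as follows. How the Driver computes the 90% Response Time is not specified here; this hypothetical sketch uses the nearest-rank percentile, which is an assumption, and treats a type with no transactions as a failure.

```python
import math

RT_LIMIT_90 = {"Purchase": 2.0, "Manage": 2.0, "Browse": 2.0}  # seconds


def response_times_ok(times_by_type: dict[str, list[float]]) -> bool:
    """times_by_type: response times (seconds) of successfully completed
    Business Transactions in the Measurement Interval, keyed by type."""
    for tx_type, limit in RT_LIMIT_90.items():
        times = times_by_type.get(tx_type, [])
        if not times:
            return False
        n = len(times)
        # At least 90% must be below the limit.
        if sum(t < limit for t in times) < 0.9 * n:
            return False
        # 90% Response Time via the nearest-rank method (an assumption).
        p90 = sorted(times)[math.ceil(0.9 * n) - 1]
        # Average must not exceed the 90% Response Time by more than 0.1 s.
        if sum(times) / n > p90 + 0.1:
            return False
    return True
```

The average rule is what catches a long tail: nine 0.5-second responses plus one 20-second response still satisfy the 90% bound, but the 2.45-second average exceeds 0.5 + 0.1 seconds.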

Cycle Time Requirements

For each Business Transaction, the Driver selects cycle times from a negative exponential distribution, computed from the following equation:

Tc = -ln(x) * 10

where:

Tc = Cycle Time
 ln = natural log (base e)
 x  = random number with at least 31 bits of precision,
      from a uniform distribution such that (0 < x <= 1)

The distribution is truncated at 5 times the mean. For each Business Transaction, the Driver measures the Response Time Tr and computes the Delay Time Td as Td = Tc - Tr. If Td > 0, the Driver will sleep for this time before beginning the next Business Transaction. If the chosen cycle time Tc is smaller than Tr, then the actual cycle time (Ta) is larger than the chosen one.

The average actual cycle time may deviate from the targeted average cycle time by no more than 5%. The Driver checks and reports on whether the cycle time requirements were met.
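The cycle time procedure above can be sketched in Python. The rules do not say whether "truncated at 5 times the mean" means redrawing or capping values above 50 seconds; this hypothetical sketch redraws, and capping would be an equally plausible reading. Python's random.random() returns 53-bit floats on [0, 1), so 1.0 - random() yields x on (0, 1] with more than the required 31 bits of precision. Comparing the average actual cycle time against the average chosen cycle time is an interpretation of "the targeted one".

```python
import math
import random

MEAN_CYCLE_TIME = 10.0                   # seconds, the "* 10" in Tc = -ln(x) * 10
TRUNCATION_POINT = 5 * MEAN_CYCLE_TIME   # truncated at 5 times the mean


def choose_cycle_time(rng: random.Random) -> float:
    """Draw Tc = -ln(x) * 10 with x uniform on (0, 1], redrawing values above
    the truncation point (one reading of "truncated"; capping is another)."""
    while True:
        x = 1.0 - rng.random()           # random() is [0, 1), so x is (0, 1]
        tc = -math.log(x) * MEAN_CYCLE_TIME
        if tc <= TRUNCATION_POINT:
            return tc


def delay_time(tc: float, tr: float) -> float:
    """Td = Tc - Tr; the Driver sleeps only if Td > 0."""
    return max(tc - tr, 0.0)


def actual_cycle_time(tc: float, tr: float) -> float:
    """If Tr exceeds Tc, the actual cycle time Ta is Tr rather than Tc."""
    return max(tc, tr)


def cycle_times_ok(chosen: list[float], actual: list[float]) -> bool:
    """Average actual cycle time within 5% of the average chosen cycle time."""
    target = sum(chosen) / len(chosen)
    return abs(sum(actual) / len(actual) - target) <= 0.05 * target
```

Because of the truncation, the average chosen cycle time is slightly below 10 seconds (about 9.7 seconds under the redraw reading).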

Miscellaneous Requirements

The table below shows the range of values allowed for various quantities in the application. The Driver will check and report on whether these requirements were met.

Miscellaneous Dealer Requirements

Quantity                                        Targeted Value    Min. Allowed    Max. Allowed
Average Vehicles per Order                      26.6              25.27           27.93
Vehicle Purchasing Rate (/sec)                  6.65 * Ir         6.32 * Ir       6.98 * Ir
Percent Purchases that are Large Orders         10                9.5             10.5
Large Order Vehicle Purchasing Rate (/sec)      3.5 * Ir          3.33 * Ir       3.68 * Ir
Average # of Vehicles per Large Order           140               133             147
Regular Order Vehicle Purchasing Rate (/sec)    3.15 * Ir         2.99 * Ir       3.31 * Ir
Average # of Vehicles per Regular Order         14                13.3            14.7
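Every Min./Max. pair in the table above is the target value minus and plus 5% (rounded to two decimals), so a single relative-tolerance check covers all rows. In this hypothetical Python sketch, Ir is read as the injection rate, as in SPECjAppServer2004; this section does not define it. The Manufacturing table that follows uses the same pattern with a 10% band (for example, 3.5 * Ir ranges from 3.15 * Ir to 3.85 * Ir), which corresponds to tolerance=0.10.

```python
def within_tolerance(measured: float, target: float, tolerance: float = 0.05) -> bool:
    """True if measured lies in [target * (1 - tolerance), target * (1 + tolerance)]."""
    return target * (1 - tolerance) <= measured <= target * (1 + tolerance)


def rate_ok(measured_per_sec: float, target_per_ir: float, ir: float,
            tolerance: float = 0.05) -> bool:
    """Rate rows scale with Ir, so compare against target_per_ir * ir."""
    return within_tolerance(measured_per_sec, target_per_ir * ir, tolerance)


# The targets are mutually consistent: 10% large orders averaging 140 vehicles
# and 90% regular orders averaging 14 vehicles give 26.6 vehicles per order,
# and the large (3.5) plus regular (3.15) purchasing rates sum to 6.65 per Ir.
EXPECTED_AVG_VEHICLES = 0.10 * 140 + 0.90 * 14
EXPECTED_RATE_PER_IR = 3.5 + 3.15
```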

Performance Metric

The metric for the Dealer Domain is Dealer Transactions/sec, computed as the total count of all Business Transactions successfully completed during the measurement interval, divided by the length of the measurement interval in seconds.

Driver Requirements for the Manufacturing Domain

Response Time Requirements

The M_Driver measures and records the time taken for a work order to complete. Only successfully completed work orders in the Measurement Interval are included. At least 90% of the work orders must have a Response Time of less than 5 seconds. The average Response Time must not be greater than 0.1 seconds more than the 90% Response Time.

Miscellaneous Requirements

The table below shows the range of values allowed for various quantities in the Manufacturing Application. The M_Driver will check and report on whether the run meets these requirements.

Miscellaneous Manufacturing Requirements

Quantity                          Targeted Value    Min. Allowed    Max. Allowed
LargeOrderline Widget Rate/sec    3.5 * Ir          3.15 * Ir       3.85 * Ir
Planned Line Widget Rate/sec      3.15 * Ir         2.835 * Ir      3.465 * Ir

2.3.2 Benchmark Parameters

Workload-specific configuration files are supplied with the harness. All configurable parameters are listed in these files. For a run to be valid, all the parameters in the configuration files must be left at default values, except for the ones that are marked and listed clearly as "Configurable Workload Properties".

2.3.3 Running SPECvirt_sc2010 Workloads

To configure the initial benchmark environment from scratch, the benchmarker:

To run the benchmark, the benchmarker must:

2.3.4 Power Measurement

NOTE: This section is only applicable to results that have power measurement, which is optional.

The measurement of power should meet all the environmental aspects listed in Environmental Conditions. The SPECvirt_sc2010 benchmark tools provide the ability to automatically gather measurement data from supported power analyzers and temperature sensors and integrate that data into the benchmark result. SPEC requires that the analyzers and sensors used in a submission be supported by the measurement framework. The provided tools (or a newer version provided by SPEC) must be used to run and produce measured SPECvirt_sc2010 results.

The primary metrics, SPECvirt_sc2010_PPW (performance with SUT power) and SPECvirt_sc2010_ServerPPW (performance with Server-only power), are performance-per-watt metrics obtained by dividing the peak performance by the average power of the SUT or Server, respectively, during the run measurement phase. For example, if the SPECvirt_sc2010 result consisted of a maximum of 6 tiles, the power would be calculated as the average power while serving transactions within all 6 workload tiles.
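A minimal Python sketch of the performance-per-watt division follows. ServerPPW would use server-only power samples, and PPW samples covering the whole SUT. When the SUT is measured with several analyzers (see Power Analyzer Setup), summing time-aligned readings is one plausible way to obtain SUT power; that, and all function names here, are assumptions, since the SPEC tools compute these values themselves.

```python
def average_power(samples_watts: list[float]) -> float:
    """Mean of the power samples taken during the measurement phase."""
    return sum(samples_watts) / len(samples_watts)


def sut_power_samples(server_watts: list[float],
                      storage_watts: list[float]) -> list[float]:
    """Hypothetical: sum time-aligned readings from separate analyzers."""
    if len(server_watts) != len(storage_watts):
        raise ValueError("analyzer sample series must be time-aligned")
    return [s + t for s, t in zip(server_watts, storage_watts)]


def performance_per_watt(score: float, samples_watts: list[float]) -> float:
    """Peak performance divided by average power."""
    return score / average_power(samples_watts)


if __name__ == "__main__":
    server = [400.0, 600.0]
    storage = [100.0, 100.0]
    print(performance_per_watt(300.0, server))                        # ServerPPW-style
    print(performance_per_watt(300.0, sut_power_samples(server, storage)))  # PPW-style
```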

2.3.5 Client Polling Requirements

During the measurement phase, the SPECvirt_sc2010 prime controller polls each prime client process associated with each workload in each tile once every 10 seconds. The prime controller collects and records the workload polling data which includes performance and QoS measurement data from the clients.  It is expected that in a compliant run all polling requests will be responded to within 10 seconds (BEAT_INTERVAL). Failure to respond to polling requests may indicate problems with the clients' ability to issue and respond to workload requests in a timely manner or accurately record performance.

The prime controller process verifies that the prime client processes respond to each polling request, and it will invalidate the test if more than one 10-second polling interval is missed during the test's measurement phase. In that case, the test aborts and the run is marked as non-compliant.
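The polling rule can be sketched as a count of missed intervals. In this hypothetical Python sketch, each entry is the response delay for one 10-second polling interval, or None if no response arrived; treating a response later than BEAT_INTERVAL as a miss is an assumption based on the expectation stated above.

```python
from typing import Optional

BEAT_INTERVAL = 10.0  # seconds between polls


def polling_compliant(responses: list[Optional[float]]) -> bool:
    """responses: one entry per polling interval in the measurement phase,
    holding the response delay in seconds, or None if no response arrived.
    A late response (over BEAT_INTERVAL) is treated as a miss (assumption)."""
    missed = sum(1 for delay in responses if delay is None or delay > BEAT_INTERVAL)
    return missed <= 1
```

A single missed interval is tolerated; a second one invalidates the run.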

3.0 Reporting Results

3.1 Metrics And Reference Format

The reported performance metric, SPECvirt_sc2010, appears in both Performance/Power and Performance categories, and will be derived from a set of compliant results from the workloads in the suite:

The SPECvirt_sc2010 metric is a "supermetric" that is the arithmetic mean of the normalized submetrics for each workload.  The metric will be output in the format "SPECvirt_sc2010 <score> @ <# vms> VMs".
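The "supermetric" and its output format can be sketched as follows. This section does not give the normalization reference values, so the hypothetical Python sketch takes already-normalized submetrics as input and only shows the arithmetic mean and the reporting string; the two-decimal score precision is also an assumption.

```python
def specvirt_score(normalized_submetrics: list[float]) -> float:
    """Arithmetic mean of already-normalized workload submetrics."""
    return sum(normalized_submetrics) / len(normalized_submetrics)


def format_result(metric: str, score: float, vms: int) -> str:
    """Render "<metric> <score> @ <# vms> VMs"; two decimals is an assumption."""
    return f"{metric} {score:.2f} @ {vms} VMs"


if __name__ == "__main__":
    print(format_result("SPECvirt_sc2010", specvirt_score([1.0, 2.0, 3.0]), 36))
```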

The optional performance/watt metrics, SPECvirt_sc2010_PPW and SPECvirt_sc2010_ServerPPW, represent the peak performance divided by the average power of the SUT and the server, respectively, during the peak run phase. These metrics appear only in results in the Performance/Power categories, and such results must not be compared with results that do not have power measured. These metrics will be output in the format "SPECvirt_sc2010_PPW <score> @ <# vms> VMs" and "SPECvirt_sc2010_ServerPPW <score> @ <# vms> VMs".


Please consult the SPEC Fair Use Rule on the treatment of estimates at http://www.spec.org/fairuse.html#SPECvirt_sc2010.

The report of results for the SPECvirt_sc2010 benchmark is generated in ASCII and HTML format by the provided SPEC tools. These tools may not be changed without prior SPEC approval. The tools perform error checking and will flag some error conditions as resulting in an "invalid run".  However, these automatic checks are only there for debugging convenience, and do not relieve the benchmarker of the responsibility to check the results and follow the run and reporting rules.

SPEC reviews and accepts for publication on SPEC's website only a complete and compliant set of results run and reported according to these rules.  Full disclosure reports of all test and configuration details as described in these run and report rules must be made available.  Licensees are encouraged to submit results to SPEC for publication.

3.2 Testbed Configuration

All system configuration information required to duplicate published performance results must be reported. Tunings not in the default configuration for software and hardware settings must be reported. All tiles must be tuned identically.

3.2.1 SUT Hardware

The following SUT hardware components must be reported:

3.2.1.1 SUT Stable Storage

The SUT must utilize stable storage. Additionally, the SUT must use stable and durable storage for all virtual machines (including all corresponding data drives), such that a single drive failure does not incur data loss on the VMs. For example: RAID-1, 5, 10, 50, 0+1 are acceptable RAID levels, but RAID-0 (striping without mirroring or parity) is not considered durable.

The SUT

The hypervisor must be able to recover the virtual machines, and the virtual machines must also be able to recover their data sets, without loss from multiple power failures (including cascading power failures), hypervisor and guest operating system failures, and hardware failures of components (e.g. CPU) other than the storage medium. At any point where the data can be cached, after any virtual server has accepted the message and acknowledged a transaction, there must be a mechanism to ensure any cached data survives the server failure.

If a UPS is required by the SUT to meet the stable storage requirement, the benchmarker is not required to perform the test with a UPS in place. The benchmarker must state in the disclosure that a UPS is required. Supplying a model number for an appropriate UPS is encouraged but not required.

If a battery-backed component is used to meet the stable storage requirement, that battery must have sufficient power to maintain the data for at least 48 hours to allow any cached data to be committed to media and the system to be gracefully shut down. The system or component must also be able to detect a low battery condition and prevent the use of the caching feature of the component or provide for a graceful system shutdown.

Hypervisors are required to safely store all completed transactions of their virtualized workloads (including upon failure of the hypervisor's own storage):

3.2.2 SUT Software

The following SUT software components must be reported:

3.2.3 Network Configuration

A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:

3.2.4 Clients

The following load generator hardware components must be reported:

3.2.5 General Availability Dates

The dates of general customer availability (month and year) must be listed for the major components: hardware and software (hypervisor, operating systems, and applications). All system, hardware, and software features are required to be generally available on or before the date of publication, or within 3 months of the date of publication (except where precluded by these rules, see section 3.2.7). When multiple components have different availability dates, the latest availability date must be listed.
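Because availability is stated by month and year, the 3-month window can be checked with (year, month) pairs. This hypothetical Python sketch returns the latest component date (the one to be listed) and whether it falls no more than 3 calendar months after publication; counting the window in whole months is an interpretation.

```python
YearMonth = tuple[int, int]  # (year, month)


def months_after(earlier: YearMonth, later: YearMonth) -> int:
    """Number of calendar months from earlier to later (negative if before)."""
    return (later[0] - earlier[0]) * 12 + (later[1] - earlier[1])


def availability_check(component_dates: dict[str, YearMonth],
                       publication: YearMonth) -> tuple[YearMonth, bool]:
    """Return the latest availability date (the one to list) and whether it
    falls no later than 3 months after publication (month granularity)."""
    latest = max(component_dates.values())  # tuples compare year first
    return latest, months_after(publication, latest) <= 3


if __name__ == "__main__":
    dates = {"hardware": (2010, 6), "hypervisor": (2010, 9), "guest OS": (2010, 3)}
    print(availability_check(dates, publication=(2010, 7)))
```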

Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. The availability of support and documentation for the products must coincide with the release of the products.

Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit is waived for hardware used in clients.

For ease of benchmarking and to reduce its cost, storage and networking hardware external to the server, such as disks, storage enclosures, storage controllers, and network switches, that was generally available within the last five years but is no longer available from the original vendor may be used. If such end-of-life (and possibly unsupported) hardware is used, then the test sponsor represents that the performance measured is no better than 105% of the performance on hardware available as of the date of publication. The product(s) and their end-of-life date(s) must be noted in the disclosure. If it is later determined that the performance using available hardware is lower than 95% of that reported, the result shall be marked non-compliant (NC).

Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.

In the disclosure, the benchmarker must identify any component that is no longer orderable by ordinary customers.

If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If it is later determined that the performance using available hardware or software is lower than 95% of that reported, the result shall be marked non-compliant (NC).
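The two percentage thresholds in this section reduce to simple comparisons. The Python sketch below is illustrative only: the function names are hypothetical, and the inputs are assumed to be values of the same SPECvirt_sc2010 metric, measured on the reported configuration and on generally available equivalents.

```python
def eol_representation_holds(measured_on_eol: float, measured_on_available: float) -> bool:
    """Sponsor's representation for end-of-life storage/network hardware:
    the measured result is no better than 105% of the result on available hardware."""
    return measured_on_eol <= 1.05 * measured_on_available


def is_non_compliant(reported: float, measured_on_available: float) -> bool:
    """A result is marked NC if performance later measured on generally
    available hardware or software is lower than 95% of the reported value."""
    if reported <= 0:
        raise ValueError("reported metric must be positive")
    return measured_on_available < 0.95 * reported


if __name__ == "__main__":
    # Hypothetical scores: reported 1000, later re-measured on released products.
    print(is_non_compliant(1000.0, 940.0))  # True: 940 is below the 950 floor
    print(is_non_compliant(1000.0, 960.0))  # False
```

The same 95% check applies to both the end-of-life hardware case and the pre-release case; only the 105% representation is specific to end-of-life hardware.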

3.2.6 Rules on the Use of Open Source Applications

SPECvirt_sc2010 permits Open Source Applications to be used outside of a commercial distribution or support contract, with some limitations. The following rules govern the admissibility of an Open Source Application in the context of a benchmark run or implementation. Open Source Applications do not include shareware and freeware whose source is not part of the distribution.

  1. Open Source Application rules do not apply to Open Source operating systems, which would still require a commercial distribution and support.
     
  2. Only a "stable" release can be used in the benchmark environment; non-"stable" releases (alpha, beta, or release candidates) cannot be used.

    Reason: An open source project is not contractually bound, and its reliance on volunteer resources makes predictable future release dates unlikely (i.e., a pre-release version may be more likely to miss SPEC's 3-month General Availability window). A "stable" release is one that is clearly denoted as a stable release or a release that is available and recommended for general use. It must be a release that is not on the development fork and is not designated as an alpha, beta, test, preliminary, pre-released, prototype, release-candidate, or any other term that indicates that it may not be suitable for general use.
     
  3. The initial "stable" release of the application must be a minimum of 12 months old.

    Reason: This helps ensure that the software has real application to the intended user base and is not a benchmark special that is released alongside a benchmark result and only available for the first three months to meet SPEC's forward availability window.
     
  4. At least two additional stable releases (major, minor, or bug fix) must have been completed, announced and shipped beyond the initial stable release.

    Reason: This helps establish a track record for the project and shows that it is actively maintained.
     
  5. An established online support forum must be in place and clearly active, "usable", and "useful". At least one posting within the last 3 months is required. Postings from the benchmarkers or their representatives, or from members of the Virtualization Subcommittee, will not be included in the count.

    Reason: This is another way of establishing that support is available for the software. However, benchmarkers must not cause the forum to appear active when it otherwise would not be. A "useful" support forum is defined as one that provides useful responses to users’ questions, such that if a previously unreported problem is reported with sufficient detail, it is responded to by a project developer or community member with sufficient information that the user ends up with a solution or a workaround, or is notified that the issue will be addressed in a future release or that it is outside the scope of the project. The archive of the problem-reporting tool must have examples of this level of conversation. A "usable" support forum is defined as one where the problem-reporting tool is available without restriction, has a simple user interface, and allows users to access old reports.
     
  6. The project must have at least 2 identified developers contributing and maintaining the application.

    Reason: To help ensure that this is a real application with real developers and not a fly-by-night benchmark special.
     
  7. The application must use a standard open source license such as one of those listed at http://www.opensource.org/licenses/.
     
  8. The "stable" release used in the actual test run must have been a latest "stable" release within the prior six months at the time the result is submitted for review. The exact beginning of this time window is determined by starting from the date of the submission, going back 6 months, and keeping the day number the same. Note: Residual cases are treated as described in http://www.spec.org/osg/policy.html#s2.3.4, substituting the 6-month window for the 3-month availability window. Examples:

    Submission date     Beginning of time window
    Aug 20, 2019        Feb 20, 2019
    Jul 20, 2019        Jan 20, 2019
    Jun 20, 2019        Dec 20, 2018


    Reason: Benchmarkers should keep up to date with the recent releases; however, they are not required to move to a release that would be fewer than six months old at the time of their submission.

    Please note, an Open Source Application project may support several parallel development branches, so there may be multiple latest stable releases that meet these rules. For example, a project may have releases such as 10.0, 9.5.1, 8.3.12, and 8.2.29 that are all currently supported and stable releases.

  9. The "stable" release used in the actual test run must be no older than 18 months. If there has not been a "stable" release within 18 months, then the open source project may no longer be active and as such may no longer meet these requirements. An exception may be made for “mature” projects (see below).

  10. In rare cases, open source projects may reach “maturity”, where the software requires little or no maintenance and there may no longer be active development. If it can be demonstrated that the software is still in general use and recommended either by commercial organizations or by active open source projects or user forums, and that the source code for the software is fewer than 20,000 lines, then a request can be made to the subcommittee to grant this software “mature” status. In general, it is expected that the final stable release of a "mature" project continues to work "as is" for the majority of users, but that over time some users may need to make portability changes. This status may be reviewed semi-annually. The current list of projects granted "mature" status by the subcommittee includes the FastCGI library and Alternate PHP Cache.
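The date arithmetic in the six-month and 18-month rules can be sketched in Python as follows; the sketch reproduces the example table above. It is a hypothetical helper, not part of the benchmark kit, and the function names are illustrative. Several details are assumptions: a date with no counterpart in the target month (such as August 31) is clamped to the last day of that month, whereas the rules defer such residual cases to SPEC's OSG policy; a release is taken to remain "a latest stable release" of its branch until the next stable release on that branch (`superseded_on`); and the 18-month age limit is measured from the submission date.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift d by whole months, keeping the day number (clamped to month end: assumption)."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def window_start(submission: date) -> date:
    # Six-month rule: go back 6 months from the submission date, keeping the day number.
    return add_months(submission, -6)


def meets_latest_release_rule(superseded_on: Optional[date], submission: date) -> bool:
    # Assumption: the release counts if it was still the latest on its branch
    # at some point in the window (never superseded, or superseded inside it).
    return superseded_on is None or superseded_on >= window_start(submission)


def meets_age_rule(release_date: date, submission: date) -> bool:
    # 18-month rule (non-"mature" projects): the release is no older than 18 months.
    return release_date >= add_months(submission, -18)


if __name__ == "__main__":
    for sub in (date(2019, 8, 20), date(2019, 7, 20), date(2019, 6, 20)):
        print(sub, "->", window_start(sub))
```

Run as a script, this prints `2019-08-20 -> 2019-02-20`, `2019-07-20 -> 2019-01-20`, and `2019-06-20 -> 2018-12-20`, matching the examples in the table.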


Note: The Webserver workload requires the use of Smarty 2.6.26 which is included in the release kit and is not subject to the above rules.
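The quantitative parts of rules 3 through 6 can be collected into a simple checklist, as in the Python sketch below. This is a hypothetical aid for a benchmarker's own review, not a SPEC tool. The record fields are illustrative; the forum-activity date must already exclude postings by benchmarkers, their representatives, and Virtualization Subcommittee members; the submission date is assumed as the reference date for the 12-month and 3-month checks; and the qualitative requirements (stable designation, a "usable" and "useful" forum, an open source license) still need human judgment.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


def add_months(d: date, months: int) -> date:
    """Shift d by whole months, keeping the day number (clamped to month end: assumption)."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


@dataclass
class OpenSourceProject:
    """Hypothetical summary of a project's history, gathered by the benchmarker."""
    initial_stable_release: date
    later_stable_releases: int  # stable releases shipped after the initial one
    developers: int  # identified developers contributing and maintaining
    last_forum_post: Optional[date]  # excluding benchmarker/subcommittee posts


def failed_rules(p: OpenSourceProject, submission: date) -> List[int]:
    """Return the numbers of the quantitative rules (3-6) the project fails."""
    failed = []
    if p.initial_stable_release > add_months(submission, -12):
        failed.append(3)  # initial stable release is less than 12 months old
    if p.later_stable_releases < 2:
        failed.append(4)  # fewer than two additional stable releases
    if p.last_forum_post is None or p.last_forum_post < add_months(submission, -3):
        failed.append(5)  # no qualifying forum post within the last 3 months
    if p.developers < 2:
        failed.append(6)  # fewer than two identified developers
    return failed


if __name__ == "__main__":
    project = OpenSourceProject(date(2008, 1, 10), later_stable_releases=1,
                                developers=3, last_forum_post=date(2010, 5, 1))
    print(failed_rules(project, submission=date(2010, 6, 15)))  # [4]
```

An empty list means only that the numeric thresholds are met; it does not by itself establish admissibility.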

3.2.7 Test Sponsor

The reporting page must list the date the test was performed, month and year, the organization which performed the test and is reporting the results, and the SPEC license number of that organization.

3.2.8 Notes

This section is used to document:

4.0 Submission Requirements for SPECvirt_sc2010

Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the following:

Once you have the submission ready, please email SPECvirt_sc2010 submissions to subvirt_sc2010@spec.org.

In order to publicly disclose SPECvirt_sc2010 results, the submitter must adhere to these reporting rules in addition to having followed the run rules described in this document. The goal of the reporting rules is to ensure the system under test is sufficiently documented such that someone could reproduce the test and its results.

Compliant runs need to be submitted to SPEC for review and must be accepted prior to public disclosure. If public statements using SPECvirt_sc2010 are made, they must follow the SPEC Fair Use Rule (http://www.spec.org/fairuse.html).

Many other SPEC benchmarks allow duplicate submissions for a single system sold under various names. However, each SPECvirt_sc2010 result from a power-enabled run submitted to SPEC or made public must be for an actual run of the benchmark on the SUT named in the result. Electrically equivalent submissions for power-enabled runs are not allowed unless the systems are also mechanically equivalent (e.g., rebadged).

4.1 SUT Configuration Collection

The submitter is required to run a script that will collect available configuration details of the SUT and all the virtual machines used for the benchmark, including:

The primary reason for this step is to ensure that no subtle configuration differences are overlooked by the vendor.

4.2 Guest Configuration Collection

The submitter is required to run a script that provides the details of each VM, its operating system, and application tunings that are not captured by the SUT configuration collection script, including:

During a review of the result, the submitter may be required to provide, upon request, additional details of the VM, operating system, and application tunings, as well as log files, that may not be captured by the above script. These may include, but are not limited to:

The primary reason for this step is to ensure that the vendor has disclosed all non-default tunings. 

4.3 Client Configuration Collection

The submitter is required to run a script that collects the details of each type of physical and virtual client used (and of each uniquely configured one), such that the testbed's client configuration could be reproduced. Clients that are clones of a specific, documented type may be identified as such; collecting their data is encouraged but not required. The client collection script should collect files and command output that document the client configuration and tuning details, including:

During a review of the result, the submitter may be required to provide, upon request, additional details of the client configuration that may not be captured by the above script, to help document details relevant to questions that may arise during the review.

4.4 Configuration Collection Archive Format

The submitter must submit the Configuration Collection Archive containing the data (files and command output) described in sections 4.1, 4.2, and 4.3 above, using the high-level directory structure described below as the foundation:

 

5.0 The SPECvirt_sc2010 Benchmark Kit

SPEC provides client driver software, which includes tools for running the benchmark and reporting its results.  The client drivers are written in Java; precompiled class files are included with the kit, so no build step is necessary. Recompilation of the client driver software is not allowed, unless prior approval from SPEC is given.

This software implements various checks for conformance with these run and reporting rules; therefore, the SPEC software must be used as provided. Source code modifications are not allowed unless prior approval from SPEC is given. Any such modification must be reviewed and deemed "performance-neutral" by the OSSC.

The kit also includes source code for the file set generators, script code for the web server, and other necessary components.

Appendix A. Run Rules References

SPECvirt_sc2010 uses modified versions of SPECweb2005, SPECjAppServer2004, and SPECmail2008 for its virtualized workloads. For reference, the run rules for those benchmarks are listed below:

NOTE: Not all of these run rules are applicable to SPECvirt, but when a compliance issue is raised, SPEC reserves the right to refer back to these individual benchmarks' run rules as needed for clarification.


Copyright © 2011 Standard Performance Evaluation Corporation.  All rights reserved.

Java® is a registered trademark of Oracle Corporation.