An Example of Benchmark Obsolescence: 023.eqntott
One of the Reasons Why We Need to Update Our Benchmarks
by Reinhold Weicker
Published December 1995

Observers of the SPEC CPU benchmarks have probably noticed that, in the transition from SPEC92 to SPEC95, some old benchmarks were carried over to the new suite while others were not. (Even those that were carried over have been given new, larger input data sets and, consequently, new identifying numbers.) What reasoning was used in this decision process?

Benchmark selection within SPEC is a long process, and there is undoubtedly and unavoidably some subjectivity in judging the merits of the benchmark candidates. This is particularly true of benchmark programs that appeared in previous versions of the suite. However, there is often also agreement that a particular benchmark has become obsolete or extremely susceptible to new technologies and techniques; such a benchmark is not included in the new suite.

Benchmark 023.eqntott, which was carried over from the '89 suite into the '92 suite but is no longer in SPEC95, is such a case. Its problems stem from two facts: the benchmark spends between 79 percent and 85 percent of its execution time in a single, very small subroutine (older measurements on Unisys and MIPS systems; percentages may vary on other systems), and this subroutine has some peculiarities.
Here is the code of this subroutine, function cmppt() in module pterm_ops.c:

```c
/* from compilation unit pterm_ops.c: */
#include "x.h"
extern int ninputs, noutputs;

int cmppt (a, b)
PTERM *a[], *b[];
{
    register int i, aa, bb;
    for (i = 0; i < ninputs; i++) {
        aa = a[0]->ptand[i];
        bb = b[0]->ptand[i];
        if (aa == 2)
            aa = 0;
        if (bb == 2)
            bb = 0;
        if (aa != bb) {
            if (aa < bb) {
                return (-1);
            }
            else {
                return (1);
            }
        }
    }
    return (0);
}
```

The relevant parts of the header file x.h:

```c
typedef short BIT;
typedef struct Pterm {
    BIT *ptand;          /* AND-plane connections */
    BIT *ptor;           /* OR-plane connections */
    struct Pterm *next;  /* link to next product term */
    long andhash;        /* hash of input connection values */
    short index;         /* number of 1's in ptand */
    short cv;            /* "covered" flag */
} PTERM;
```

An initial analysis of this routine reveals:
When the benchmark was adopted for the SPEC92 suite, its high code locality was known, but it appeared that no optimization could be found that would trivialize it; when the benchmark was first introduced (1989), optimizing this code seemed to be a difficult task. However, once a program becomes a benchmark, compiler authors can become very creative. The following list (not exhaustive) includes some of the optimizations that have been employed:
The run rules state that "Use of software features (in preprocessors, compilers, etc.) which invoke, generate or use software designed specifically for any of the SPEC benchmark releases is not allowed." Looked at in isolation, however, an optimization from the list above cannot simply be classified as benchmark-specific or not. Some observers might try to draw conclusions from the SPECratios of the individual benchmarks ("Benchmark 023.eqntott has, for this system, an unusually high SPECratio compared with the other benchmarks"), or they might even look at the generated assembly code ("This code sequence looks like hand-optimized code"). While such observations can hint at problematic practices, they do not prove that anything incorrect has been done; they can also result from legal and generally useful optimizations, which SPEC encourages. Rather, the following criteria can be used:
These issues have been discussed within SPEC. Opinions have differed on whether SPEC as a group should make a judgment on the legality of any particular transformation or optimization. There is, however, unanimous agreement that it is undesirable for benchmarks, through high code locality or other features, to create incentives for code transformations or optimizations that accelerate a benchmark much more than other programs.

It is true that real-world programs do contain "hot spots": small program parts where the program spends a large part of its time. The lesson SPEC has learned is that programs dominated by such a hot spot, however useful they may be otherwise, are simply not suitable as benchmark programs. The other lesson is that it makes sense to change benchmarks from time to time, which decreases the incentive for special-case optimizations that do not benefit other programs.

The programs of the new SPEC95 benchmark suite have been selected with this experience in mind. They are not perfect, but they are certainly better than the old ones. In addition, the run rules have been extended to state more clearly which code transformations are allowed and which are not. SPEC has also explicitly reserved the option to drop a benchmark from the suite if it has been compromised. Had the new and better SPEC95 benchmarks not been on the horizon, SPEC might have done this for benchmark 023.eqntott. Now that the new benchmarks are available, SPEC encourages everyone to move to the new CPU benchmarks as soon as possible.

Reinhold Weicker is the SPEC Representative for Siemens Nixdorf and the Vice Chairman of the SPEC Open Systems Steering Committee.