Cost Performance Optimization of Microprocessors

Steve Fu

Computer Systems Laboratory
Department of Electrical Engineering
Stanford University

Wednesday, January 15th, 1997

ABSTRACT

While 25 years represent only a single generation of human history, the same period has seen roughly 10 generations of microprocessors. We have gone from the Intel 4004 (1971, 2,300 transistors, 12 mm² die size, 750 kHz clock speed) to the Digital 21164 (1996, 9.3 million transistors, 209 mm² die size, 500 MHz clock speed). If the same trend continues, a microprocessor in the year 2006 could have a transistor count exceeding one billion, a die size exceeding 500 mm², and a clock speed in the GHz range.
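As a rough illustration of this kind of trend projection, the sketch below fits a constant compound annual growth rate to the two data points quoted above and extends it to 2006. The constant-rate model and the function names are assumptions made for illustration, not the author's method; other readings of "the same trend" (for example, a fixed doubling period) give different numbers, so the output need not match the figures quoted above.

```python
# Illustrative extrapolation only; constant compound annual growth is an assumption.

def annual_growth(start_value, end_value, years):
    """Compound annual growth factor implied by two data points."""
    return (end_value / start_value) ** (1.0 / years)


def project(value, growth, years_ahead):
    """Extend a value forward by years_ahead at a fixed annual growth factor."""
    return value * growth ** years_ahead


# Data points quoted in the abstract: (Intel 4004 in 1971, Digital 21164 in 1996).
DATA = {
    "transistors": (2300, 9.3e6),
    "die_area_mm2": (12, 209),
    "clock_hz": (750e3, 500e6),
}


def project_2006():
    """Return {metric: (annual growth factor, projected 2006 value)}."""
    span = 1996 - 1971   # years between the two data points
    ahead = 2006 - 1996  # years to extrapolate
    result = {}
    for name, (v1971, v1996) in DATA.items():
        growth = annual_growth(v1971, v1996, span)
        result[name] = (growth, project(v1996, growth, ahead))
    return result


if __name__ == "__main__":
    for name, (growth, value) in project_2006().items():
        print(f"{name}: {growth:.3f}x per year, projected 2006 value {value:.3g}")
```

Running the script prints the implied yearly growth factor and the projected 2006 value for each metric, which makes it easy to see how sensitive the projection is to the chosen growth model.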

Of course, these technology advances do not come without cost. The capital cost of fabrication capacity for 5,000 wafer starts per week has gone from less than $10 million to over $1 billion. At the same time, microprocessor development teams have grown from about 10 engineers to over 500.

This research addresses the question of how to use the available integration in a cost-effective way by focusing on three areas of increasing importance: floating-point unit design, on-chip storage hierarchy design, and on-chip interconnect design. In this talk, we begin with a presentation of the Floating Point Unit Cost Performance Analysis Metric (FUPA), which incorporates four key aspects of VLSI system design: latency, die area, technology, and application profile. FUPA uses technology projections based on scalable device models to assess the compatibility of a design with a given technology, and it allows designers to make high-level tradeoffs when optimizing their designs.

We continue by describing CacheOpt, a tool that addresses an area of primary importance to processor performance: memory hierarchy design. CacheOpt automates the synthesis of an optimal cache hierarchy under area, I/O, and latency constraints for a given technology and application batch. We show results from applying CacheOpt to both SPEC92 and multimedia benchmarks.

As we approach deep submicron designs, interconnects play an increasingly important role in both performance and cost. Without modifications at the architecture, circuit, and process technology levels, interconnects will consume a growing proportion of both the cycle time and the die area. We present a systematic approach to this challenging problem.

Contact information:
fu@umnhum.stanford.edu
http://arith.stanford.edu/~fu