Andrew Zimmerman
7 October 1998

Cache Organizations for On-Chip Multiprocessors

As feature sizes become smaller and chip area increases, integrating multiple processors onto a single chip is becoming increasingly feasible. The fixed area budget of a single-chip multiprocessor imposes a number of design choices and tradeoffs on the implementation. This talk presents results of studies performed to investigate some of these tradeoffs.

The foremost tradeoff is balancing the computational performance of multiple processors against the performance of the memory hierarchy. Results are presented showing that, for low-latency memory hierarchies, dedicating chip area to additional computational elements rather than to larger caches increases system performance. Based on these results, cache organizations for a dual-processor, cluster-based multiprocessor are presented. The results show that both shared caches and shared split caches can further increase system performance over the base case of separate, nonshared caches. Shared split caches realize their gain by reducing contention for the shared cache and by eliminating contention between the private data reference streams of the individual processors. Shared caches also increase system performance, but must incorporate additional features to do so: because a small number of sets in the shared cache suffer heavy contention, address tinting or set associativity must be used to achieve the higher performance.
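As an illustrative aside (not taken from the talk, and assuming only the standard definition of a set-associative cache): the sketch below models interleaved reference streams from two processors that map to the same set of a shared cache. In a direct-mapped cache the two tags evict each other on every access, while a 2-way set-associative cache lets them coexist, removing the conflict misses. The class and function names (`SetAssocCache`, `misses`) and all parameters are hypothetical.

```python
class SetAssocCache:
    """Tiny LRU set-associative cache model; parameters are hypothetical."""

    def __init__(self, num_sets, assoc):
        self.num_sets = num_sets
        self.assoc = assoc
        # Each set is an LRU-ordered list of tags (oldest first).
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on hit, False on miss (with LRU replacement)."""
        index = addr % self.num_sets      # set index from low-order bits
        tag = addr // self.num_sets       # remaining bits form the tag
        lru = self.sets[index]
        if tag in lru:
            lru.remove(tag)
            lru.append(tag)               # move to most-recently-used slot
            return True
        if len(lru) >= self.assoc:
            lru.pop(0)                    # evict the least-recently-used tag
        lru.append(tag)
        return False


def misses(cache, stream):
    """Count misses for a sequence of block addresses."""
    return sum(0 if cache.access(a) else 1 for a in stream)


# Two processors alternating references that land in the same set:
# addresses 0 and 64 share index 0 (with 64 sets) but differ in tag.
stream = [0, 64, 0, 64, 0, 64]
direct = SetAssocCache(num_sets=64, assoc=1)
twoway = SetAssocCache(num_sets=64, assoc=2)
print(misses(direct, list(stream)), misses(twoway, list(stream)))  # prints: 6 2
```

The same pressure can instead be relieved in a direct-mapped shared cache by perturbing the index function per processor (the idea behind schemes such as the address tinting mentioned above), so the two streams no longer compete for the same sets.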