A Decoupled KILO-Instruction Processor
Conference Paper
Overview
Identity
Additional Document Info
Other
View All
Overview
abstract
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of programs executed with large instruction windows gives rise to a natural and simple alternative to scaling. We characterize this phenomenon of execution locality and propose a microarchitecture to exploit it to achieve the benefit of a large instruction window processor with low implementation cost. Execution locality is the tendency of instructions to exhibit high or low latency based on their dependence on memory operations. In this paper we propose a decoupled microarchitecture that executes low latency instructions on a Cache Processor and high latency instructions on a Memory Processor. We demonstrate that such a design, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity. 2006 IEEE.
name of conference
The Twelfth International Symposium on High-Performance Computer Architecture, 2006.