A Decoupled KILO-Instruction Processor (Conference Paper)


  • Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of programs executed with large instruction windows gives rise to a natural and simple alternative to scaling. We characterize this phenomenon, execution locality, and propose a microarchitecture that exploits it to achieve the benefit of a large-instruction-window processor at low implementation cost. Execution locality is the tendency of instructions to exhibit high or low latency based on their dependence on memory operations. In this paper we propose a decoupled microarchitecture that executes low-latency instructions on a Cache Processor and high-latency instructions on a Memory Processor. We demonstrate that such a design, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity. © 2006 IEEE.
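
The idea of execution locality described in the abstract can be illustrated with a toy steering model. The sketch below is not the paper's mechanism: the `Instr` record, the `steer` function, the register names, and the rule that a cache-missing load is itself counted as high latency are all assumptions made for illustration. It tags each instruction in a short trace as bound for the Cache Processor ("CP") or the Memory Processor ("MP"), depending on whether it transitively depends on a load that misses in the cache.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Instr:
    """Hypothetical trace record: one destination register, some sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]
    is_load: bool = False
    misses_cache: bool = False  # assumed known per load in this toy trace


def steer(trace: List[Instr]) -> List[str]:
    """Tag each instruction "CP" (low latency) or "MP" (high latency).

    Assumption: an instruction is high latency if it is a load that misses
    in the cache, or if any source register was last written by a
    high-latency instruction.
    """
    long_latency_regs: Set[str] = set()
    tags: List[str] = []
    for ins in trace:
        depends_on_miss = any(s in long_latency_regs for s in ins.srcs)
        is_long = depends_on_miss or (ins.is_load and ins.misses_cache)
        tags.append("MP" if is_long else "CP")
        if ins.dest is not None:
            if is_long:
                long_latency_regs.add(ins.dest)
            else:
                # A low-latency write makes the register "fast" again.
                long_latency_regs.discard(ins.dest)
    return tags


trace = [
    Instr("r1", ("r0",), is_load=True, misses_cache=True),  # miss
    Instr("r2", ("r1", "r3")),                               # uses r1
    Instr("r4", ("r3", "r5")),                               # independent
    Instr("r6", ("r4",), is_load=True),                      # cache hit
    Instr("r7", ("r2", "r6")),                               # uses r2
    Instr("r1", ("r3", "r3")),                               # overwrites r1
    Instr("r8", ("r1", "r4")),                               # uses new r1
]

print(steer(trace))
```

In this toy trace only the missing load and the two instructions that depend on it are tagged for the Memory Processor, while the independent instructions stay on the Cache Processor.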

author list (cited authors)

  • Pericàs, M., Cristal, A., González, R., Jiménez, D. A., & Valero, M.

citation count

  • 19

publication date

  • January 2006