Insertion and promotion for tree-based PseudoLRU last-level caches
Last-level caches mitigate the high latency of main memory. A good cache replacement policy enables high performance for memory-intensive programs. To be useful to industry, a cache replacement policy must deliver high performance without high complexity or cost. For instance, the costly least-recently-used (LRU) replacement policy is not used in highly associative caches; rather, inexpensive policies with similar performance, such as PseudoLRU, are used. We propose a novel last-level cache replacement algorithm with approximately the same complexity and storage requirements as tree-based PseudoLRU, but with performance matching state-of-the-art techniques such as dynamic re-reference interval prediction (DRRIP) and protecting distance policy (PDP). The algorithm is based on PseudoLRU, but uses set-dueling to dynamically adapt its insertion and promotion policy. It has slightly less than one bit of overhead per cache block, compared with two or more bits per cache block for competing policies. In this paper, we give the motivation behind the algorithm in the context of LRU with improved placement and promotion, then develop this motivation into a PseudoLRU-based algorithm, and finally give a version using set-dueling to allow adaptivity to changing program behavior. We show that, with a 16-way set-associative 4MB last-level cache, our adaptive PseudoLRU insertion and promotion algorithm yields a geometric mean speedup of 5.6% over true LRU across all the SPEC CPU 2006 benchmarks, using far less overhead than LRU or other algorithms. On a memory-intensive subset of SPEC, the technique gives a geometric mean speedup of 15.6%. We show that the performance is comparable to state-of-the-art replacement policies that consume more than twice the area of our technique. © 2013 ACM.
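For readers unfamiliar with the baseline, the following is a minimal sketch of plain tree-based PseudoLRU for a single 16-way set. It is not the proposed algorithm: the abstract does not specify the adapted insertion and promotion positions or the set-dueling logic, so this shows only the standard scheme in which every access promotes the block to the most protected position. The class name `TreePLRU` and the method names `victim` and `touch` are hypothetical, chosen for illustration.

```python
class TreePLRU:
    """Baseline tree-based PseudoLRU state for one cache set (illustrative sketch)."""

    def __init__(self, ways=16):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        # One bit per internal node of a complete binary tree: ways - 1 bits.
        # bits[i] == 0 means the next victim lies in the left subtree of node i,
        # bits[i] == 1 means it lies in the right subtree.
        self.bits = [0] * (ways - 1)

    def victim(self):
        """Follow the bits from the root down to the pseudo-LRU way."""
        node = 0
        while node < self.ways - 1:          # still at an internal node
            node = 2 * node + 1 + self.bits[node]
        return node - (self.ways - 1)        # convert leaf index to way number

    def touch(self, way):
        """Standard promotion: point every node on the path away from `way`."""
        node = way + self.ways - 1           # leaf index in heap layout
        while node > 0:
            parent = (node - 1) // 2
            came_from_right = node == 2 * parent + 2
            # Steer future victim searches toward the other subtree.
            self.bits[parent] = 0 if came_from_right else 1
            node = parent


plru = TreePLRU(16)
plru.touch(0)
print(len(plru.bits), plru.victim())
```

A 16-way set needs 15 tree bits, or 15/16 of a bit per block, which is consistent with the "slightly less than one bit of overhead per cache block" stated in the abstract. An insertion and promotion policy in the sense of the abstract would presumably set the path bits differently in `touch` depending on the block's current position, but the exact rule is not given here.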