Panda, Reena (2011-12). A Branch-Directed Data Cache Prefetching Technique for Inorder Processors. Master's Thesis.

abstract

  • The increasing gap between processor and main memory speeds has become a serious bottleneck to further improvement in system performance. Data prefetching techniques have been proposed to hide the performance impact of such long memory latencies. However, most currently proposed data prefetchers predict future memory accesses based on current memory misses, which limits the information that can be exploited to guide prefetching. In this thesis, we propose a branch-directed data prefetcher that uses the high prediction accuracy of current-generation branch predictors to predict the future basic block trace that the program will execute, and issues prefetches for all the identified memory instructions contained therein. We also propose a novel technique to generate prefetch addresses by exploiting the correlation between the addresses generated by memory instructions and the values of the corresponding source registers at prior branch instances. We evaluate the impact of our prefetcher using a cycle-accurate simulation of an inorder processor on the M5 simulator. The results show that the branch-directed prefetcher improves performance on a set of 18 SPEC CPU2006 benchmarks by an average of 38.789% over a no-prefetching implementation and 2.148% over a system that employs a Spatial Memory Streaming prefetcher.
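
The address-generation idea in the abstract can be illustrated with a small sketch. This is not the thesis's design: it assumes the simplest possible correlation, a fixed offset between a memory instruction's address and the value its source register held at the preceding branch. The class `OffsetPrefetcher`, its methods, the table layout, and the 64-byte line size are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LINE_SIZE = 64  # assumed cache-line size in bytes (not stated in the abstract)


@dataclass
class OffsetPrefetcher:
    """Hypothetical table: (branch PC, memory-instruction PC) -> learned offset."""
    offsets: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def train(self, branch_pc: int, mem_pc: int,
              reg_at_branch: int, address: int) -> None:
        # Remember how far the accessed address was from the value the
        # source register held when the branch executed.
        self.offsets[(branch_pc, mem_pc)] = address - reg_at_branch

    def predict(self, branch_pc: int, mem_pcs: List[int],
                regs_at_branch: Dict[int, int]) -> List[int]:
        # For each memory instruction in the predicted basic block, add the
        # learned offset to the current register value and align to a line.
        prefetches = []
        for mem_pc in mem_pcs:
            key = (branch_pc, mem_pc)
            if key in self.offsets and mem_pc in regs_at_branch:
                addr = regs_at_branch[mem_pc] + self.offsets[key]
                prefetches.append(addr - addr % LINE_SIZE)
        return prefetches


pf = OffsetPrefetcher()
# First instance of the branch: the load at 0x410 accessed 0x1048 while its
# source register held 0x1000 at the branch, so the learned offset is 0x48.
pf.train(branch_pc=0x400, mem_pc=0x410, reg_at_branch=0x1000, address=0x1048)
# Later instance: the register now holds 0x2000 at the same branch.
result = pf.predict(0x400, [0x410], {0x410: 0x2000})
print([hex(a) for a in result])  # ['0x2040']
```

In this sketch a prefetch is issued only for (branch, memory instruction) pairs that have been seen before; how the thesis selects the lookahead depth along the predicted basic block trace, or handles changing offsets, is not described in the abstract and is left out here.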

publication date

  • December 2011