Studying microarchitectural structures with object code reordering

Modern microprocessors have many microarchitectural features. Quantifying the performance impact of one feature such as dynamic branch prediction can be difficult. On one hand, a timing simulator can predict the difference in performance given two different implementations of the technique, but simulators can be quite inaccurate. On the other hand, real systems are very accurate representations of themselves, but often cannot be modified to study the impact of a new technique. We demonstrate how to develop a performance model for branch prediction using real systems based on object code reordering. By observing the behavior of the benchmarks over a range of branch prediction accuracies, we can estimate the impact of a new branch predictor by simulating only the predictor and not the rest of the microarchitecture. We also use the reordered object code to validate a reverse-engineered model for the Intel Core 2 branch predictor. We simulate several branch predictors using Pin and measure which hypothetical branch predictor has the highest correlation with the real one. This study in object code reorder points to way to future work on estimating the impact of other structures such as the instruction cache, the second-level cache, instruction decoders, indirect branch prediction, etc. 2010 ACM.

Studying microarchitectural structures with object code reordering Conference Paper