Girdhar, Nimish (2016-08). Predicting Thread Criticality and Frequency Scalability for Active Load Balancing in Shared Memory Multithreaded Applications. Master's Thesis. Thesis uri icon


  • With advancements in process technologies, manufacturers are able to pack many processor cores on a single chip. Consequently, there is a paradigm shift from single processor to single chip multiprocessors (CMP). As mobile industry is moving towards CMPs, the challenge of working in a low power/energy budget while still delivering good performance, has taken the precedence. Therefore, energy and power optimization has become a primary design constraint along with performance in CMP design space. CMPs get performance benefit from running multithreaded programs which utilizes Thread Level Parallelism (TLP). But these parallel programs have inherent load imbalance between threads due to synchronization which degrades performance as well as energy consumption. The solution is to develop energy efficient algorithms which can detect and reduce this imbalance to maximize performance while still giving energy savings. Dynamic thread criticality prediction is an approach to detect the load imbalance by identifying the critical threads which cause performance/energy degradation due to synchronization. Cores running critical threads can then be run on high power states using Dynamic Voltage Frequency Scaling, thus balancing the execution times of all thread. But in order to create an accurate balance and utilize DVFS efficiently, one also needs to know the impact of voltage/frequency scaling on thread's performance gain and power usage. Thread frequency scalability prediction can give a good estimate of the performance improvements that DVFS can provide to each thread. We present an active load balancing algorithm which uses thread criticality and frequency scalability prediction to get the maximum possible performance benefit in an energy efficient manner. Results show that balancing the load in an accurate way can give energy savings as high as 10% with minimal performance loss as compared to running all cores at high frequency.

publication date

  • August 2016