Architectural support for parallel reductions in scalable shared-memory multiprocessors Conference Paper

abstract

  • Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reduction and makes it scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 processors, our scheme delivers average speedups of 7.6.
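
For readers unfamiliar with the software-only baseline the abstract refers to, the following is a minimal sketch of one common approach, array privatization: each worker accumulates into its own private copy of the reduction array, and the copies are merged at the end. It is an illustration only and is not taken from the paper. The function name `privatized_reduction` and its parameters are hypothetical, and Python threads are used to show the structure of the algorithm, not to obtain real speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def privatized_reduction(indices, values, size, num_workers=4):
    """Compute result[indices[i]] += values[i] for all i, using private copies.

    Hypothetical helper for illustration; not the scheme evaluated in the paper.
    """
    n = len(indices)
    # Split the iteration space into contiguous chunks, one per worker.
    chunk = (n + num_workers - 1) // num_workers
    bounds = [(w * chunk, min((w + 1) * chunk, n)) for w in range(num_workers)]

    def accumulate(bound):
        lo, hi = bound
        # Each worker initializes a full private copy of the reduction array.
        private = [0] * size
        for i in range(lo, hi):
            private[indices[i]] += values[i]
        return private

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(accumulate, bounds))

    # Merge phase: combine every private copy into the shared result.
    result = [0] * size
    for private in partials:
        for k in range(size):
            result[k] += private[k]
    return result


if __name__ == "__main__":
    print(privatized_reduction([0, 2, 2, 1, 0], [1, 2, 3, 4, 5], size=3))
```

In this sketch the initialization and merge phases touch `size` elements per worker regardless of how few elements each worker actually updates, so their combined cost grows with the number of workers. That overhead is one plausible reason such schemes scale poorly on sparse access patterns.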

name of conference

  • Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques

published proceedings

  • 2001 International Conference on Parallel Architectures and Compilation Techniques, Proceedings

author list (cited authors)

  • Garzaran, M. J., Prvulovic, M., Zhang, Y., Jula, A., Yu, H., Rauchwerger, L., & Torrellas, J.

citation count

  • 11

complete list of authors

  • Garzaran, MJ; Prvulovic, M; Zhang, Y; Jula, A; Yu, H; Rauchwerger, L; Torrellas, J

publication date

  • January 2001