Decoupled Load Balancing Conference Paper

abstract

  • Modern scientific simulations divide work between parallel processors by decomposing a spatial domain of mesh cells, particles, or other elements. A balanced assignment of the computational load is critical for parallel performance. If the computation per element changes over the simulation time, simulations can use dynamic load balance algorithms to evenly redistribute work to processes. Graph partitioners are widely used and balance very effectively, but they do not strong scale well. Typical SPMD simulations wait while a load balance algorithm runs on all processors, so a poorly scaling algorithm can itself become a bottleneck. We observe that the load balance algorithm is separate from the main application computation and has its own scaling properties. We propose to decouple the load balance algorithm from the application, and to offload the load balance computation so that it runs concurrently with the application on a smaller number of processors. We demonstrate the costs of decoupling and offloading the load balancing algorithm from a Barnes-Hut application.
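The decoupling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single background thread stands in for the smaller set of processors that run the balancer, and it swaps in a simple greedy heaviest-first heuristic where the paper discusses graph partitioners. All names here (`greedy_partition`, `max_load`, `simulate`) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def greedy_partition(loads, num_procs):
    """Place elements, heaviest first, on the currently least-loaded process.

    This is a stand-in heuristic, not a graph partitioner.
    """
    heap = [(0.0, p) for p in range(num_procs)]  # (total load, process id)
    heapq.heapify(heap)
    assignment = [0] * len(loads)
    for elem in sorted(range(len(loads)), key=lambda i: -loads[i]):
        total, proc = heapq.heappop(heap)
        assignment[elem] = proc
        heapq.heappush(heap, (total + loads[elem], proc))
    return assignment


def max_load(loads, assignment, num_procs):
    """Return the load of the most heavily loaded process."""
    totals = [0.0] * num_procs
    for elem, proc in enumerate(assignment):
        totals[proc] += loads[elem]
    return max(totals)


def simulate(loads, num_procs, num_steps):
    """Run the application loop while the balancer works in the background."""
    # Naive initial layout: element i goes to process i mod num_procs.
    assignment = [i % num_procs for i in range(len(loads))]
    history = []
    with ThreadPoolExecutor(max_workers=1) as balancer:
        # Offload: the balancer receives a snapshot of the loads, so the
        # application loop below does not wait for it.
        pending = balancer.submit(greedy_partition, list(loads), num_procs)
        for _ in range(num_steps):
            # One application step, costed by its most loaded process.
            history.append(max_load(loads, assignment, num_procs))
            # Adopt the new partition only once it is ready.
            if pending is not None and pending.done():
                assignment = pending.result()
                pending = None
        if pending is not None:
            # The balancer had not finished during the loop; wait for it now.
            assignment = pending.result()
    return history, assignment


if __name__ == "__main__":
    loads = [8, 1, 8, 1, 8, 1, 8, 1]
    history, final = simulate(loads, num_procs=2, num_steps=5)
    print("per-step max load:", history)
    print("final max load:", max_load(loads, final, 2))
```

In this sketch the application never blocks on the balancer inside the loop. The tradeoff is that the new partition is computed from a snapshot and applied some steps later, so the step at which it takes effect depends on timing.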

name of conference

  • Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

published proceedings

  • ACM SIGPLAN Notices

author list (cited authors)

  • Pearce, O., Gamblin, T., de Supinski, B. R., Schulz, M., & Amato, N. M.

citation count

  • 1

complete list of authors

  • Pearce, Olga; Gamblin, Todd; de Supinski, Bronis R.; Schulz, Martin; Amato, Nancy M.

publication date

  • January 2015