Huang, Tao (2007-05). Generic implementations of parallel prefix sums and its applications. Master's Thesis. Thesis uri icon

abstract

  • Parallel prefix sums algorithms are one of the simplest and most useful building blocks for constructing parallel algorithms. A generic implementation is valuable because of the wide range of applications for this method. This thesis presents a generic C++ implementation of parallel prefix sums. The implementation applies two separate parallel prefix sums algorithms: a recursive doubling (RD) algorithm and a binary-tree based (BT) algorithm. This implementation shows how common communication patterns can be separated from the concrete parallel prefix sums algorithms and thus simplify the work of parallel programming. For each algorithm, the implementation uses two different synchronization options: barrier synchronization and point-to-point synchronization. These synchronization options lead to different communication patterns in the algorithms, which are represented by dependency graphs between tasks. The performance results show that point-to-point synchronization performs better than barrier synchronization as the number of processors increases. As part of the applications for parallel prefix sums, parallel radix sort and four parallel tree applications are built on top of the implementation. These applications are also fundamental parallel algorithms and they represent typical usage of parallel prefix sums in numeric computation and graph applications. The building of such applications become straighforward given this generic implementation of parallel prefix sums.
  • Parallel prefix sums algorithms are one of the simplest and most useful building
    blocks for constructing parallel algorithms. A generic implementation is valuable
    because of the wide range of applications for this method.
    This thesis presents a generic C++ implementation of parallel prefix sums. The
    implementation applies two separate parallel prefix sums algorithms: a recursive
    doubling (RD) algorithm and a binary-tree based (BT) algorithm.
    This implementation shows how common communication patterns can be separated
    from the concrete parallel prefix sums algorithms and thus simplify the work
    of parallel programming. For each algorithm, the implementation uses two different
    synchronization options: barrier synchronization and point-to-point synchronization.
    These synchronization options lead to different communication patterns in the algorithms,
    which are represented by dependency graphs between tasks.
    The performance results show that point-to-point synchronization performs better
    than barrier synchronization as the number of processors increases.
    As part of the applications for parallel prefix sums, parallel radix sort and four
    parallel tree applications are built on top of the implementation. These applications
    are also fundamental parallel algorithms and they represent typical usage of parallel
    prefix sums in numeric computation and graph applications. The building of such
    applications become straighforward given this generic implementation of parallel
    prefix sums.

publication date

  • May 2007