Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors (Conference Paper)

abstract

  • Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code speculatively in parallel and use extensions to the cache coherence protocol hardware to detect any dependence violations. As soon as a violation is detected, execution stops, the state is restored, and the code is re-executed serially. The scheme, which we apply to loops, allows iterations to execute and complete in potentially any order; it requires low-overhead hardware extensions to the cache coherence protocol and memory hierarchy of a DSM. In this paper, we present the algorithms and a hardware design of the scheme. Overall, the scheme delivers average loop speedups of 7.3 for 16 processors and is 50% faster than a related software-only method.
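  • For illustration only, the following is a minimal software sketch of the idea summarized in the abstract, not the authors' hardware design: loop iterations run speculatively against a shadow copy of the shared data, per-element read/write marks stand in for the access-tracking state the paper keeps in the cache coherence hardware, and any cross-iteration conflict aborts the speculative attempt and falls back to serial re-execution. All names, the loop body, and the deliberately conservative conflict test are assumptions made for this sketch.

      /*
       * Illustrative sketch (plain C): a software analogue of the speculative
       * loop execution described in the abstract.  The hardware proposal tracks
       * accesses in the cache coherence protocol; here, per-element rd[]/wr[]
       * marks are used instead, and the conflict test is conservative.
       */
      #include <stdio.h>
      #include <string.h>

      #define N 16

      static int data[N];          /* shared array, accessed through index arrays */
      static int shadow[N];        /* speculative copy; committed only on success */
      static unsigned char rd[N];  /* element was read by some iteration          */
      static unsigned char wr[N];  /* element was written by some iteration       */

      /* Loop body: a[w[i]] = a[r[i]] + 1.  The index arrays r[] and w[] are the
       * part that would be unknown at compile time.                              */
      static void body(int i, const int *r, const int *w, int *a) {
          a[w[i]] = a[r[i]] + 1;
      }

      /* Speculative attempt.  The test depends only on which elements are read
       * and written, not on execution order, so iterations could run and
       * complete in any order; any element that is written and also touched by
       * another access is conservatively treated as a dependence violation.     */
      static int speculative_run(const int *r, const int *w) {
          memcpy(shadow, data, sizeof data);
          memset(rd, 0, sizeof rd);
          memset(wr, 0, sizeof wr);
          for (int i = 0; i < N; i++) {
              if (wr[r[i]] || wr[w[i]] || rd[w[i]])
                  return 0;                      /* violation detected: abort     */
              rd[r[i]] = 1;
              wr[w[i]] = 1;
              body(i, r, w, shadow);             /* speculative writes go to shadow */
          }
          memcpy(data, shadow, sizeof data);     /* no violation: commit          */
          return 1;
      }

      int main(void) {
          int r[N], w[N];
          for (int i = 0; i < N; i++) { r[i] = i; w[i] = (i + 1) % N; }

          if (!speculative_run(r, w)) {
              /* State restore is implicit here: data[] was never modified,
               * because speculative writes went to shadow[].  Re-run serially. */
              for (int i = 0; i < N; i++)
                  body(i, r, w, data);
              puts("dependence violation detected -> loop re-executed serially");
          } else {
              puts("speculative parallel execution committed");
          }
          return 0;
      }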

name of conference

  • Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture

published proceedings

  • 1998 Fourth International Symposium on High-Performance Computer Architecture, Proceedings

author list (cited authors)

  • Zhang, Y., Rauchwerger, L., & Torrellas, J.

citation count

  • 28

complete list of authors

  • Zhang, Y.; Rauchwerger, L.; Torrellas, J.

publication date

  • January 1998