Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previous work, we have speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor depen- dences. If the test failed, then the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross- processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. We demonstrate a generalization of the speculative doall parallelization tech- nique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the over- head of the run-time dependence test itself. In this thesis, we have presented the base algorithm and an analysis of the different heuristics for its practical applica- tion. To reduce the run-time overhead of the Recursive LRPD test, we have im- plemented on-demand checkpointing and commit, more efficient data dependence analysis and shadow structures, and feedback-guided load balancing. We obtained scalable speedups for loops from Track, Spice, and FMA3D that were not paralleliz- able by previous speculative parallelization methods.
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previous work, we have speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor depen- dences. If the test failed, then the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross- processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. We demonstrate a generalization of the speculative doall parallelization tech- nique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the over- head of the run-time dependence test itself. In this thesis, we have presented the base algorithm and an analysis of the different heuristics for its practical applica- tion. To reduce the run-time overhead of the Recursive LRPD test, we have im- plemented on-demand checkpointing and commit, more efficient data dependence analysis and shadow structures, and feedback-guided load balancing. We obtained scalable speedups for loops from Track, Spice, and FMA3D that were not paralleliz- able by previous speculative parallelization methods.