AN ITERATION PARTITION APPROACH FOR CACHE OR LOCAL MEMORY THRASHING ON PARALLEL-PROCESSING
Overview
Abstract
Parallel processing systems with cache or local memory in their memory hierarchies have become very common. These systems have local cache memory in each processor and usually employ a write-invalidate protocol for cache coherence. In such systems, a problem called cache or local memory thrashing may arise during the execution of parallel programs, in which data moves back and forth unnecessarily between the caches or local memories of different processors. Techniques for parallelizing compilers to solve this problem have not been completely developed. In this paper we present an approach to eliminate, or at least reduce, unnecessary data movement between caches or local memories for nested parallel loops. The approach is based on the relations between array element accesses and the enclosing loop indices in nested parallel loops. These relations can be used to assign processors to execute the appropriate iterations of parallel loops in a loop nest with respect to the data already in their caches or local memories. The paper presents an algorithm that calculates the correct iteration of a parallel loop in terms of the loop indices of previous iterations executed on the same processor, even when more than one subscript expression of the same array variable appears in the loop. The method benefits parallel code with nested loop structures in a wide range of applications in which array elements are repeatedly referenced in parallel loops. Experimental results show that the technique is extremely effective, achieving speedups of up to 2 on application programs such as the Linpack benchmarks. © 1993 IEEE
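The abstract does not give the paper's partitioning algorithm itself, so the sketch below only illustrates the general idea it describes: choosing which processor executes each iteration of a parallel loop so that iterations touching the same array elements always land on the same processor, keeping those elements in that processor's cache. The processor count PROCS, problem size N, the subscript expression a[i], and the block-partition function owner() are all assumptions for this example, not the paper's method; the program runs sequentially and merely annotates which processor would own each iteration.

/* Illustrative sketch of cache-affinity iteration partitioning.
 * PROCS, N, a[i], and owner() are assumed for this example only. */
#include <stdio.h>

#define PROCS 4
#define N     16

/* Map iteration i of a parallel loop to a processor so that all
 * iterations referencing a[i] are assigned to the same processor
 * (a simple block partition). */
static int owner(int i) { return (i * PROCS) / N; }

int main(void)
{
    double a[N];

    /* First parallel loop: iteration i, and hence a[i], is assigned
     * to processor owner(i). */
    for (int i = 0; i < N; i++)
        a[i] = i;                  /* executed by processor owner(i) */

    /* Second parallel loop: the same subscript expression a[i] appears,
     * so giving iteration i to owner(i) again reuses the copy of a[i]
     * already cached by that processor instead of invalidating and
     * moving it between caches. */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * a[i];         /* executed by processor owner(i) */

    for (int i = 0; i < N; i++)
        printf("a[%d] = %g (processor %d)\n", i, a[i], owner(i));
    return 0;
}

If the second loop were instead partitioned without regard to the first (for example, by a different static or dynamic schedule), each write to a[i] could invalidate the line cached by another processor, which is the thrashing behavior the paper's iteration partition approach is designed to avoid.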