
Cache-Aware Iteration Space Partitioning for Efficient Load Balancing on Multi-Core Systems (poster presentation)

Arun Kejariwal, Alex Nicolau, Utpal Banerjee, Alex Veidenbaum and Constantine Polychronopoulos

The 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008)
Salt Lake City, Utah, February 20-23, 2008


The need for high performance per watt has led to the development of multi-core systems such as the Intel Core 2 Duo processor and the Intel quad-core Kentsfield processor. Maximal exploitation of the hardware parallelism supported by such systems necessitates the development of concurrent software. This, in part, entails automatic parallelization of programs and efficient mapping of the parallelized program onto the different cores. The latter affects the load balance between the cores, which in turn has a direct impact on performance. Because parallel loops, such as a parallel DO loop in Fortran, account for a large percentage of the total execution time, we focus on the problem of how to efficiently partition the iteration space of (possibly nested, perfect or non-perfect) parallel loops. In this paper, we present a novel technique for cache-aware partitioning of the iteration spaces of such loops. Specifically, we propose a technique for iteration space partitioning that captures the effect of variation in the number of cache misses across the iteration space. Subsequently, we propose a general approach that captures the variation of both the number of cache misses and the amount of computation across the iteration space. We motivate the problem and demonstrate the efficacy of our approach on a dedicated 4-way Intel Xeon-based multiprocessor using several kernels from the industry-standard SPEC CPU2000 and CPU2006 benchmarks, obtaining speedups of up to 62.5%.
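
To make the idea of cost-aware partitioning concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not spell out. It assumes a per-iteration cost estimate is already available and splits a one-dimensional iteration space into contiguous chunks of roughly equal total cost. The names iteration_cost and partition_by_cost, and the miss_penalty value of 100 cycles, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def iteration_cost(compute_cycles, cache_misses, miss_penalty=100):
    # Hypothetical cost model: computation plus a fixed penalty per cache miss.
    return compute_cycles + miss_penalty * cache_misses


def partition_by_cost(costs, num_cores):
    """Split iterations 0..len(costs)-1 into num_cores contiguous
    half-open chunks (start, end) of roughly equal total cost."""
    if not costs:
        return [(0, 0)] * num_cores
    prefix = list(accumulate(costs))  # prefix[i] = cost of iterations 0..i
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_cores):
        target = total * k / num_cores
        # First iteration at which the running cost reaches the k-th target;
        # the chunk ends just after that iteration.
        cut = bisect_left(prefix, target) + 1
        cut = max(cut, bounds[-1])  # keep chunk boundaries non-decreasing
        bounds.append(min(cut, len(costs)))
    bounds.append(len(costs))
    return [(bounds[i], bounds[i + 1]) for i in range(num_cores)]


# Eight iterations with equal computation; only the last four miss in cache.
costs = [iteration_cost(10, m) for m in [0, 0, 0, 0, 4, 4, 4, 4]]
chunks = partition_by_cost(costs, 2)
loads = [sum(costs[s:e]) for s, e in chunks]
```

Under these assumed costs, an even split by iteration count would give the two cores loads of 40 and 1640 cycles, whereas the cost-based split assigns six iterations to one core and two to the other, for loads of 860 and 820 cycles.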
