PPoPP 2008 START Conference Manager    

Compiler Optimization for Parallelizing General-Purpose Applications under Thread-Level Speculation (poster presentation)

Shengyue Wang, Antonia Zhai, Pen Chung Yew and Guojin He

The 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008)
Salt Lake City, Utah, February 20-23, 2008


As technology advances, microprocessors that integrate multiple cores on a single chip are becoming increasingly common. How to use these processors to improve the performance of a single program has been a challenge posed to all computer architects and compiler designers. For general purpose applications, it is especially difficult to create efficient parallel execution due to the complex control flow and ambiguous data dependences of these applications. Two hardware mechanisms, Thread-Level Speculation (TLS) and Transactional Memory (TM), enable speculative parallel execution regardless of potential inter-thread/inter-transaction dependences by providing hardware support for the detection of data dependence violations and for the recovery from failed speculation to guarantee program correctness. However, speculatively parallelized programs may or may not be beneficial due to the high cost associated with failed speculation. Therefore, it is extremely important for compilers transform the program to best exploit parallelism. Unfortunately, traditional parallel compilers are incapable of such tasks. Furthermore, the ubiquitous existence of complex and input-dependent control flow as well as data dependence patterns in general purpose applications makes it impossible for one technique to optimize all programs. In this paper, we build several compiler optimization techniques targetting parallelize general-purpose applications on top of a compiler infrastructure that conduct cost-benefit analysis of speculative parallel execution.

The proposed techniques utilize probabilistic and non-probabilistic information to instruction scheduling, reduction transformation and iteration merging. Instruction scheduling improves the efficiency of synchronizing frequently occurring memory dependences; reduction transformation reduces the impact of a class of reduction variables whose intermediate results are used by non-reduction operations; iteration merging improves the load balancing by dynamically combining loop iterations to achieve more balanced workloads. With the combined effords of these optimization, we are able to achieve a program speedup of 1.36 on 15 SPEC2000 benchmarks.

START Conference Manager (V2.54.5)