PPoPP 2008 START Conference Manager    

Compiler-assisted Application-level Checkpointing for MPI Programs (poster presentation)

Xuejun Yang, Panfeng Wang, Hongyi Fu, Yunfei Du, Zhiyuan Wang, and Jia Jia

The 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008)
Salt Lake City, Utah, February 20-23, 2008


Application-level checkpointing can reduce the overhead of fault tolerance by minimizing the amount of checkpoint data. However, this technique requires the programmer to choose the critical data that should be saved, which places a burden on the programmer. In this paper, we first propose a live-variable analysis method for MPI programs based on the concepts of intra-process and inter-process definition-use chains. We then present a method for optimizing data saving in application-level checkpointing, based on this live-variable analysis. Building on this theoretical foundation, we implement a source-to-source precompiler (CAC) that automates application-level checkpointing. Finally, we evaluate the performance of five FORTRAN/MPI programs, transformed by CAC to include checkpointing features, on a 512-CPU cluster system. The experimental results show that application-level checkpointing based on live-variable analysis for MPI programs can effectively reduce the amount of checkpoint data and thereby decrease the overhead of checkpoint and restart. The experiments also show that CAC can automate application-level checkpointing correctly and effectively.
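
To make the idea of selecting checkpoint data by live-variable analysis concrete, the following is a minimal sketch in Python, not the CAC implementation. It runs a textbook backward liveness analysis over a toy single-process control-flow graph and treats the variables live immediately after a checkpoint location as the data to save. The names `Stmt`, `live_variables`, `checkpoint_set`, and `PROGRAM` are hypothetical. As a simplifying assumption, an MPI send is modeled as a use of its buffer and a receive as a definition of its buffer; the inter-process definition-use chains described in the abstract are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """One statement of a toy program (hypothetical representation)."""
    label: str
    defs: frozenset = frozenset()   # variables written by the statement
    uses: frozenset = frozenset()   # variables read by the statement
    succs: list = field(default_factory=list)  # indices of successor statements


def live_variables(stmts):
    """Textbook backward liveness analysis, iterated to a fixed point.

    Returns (live_in, live_out), two lists of sets indexed like stmts.
    """
    n = len(stmts)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for i in reversed(range(n)):
            # live_out is the union of live_in over all successors.
            out = set()
            for s in stmts[i].succs:
                out |= live_in[s]
            # live_in = uses, plus whatever is live afterwards and not redefined.
            new_in = set(stmts[i].uses) | (out - set(stmts[i].defs))
            if out != live_out[i] or new_in != live_in[i]:
                live_out[i], live_in[i] = out, new_in
                changed = True
    return live_in, live_out


def checkpoint_set(stmts, checkpoint_index):
    """Variables that are live right after the checkpoint statement."""
    _, live_out = live_variables(stmts)
    return live_out[checkpoint_index]


# Toy straight-line program; statement 3 is the checkpoint location.
# Assumption: send(tmp) only reads tmp, recv(b) only writes b.
PROGRAM = [
    Stmt("a = init()",    frozenset({"a"}),   frozenset(),           [1]),
    Stmt("b = init()",    frozenset({"b"}),   frozenset(),           [2]),
    Stmt("tmp = a + b",   frozenset({"tmp"}), frozenset({"a", "b"}), [3]),
    Stmt("checkpoint",    frozenset(),        frozenset(),           [4]),
    Stmt("tmp = 2 * a",   frozenset({"tmp"}), frozenset({"a"}),      [5]),
    Stmt("send(tmp)",     frozenset(),        frozenset({"tmp"}),    [6]),
    Stmt("recv(b)",       frozenset({"b"}),   frozenset(),           [7]),
    Stmt("print(a, b)",   frozenset(),        frozenset({"a", "b"}), []),
]

if __name__ == "__main__":
    print(sorted(checkpoint_set(PROGRAM, 3)))
```

In this toy program three variables exist at the checkpoint, but only `a` is live after it: `tmp` and `b` are both overwritten before their next use, so a checkpoint that follows the analysis would not need to save them. A real MPI program would additionally need the inter-process analysis, since a value sent by one process may be used by another.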
