
Optimizing the use of static buffers for DMA on a CELL chip

Tong Chen, Zehra Sura, Kathryn O'Brien and Kevin O'Brien

The 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2006)
New Orleans, Louisiana, November 2-4, 2006


Abstract

The CELL architecture has one PowerPC-based PPE (Power Processor Element) core and eight SPE (Synergistic Processor Element) cores, which have a distinct instruction set architecture of their own. The PPE core accesses memory through a traditional caching mechanism, but each SPE core can access memory only through a small, software-controlled 256 KB local store. The PPE cache and the SPE local stores are connected to each other and to main memory by a high-bandwidth bus. The CELL chip provides high compute power, but automatically generating code that exploits this performance potential is a complex task. Software is responsible for all data transfers to and from the SPE local stores. To hide the high latency of DMA transfers, data may be prefetched into the SPE local stores using loop tiling transformations and static buffers. We find that the performance of an application can vary with the size of the buffers used and with whether a single-, double-, or triple-buffering scheme is used. Because the space available for data buffers in the SPE local store is limited, we want to choose the optimal buffering scheme for a given space budget. We also want to determine the optimal buffer size for a given scheme, that is, the size beyond which a larger buffer yields negligible performance improvement. We develop a model to automatically infer these parameters for static buffering, taking into account the DMA latency and transfer rates and the amount of computation in the application loop being targeted. We test the accuracy of our prediction model within a research prototype compiler built on top of the IBM XL compiler infrastructure. This compiler automatically generates code to exploit the heterogeneous on-chip parallelism of the CELL architecture for programs written using OpenMP parallel directives.
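
The trade-off between buffering schemes under a fixed space budget can be illustrated with a toy cost model. The sketch below is not the model developed in the paper. It assumes that a DMA transfer costs a fixed latency plus bytes divided by bandwidth, that computation time is proportional to tile size, and that the data size is a multiple of the tile size. All function names, parameters, and numbers are hypothetical.

```python
def dma_time(tile_bytes, latency, bandwidth):
    """Assumed DMA cost: fixed startup latency plus bytes / bandwidth."""
    return latency + tile_bytes / bandwidth


def single_buffer_time(total_bytes, budget, latency, bandwidth, compute_per_byte):
    """One buffer uses the whole budget; transfer and compute never overlap."""
    tile = budget
    n_tiles = total_bytes // tile  # assumes total_bytes is a multiple of tile
    per_tile = dma_time(tile, latency, bandwidth) + tile * compute_per_byte
    return n_tiles * per_tile


def double_buffer_time(total_bytes, budget, latency, bandwidth, compute_per_byte):
    """Two buffers split the budget; the next tile is fetched during compute."""
    tile = budget // 2
    n_tiles = total_bytes // tile  # assumes total_bytes is a multiple of tile
    t_dma = dma_time(tile, latency, bandwidth)
    t_cmp = tile * compute_per_byte
    # The first fetch is exposed, then each step costs the slower of the two
    # overlapped activities, and the last tile's compute is exposed.
    return t_dma + (n_tiles - 1) * max(t_dma, t_cmp) + t_cmp


def best_scheme(total_bytes, budget, latency, bandwidth, compute_per_byte):
    """Return the name of the cheaper scheme under this toy model."""
    args = (total_bytes, budget, latency, bandwidth, compute_per_byte)
    single = single_buffer_time(*args)
    double = double_buffer_time(*args)
    return "single" if single <= double else "double"


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units.
    print(best_scheme(1024, 256, latency=10, bandwidth=1, compute_per_byte=1.0))
    print(best_scheme(1024, 256, latency=1000, bandwidth=1, compute_per_byte=0.01))
```

Under these assumptions, double buffering wins when computation is long enough to hide the transfer. When the fixed latency dominates, halving the tile size doubles the number of transfers, and the single large buffer can come out ahead. This is the kind of dependence on latency, transfer rate, and loop computation that the abstract describes.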


  