Semantics-based Distributed I/O for mpiBLAST (poster presentation)

Pavan Balaji, Wu-chun Feng, Jeremy Archuleta, Heshan Lin, Rajkumar Kettimuthu and Xiaosong Ma

The 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008)
Salt Lake City, Utah, February 20-23, 2008


BLAST is a widely used software toolkit for sequence searching in genomes. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation so that different worker processors can search distinct segments of the database in parallel. After synchronization, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance on clusters with fast local filesystems, its I/O processing remains a scalability concern, especially on systems that use distributed filesystems.

We therefore present "ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing", an environment that decouples computation and I/O in applications such as mpiBLAST and drastically reduces I/O overhead through metadata processing. Specifically, for mpiBLAST, ParaMEDIC partitions worker processes into compute workers and I/O workers. Instead of writing their output directly to the distributed filesystem, compute workers convert it to metadata and send that metadata to the I/O workers. The I/O workers then process the metadata to re-create the actual output and write it to the filesystem. This reduces the time spent on I/O and accelerates mpiBLAST by as much as 25-fold in some cases.
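
The compute/I/O split described above can be illustrated with a minimal single-process sketch in Python. It is not the ParaMEDIC implementation: the toy database, the HitMetadata record, and the function names are all hypothetical, the search is a plain substring match rather than BLAST, and ordinary function calls stand in for the message passing between workers. It only shows the idea that a compute worker emits compact metadata and an I/O worker, which holds its own copy of the database, regenerates the full output from it.

```python
from dataclasses import dataclass

# Hypothetical toy database: sequence id -> sequence text.
DATABASE = {
    "seq1": "ACGTACGTGGA",
    "seq2": "TTGACGTAACC",
    "seq3": "GGGCCCAAATT",
}


@dataclass(frozen=True)
class HitMetadata:
    """Compact record of one match (an assumed format, for illustration only)."""
    seq_id: str
    offset: int
    length: int


def compute_worker(query, segment):
    """Search one database segment and return metadata instead of formatted output."""
    hits = []
    for seq_id in segment:
        offset = DATABASE[seq_id].find(query)  # first occurrence only
        if offset != -1:
            hits.append(HitMetadata(seq_id, offset, len(query)))
    return hits


def io_worker(metadata):
    """Re-create the full output from metadata using a local copy of the database."""
    lines = []
    for hit in metadata:
        sequence = DATABASE[hit.seq_id]
        matched = sequence[hit.offset:hit.offset + hit.length]
        lines.append(f"{hit.seq_id}\t{hit.offset}\t{matched}\t{sequence}")
    return "\n".join(lines)


def run(query):
    """Each segment goes to one (simulated) compute worker; one I/O worker writes."""
    segments = [["seq1"], ["seq2", "seq3"]]
    metadata = [hit for segment in segments for hit in compute_worker(query, segment)]
    return io_worker(metadata)
```

In this sketch only the small HitMetadata records cross the boundary between the two kinds of workers; the longer output lines are rebuilt on the I/O side, next to the filesystem.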
