Massively Parallel Genomic Sequence Search on Blue Gene/P
Abstract:
This paper presents our first experiences in
mapping and optimizing genomic sequence search onto the
massively parallel IBM Blue Gene/P (BG/P) platform.
Specifically, we work with mpiBLAST, a parallel
sequence-search code that has been optimized for numerous
supercomputing environments. In doing so, we identify several
critical performance issues. Consequently, we propose and
study different approaches for mapping sequence-search and
parallel I/O tasks on such massively parallel architectures.
We demonstrate that our optimizations can deliver nearly
linear scaling (93% efficiency) on up to 32,768 cores of BG/P.
In addition, we show that such scalability enables us to
complete a large-scale bioinformatics problem (sequence
searching a microbial genome database against itself to
support the discovery of missing genes in genomes) in only
a few hours on BG/P. Previously, this problem was viewed as
computationally intractable in practice.
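
For readers unfamiliar with the "93% efficiency" figure, the sketch below shows one common way such a number is computed: measured speedup over a smaller baseline run, divided by the ideal speedup implied by the increase in core count. The abstract does not state which baseline or timings the authors used, so the function name, the 1,024-core baseline, and both run times are hypothetical values chosen only to illustrate the arithmetic.

```python
def parallel_efficiency(base_cores, base_seconds, cores, seconds):
    """Relative parallel efficiency of a run against a smaller baseline run.

    Efficiency = (measured speedup) / (ideal speedup), where the ideal
    speedup is simply the ratio of core counts.
    """
    if min(base_cores, cores) <= 0 or min(base_seconds, seconds) <= 0:
        raise ValueError("core counts and run times must be positive")
    speedup = base_seconds / seconds      # how much faster the larger run was
    ideal_speedup = cores / base_cores    # how much faster it could have been
    return speedup / ideal_speedup


# Hypothetical timings (NOT taken from the paper): a 1,024-core baseline
# and a 32,768-core run, picked so the result lands near 93%.
eff = parallel_efficiency(base_cores=1024, base_seconds=3200.0,
                          cores=32768, seconds=107.5)
print(f"relative efficiency: {eff:.2%}")
```

An efficiency of 1.0 would mean perfectly linear scaling; values below 1.0 reflect overheads such as I/O and load imbalance, which are the kinds of issues the abstract says the optimizations address.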