Computer Science Colloquia
Thursday, March 29, 2012
Advisor: Gabe Robins
Attending Faculty: Kevin Skadron, Chair; Paul Reynolds, and Marty Humphrey
8:15 AM, Rice Hall, Rm. 242
Ph.D. Quals Exam Presentation
Binary Interval Search (BITS): A Massively Parallel Interval Intersection Algorithm
The integration and comparison of diverse genomic datasets is fundamental to understanding the biology of the genome and the basis of human disease. Researchers must explore many large datasets of genome intervals (e.g., genes, sequence alignments) in order to place their experiments in a broader context and make new discoveries. Relationships between experimental datasets and the wealth of existing genome annotations must be measured, where a relationships is represented by a common genome interval. New insights from these analyses intrinsically depend upon efficient methods for identifying intersecting (i.e., they share at least 1 base pair in common) genome intervals. Continued advances in massively parallel DNA sequencing technologies have enabled diverse explorations of genome biology. Consequently, existing methods for detecting interval intersections are challenged by the magnitude and complexity of current datasets. Here we introduce the Binary Interval Search (BITS) algorithm, a highly-scalable approach to the interval set intersection problem. Our analyses illustrate that BITS is more efficient than existing approaches on a single CPU. Moreover, we demonstrate substantial speedups (over 75x) when the BITS algorithm is applied to massively parallel architectures such as NVIDIA's CUDA Graphics Processing Units (GPUs). The efficiency of this algorithm is particularly promising in the context of large-scale comparisons to public genomic annotation databases, and for Monte-Carlo measurements of significant associations between experimental datasets and genomic features. Moreover, we note that our approach is especially suited to the emerging ``hybrid'' computing cluster nodes equipped with GPU cards to boost computing throughput.