Computer Science Colloquia
Tuesday, October 8, 2013
Advisor & Chair: Gabriel Robins
Attending Faculty: Kevin Skadron, Jim Cohoon, Ira Hall, Aaron Quinlan
2:00 PM, Rice Hall, Rm. 242
Ph.D. Proposal Presentation
Efficient Genomic Interval Intersection Algorithms
The comparison of genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g., genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic features are typically measured by identifying intervals that intersect: that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring relationships between and within genomic features is crucial for future discovery. This thesis proposes three novel algorithms that can power the next generation of genomic analysis. The first algorithm is an optimal solution to the problem of counting the number of intersections between two interval sets. This algorithm is intrinsically suited to parallel computing architectures such as Graphics Processing Units, and makes statistical comparison between large numbers of features possible. The second algorithm solves the problem of enumerating the intersections among a set of interval sets. This algorithm can dramatically reduce the number of comparisons between sets, and exposes significantly more parallelism than other solutions. Finally, I introduce a framework to detect changes in an organism’s genome that is powered by interval intersection. The algorithm is based on an abstract representation of these changes that allows the integration of any number of detection signals, including those generated from high throughput sequencing data or prior evidence. Considering many different signals simultaneously dramatically increases detection sensitivity. These contributions represent a significant advancement to computational biology.