Computer Science Colloquia
Thursday, October 27, 2011
Advisor: Kevin Skadron
Attending Faculty: Marty Humphrey, Gabriel Robins (Chair), and Westley Weimer
Thornton C-311, 1:00 PM
Ph.D. Qualifying Exam Presentation
A DNA Alignment Tool for Structural Variation Detection
With the arrival of "next generation" DNA sequencing technology in 2005, the cost of DNA sequencing has been declining significantly faster than the pace set by Moore's Law. There has been a corresponding increase in the rate at which DNA sequencing data accumulates for many species, especially humans. This proliferation of human data provides new opportunities for discovery in many areas. One such area is Structural Variation (SV): the insertion, deletion, duplication, and/or inversion of stretches of DNA exceeding 50 bases in length. The study of SV is particularly important for cancer research, as cancerous cells exhibit high levels of SV compared to healthy cells. However, most existing DNA analysis tools are not tuned for investigating SV. In addition, within the next year or so, new sequencers capable of producing contiguous reads of 1000 bases or more in volume are expected to become commercially available. Such reads can in principle be used to determine the locations of SV more easily and accurately. We have built a DNA alignment tool that combines existing and new algorithmic approaches and specifically targets the discovery and analysis of SV in mammalian genomes, using contiguous reads ranging in length from hundreds to tens of thousands of bases. In particular, this tool will include an approach called "Optimal Query Coverage" for finding a collection of Primary Alignments that cover the read length. The remaining alignments are then filtered to match the Primary Alignments using a strategy we call "Filter By Similarity". To the best of our knowledge, neither approach is used in any current DNA aligner. Both features are important for accurate SV breakpoint reconstruction. We will validate that this aligner can meet or exceed the performance of existing DNA aligners in terms of runtime and/or accuracy.
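To make the idea of covering a read with Primary Alignments concrete, here is a minimal Python sketch. The abstract does not define how "Optimal Query Coverage" is computed, so this is an assumption rather than the presenter's method: it treats each candidate alignment as an interval on the read and picks the non-overlapping subset that covers the most read bases, using standard weighted interval scheduling. The names `Alignment` and `select_primary_alignments`, and all field names, are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one candidate local alignment of a read."""
    q_start: int    # start on the read (inclusive)
    q_end: int      # end on the read (exclusive)
    ref_name: str   # reference sequence the segment aligns to
    ref_start: int  # start position on the reference


def select_primary_alignments(alignments: List[Alignment]) -> List[Alignment]:
    """Pick non-overlapping alignments that cover the most read bases.

    Assumption: "optimal" is taken to mean maximum covered read length.
    This is weighted interval scheduling, not the tool's actual algorithm.
    """
    ordered = sorted(alignments, key=lambda a: a.q_end)
    ends = [a.q_end for a in ordered]
    n = len(ordered)
    best = [0] * (n + 1)   # best[i]: max coverage using ordered[:i]
    take = [False] * n     # take[i]: ordered[i] is used in best[i + 1]
    prev = [0] * n         # prev[i]: count of earlier compatible alignments

    for i, a in enumerate(ordered):
        # Alignments in ordered[:i] that end at or before a.q_start
        prev[i] = bisect_right(ends, a.q_start, 0, i)
        with_a = best[prev[i]] + (a.q_end - a.q_start)
        if with_a > best[i]:
            best[i + 1] = with_a
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover the chosen alignments
    chosen = []
    i = n
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return chosen


# Example: a 1000-base read whose two halves align to different places,
# as might happen when the read spans an SV breakpoint.
candidates = [
    Alignment(0, 400, "chr1", 100),
    Alignment(300, 1000, "chr1", 5000),
    Alignment(400, 950, "chr2", 70),
]
primary = select_primary_alignments(candidates)
covered = sum(a.q_end - a.q_start for a in primary)
```

In this sketch the boundary between two adjacent chosen alignments on the read is a candidate breakpoint position. A real aligner would likely also weigh alignment scores and tolerate small overlaps, which this simplified version ignores.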