Computer Science Colloquia
Aakrosh Ratan, Assistant Professor, Center for Public Health Genomics
Monday, November 3, 2014
3:30 PM, Rice Hall, Rm. 130 (Light refreshments after the seminar in the Rice Hall 4th floor atrium)
HOST: Kevin Skadron
Split-alignment of Short Sequences and Subsequent Identification of Variants
Structural changes in chromosomes represent a major source of variation that has been implicated in both phenotypic diversity and disease. Identifying the breakpoints of such structural events remains a major challenge, especially given the short sequences produced by current DNA sequencing instruments. I will describe algorithms to align a query sequence of length n to a target genome, allowing parts of the query to match different parts of the target. I will use the standard affine-scoring scheme and describe an inexact variant of the algorithm, which can use m sub-optimal alignments generated by another aligner to find a maximal-scoring split-alignment in O(mn) time. I will show that these alignments can be used to identify breakpoints with high accuracy. I will also discuss how these breakpoints can then be used in a framework to define the structural variants in these genomic datasets.
Bio: Aakrosh Ratan is currently an Assistant Professor in the Center for Public Health Genomics, with a primary appointment in the Department of Public Health at UVA. He is also a member of the Cancer Center at UVA. His research focuses on the study of genome variation and genetic diversity, and their consequences for species health and survival. He is interested in developing algorithms and methods to identify and map variation (single-nucleotide polymorphisms, indels, and large-scale rearrangements) from large-scale sequencing datasets, and in applying them to diverse datasets ranging from cancer genomes and clinical cohorts to data from endangered species and ancient DNA. His interests span the identification of such variants in species with a haploid representation (reference genome), species with multiple representations, and species for which we lack a reference genome. More information, including a list of publications, can be viewed HERE