We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA), and present a new algorithm which applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees T and S, our algorithm has an O(n_T n_S + min(d_T, d_S) L_T L_S) time complexity, where n_T, L_T and d_T are the number of nodes, the number of leaves, and the maximum node degree in T, respectively (satisfying d_T ≤ L_T ≤ n_T), and similarly for n_S, L_S and d_S with respect to the tree S. This improves the time complexity of previous algorithms for less general ...
Sparsification is a technique for speeding up dynamic programming algorithms that has been successfully applied to RNA structure prediction, RNA-RNA interaction prediction, simultaneous alignment and folding, and pseudoknot prediction. So far, sparsification has been more a collection of loosely related examples than a general, well-understood theory. In this work we propose a general theory to describe and implement sparsification in dynamic programming algorithms. The approach is formalized as an extension of Algebraic Dynamic Programming (ADP), which makes it applicable to a variety of algorithms and scoring schemes. In particular, this is the first approach that shows how to sparsify algorithms with scoring schemes that go beyond simple minimization or maximization, such as enumeration of suboptimal solutions and approximation of the partition function. As an example, we show how to sparsify different variants of RNA structure prediction algorithms.
We study Valiant’s classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant’s approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant’s technique and can be applied to all problems which abide by these templates. These algorithms obtain new worst-case running time bounds for a large family of important problems within the world of RNA Secondary Structures and Context Free Grammars.
We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem, and present a new algorithm which applies to several modes, including global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that our algorithm has an O(n_T n_S min(d_T, d_S)) time complexity, where n_T and n_S are the numbers of nodes and d_T and d_S are the maximum node degrees in the input trees T and S, respectively. This maintains (and slightly improves) the time complexity of previous, less general algorithms for the problem. Supplemental materials, source code, and a web interface for our tool are available at http://www.cs.bgu.ac.il/~negevcb/FRUUT.
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory, for example in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
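The abstract above does not spell out how a HAF score is computed. As a rough illustration only, the sketch below assumes a simple frequency-sum definition: the score of a haplotype is the sum, over the sites where it carries the derived allele, of the number of haplotypes in the sample that carry the derived allele at that site. The function name `haf_scores`, the 0/1 matrix encoding, and the toy sample are all assumptions made for this example, not taken from the paper.

```python
from typing import List


def haf_scores(haplotypes: List[List[int]]) -> List[int]:
    """Frequency-sum score per haplotype (assumed definition, see lead-in).

    `haplotypes` is a 0/1 matrix: rows are haplotypes, columns are sites,
    and 1 marks the derived allele.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Number of haplotypes carrying the derived allele at each site.
    derived_counts = [sum(row[j] for row in haplotypes) for j in range(n_sites)]
    # Each haplotype sums the counts of the derived alleles it carries.
    return [
        sum(derived_counts[j] for j in range(n_sites) if row[j] == 1)
        for row in haplotypes
    ]


# Hypothetical toy sample: three haplotypes share a common derived allele
# at site 0, the fourth carries only a private mutation.
sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
scores = haf_scores(sample)
print(scores)
```

Under this assumed definition, haplotypes that share many common derived alleles score higher than the haplotype carrying only a private mutation, which is the kind of contrast a carrier-prediction method could exploit.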
Proceedings of the National Academy of Sciences, 2013
Breakage-fusion-bridge (BFB) is a mechanism of genomic instability characterized by the joining and subsequent tearing apart of sister chromatids. When this process is repeated during multiple rounds of cell division, it leads to patterns of copy number increases of chromosomal segments as well as fold-back inversions where duplicated segments are arranged head-to-head. These structural variations can then drive tumorigenesis. BFB can be observed in progress using cytogenetic techniques, but generally BFB must be inferred from data such as microarrays or sequencing collected after BFB has ceased. Making correct inferences from this data is not straightforward, particularly given the complexity of some cancer genomes and BFB's ability to generate a wide range of rearrangement patterns. Here we present algorithms to aid the interpretation of evidence for BFB. We first pose the BFB count-vector problem: given a chromosome segmentation and segment copy numbers, decide whether BFB can yield a chromosome with the given segment counts. We present a linear-time algorithm for the problem, in contrast to a previous exponential-time algorithm. We then combine this algorithm with fold-back inversions to develop tests for BFB. We show that, contingent on assumptions about cancer genome evolution, count vectors and fold-back inversions are sufficient evidence for detecting BFB. We apply the presented techniques to paired-end sequencing data from pancreatic tumors and confirm a previous finding of BFB as well as identify a chromosomal region likely rearranged by BFB cycles, demonstrating the practicality of our approach.
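To make the terms "fold-back inversion" and "count vector" concrete, here is a minimal sketch of a single BFB cycle under a simplified string model. It is not the linear-time decision algorithm described above. The model is an assumption of this example: a chromosome arm is a string of segment letters with the centromere on the left, uppercase means forward orientation, lowercase means inverted, and the break point `keep` is chosen by the caller. The names `invert`, `bfb_cycle`, and `count_vector` are hypothetical.

```python
from collections import Counter
from typing import List


def invert(segments: str) -> str:
    """Reverse segment order and flip each orientation (upper <-> lower)."""
    return segments[::-1].swapcase()


def bfb_cycle(chromosome: str, keep: int) -> str:
    """One simplified BFB cycle.

    The arm is duplicated and the copies fuse head-to-head at the broken
    end, giving chromosome + invert(chromosome). The bridge then breaks,
    and the daughter cell keeps the first `keep` segments.
    """
    fused = chromosome + invert(chromosome)
    if not 1 <= keep <= len(fused):
        raise ValueError("keep must be between 1 and the fused length")
    return fused[:keep]


def count_vector(chromosome: str, segments: str) -> List[int]:
    """Copy number of each segment, ignoring orientation."""
    counts = Counter(chromosome.upper())
    return [counts[s] for s in segments]


after_one = bfb_cycle("ABCD", 6)      # fused: ABCDdcba, keep ABCDdc
after_two = bfb_cycle(after_one, 9)   # fused: ABCDdcCDdcba, keep ABCDdcCDd
print(after_one, count_vector(after_one, "ABCD"))
print(after_two, count_vector(after_two, "ABCD"))
```

In this toy run, each adjacent pair such as `Dd` or `cC` is a fold-back inversion, and the copy numbers rise toward the end of the arm that repeatedly breaks and fuses. The count-vector problem asks the reverse question: given only such a vector, decide whether some sequence of cycles could have produced it.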