Abstract
It is widely accepted that the optimal alignment between a pair of proteins or nucleic acid sequences that minimizes the edit distance may not necessarily reflect the correct biological alignment. Alignments of proteins based on their structures or of DNA sequences based on evolutionary changes are often different from alignments that minimize edit distance. However, in many cases (e.g. when the sequences are close), the edit distance alignment is a good approximation to the biological one. Since, for most sequences, the true alignment is unknown, a method that either assesses the significance of the optimal alignment, or that provides few “close” alternatives to the optimal one, is of great importance.
A suboptimal alignment is an alignment whose score lies within the neighborhood of the optimal score. Enumeration of suboptimal alignments [Wa83, WaBy] is not very practical since there are many such alignments. Other approaches [Zuk, Vi, ViArTANO] that use only partial information about suboptimal alignments are more successful in practice.
We present a method for representing all alignments whose score is within any given delta from the optimal score. It represents a large number of alignments by a compact graph which makes it easy to impose additional biological constraints and select one desirable alignment from this large set. We study the combinatorial nature of suboptimal alignments. We define a set of “canonical” suboptimal alignments, and argue that these are the essential ones since any other suboptimal alignment is a combination of few canonical ones. We then show how to efficiently enumerate suboptimal alignments in order of their score, and count their numbers. Examples are presented to motivate the problem.
Since alignments are essentially (s, t)-paths in a directed acyclic graph with (possibly negative) weights on its edges, our solution gives an extremely simple method to enumerate all K shortest (or longest) paths from s to t in such graphs in increasing order, as well as all (s, t) paths that are within δ of the optimum, for any δ. We compare this solution with known algorithms that find the K-best shortest paths in a graph.
Supported by a Postdoctoral Fellowship from the Program in Mathematics and Molecular Biology of the University of California at Berkeley, under National Science Foundation Grant DMS-9720208
Preview
Unable to display preview. Download preview PDF.
References
S.F Altschul and B. W. Erickson, Optimal Sequence Alignment Using Affine Gap Costs, Bull. Math. Biol. 48: 603–616, 1986.
R. Bellman and R. Kalaba, On Kth Best Policies, J. SIAM, 8(4):582–588, 1960.
S. Clarke, A. Krikorian and J. Rausen, Computing the N best Loopless Paths in a Network, J. Siam, 11 (4): 1096–1102, 1963.
R.F. Doolittle, Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences, University Science Books, 1986.
R.F. Doolittle, edt. Methods in Enzymology, Volume 183: Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, 1990.
S.E. Dreyfus, An Appraisal of Some Shortest-Path Algorithms, Operations Research, 17 (1969), 395–412.
J. Edmonds and R.M. Karp, Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems, JACM, 19(2):248–264, 1972.
W.M. Fitch and T.F. Smith, Optimal Sequence Alignments, Proc. Natl. Acad. Sci. USA 80(1983) 1382–1386.
B. Fox, Calculating the Kth Shortest Paths, INFOR, Canad. J. Oper. Infor. Process., 11:66–70, 1973.
O. Gotoh, An Improved Method for Matching Biological Sequences, J. Mol. Biol., 162:705–708, 1982.
D. Gusfield, K. Balasubramanian and D. Naor, Parametric Optimization of Sequence Alignment, Proceedings of the third annual ACM-SIAM Joint Symposium on Discrete Algorithms, Orlando, Florida, Jan. 1992.
W. Hoffman and R. Pavley, A Method for the Solution of the Nth Best Path Problem, J. ACM 6, 506–514 (1959).
N. Katoh, T. Ibaraki and H. Mine, An Efficient Algorithm for K Shortest Simple Paths, Networks, 12:411–427, 1982.
E. L. Lawler, A Procedure for Computing the K-best Solutions to Discrete Optimization Problems and its Applications to the Shortest Paths Problem, Manag. Sci. 18:401–405, 1972.
E. L. Lawler, Combinatorial Optimization, Networks and Matroids, Holt, Rinehart and Winston, 1976.
A. Perko, Implementation of Algorithms for K Shortest Loopless Paths, Networks 16:149–160, 1986.
M. Pollack, The kth Best Route Through A Network, OR, 9 (4):578–580, 1961.
D. Sankoff and J. Kruskal, Editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983.
M. Schoniger and M.S. Waterman, A Local Algorithm for DNA Sequence Alignment with Inversions, Bull. Math. Bio. 54(4):521–536, 1992.
P. H. Sellers, An Algorithm for the Distance Between Two Finite Sequences, J. Comb. Theory. (A), 16:253–258, 1974.
D.R. Shier, On Algorithms for Finding the K Shortest Paths in a Network, Networks, 9:195–214, 1979.
M. Vingron, Multiple Sequence Alignment and Applications in Molecular Biology, Preprint 91–12, Universitat Heidelberg, 1991.
M. Vingron and P. Argos, Determination of Reliable Regions in Protein Sequence Alignments, Protein Engin., 7: 565–569, 1990.
M.S. Waterman, Sequence Alignments in the Neighborhood of the Optimum with General Application to Dynamic Programming, Proc. Nat, Acad, Sci. USA. 80:3123–3124, 1983.
M.S. Waterman, Parametric and Ensemble Sequence Alignment Algorithms, manuscript.
M.S. Waterman and T.H. Byers, A Dynamic Programming Algorithm to Find All Solutions in a Neighborhood of the Optimum, Math, Biosci. 77:179–188, 1985.
M.S. Waterman and M. Eggert, A New Algorithm for Best Subsequence Alignments With Applications to tRNA-rRNA Comparisons, J. Mol. Biol. 197:723–728, 1987.
M.S. Waterman, M. Eggert and E. Lander, Parametric Sequence Comparisons, PNAS 89:6090–6093, July 1992.
J.Y. Yen, Finding the K Shortest Loopless Paths in a Network, Management Science, 17 (11):712–716, 1971.
M. Zuker, Suboptimal Sequence Alignment in Molecular Biology, Alignment with Error Analysis, J. Mol. Biol., 221:403–420, 1991.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Naor, D., Brutlag, D. (1993). On suboptimal alignments of biological sequences. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1993. Lecture Notes in Computer Science, vol 684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029805
Download citation
DOI: https://doi.org/10.1007/BFb0029805
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56764-6
Online ISBN: 978-3-540-47732-7
eBook Packages: Springer Book Archive