[go: up one dir, main page]

Skip to main content

On suboptimal alignments of biological sequences

  • Conference paper
  • First Online:
Combinatorial Pattern Matching (CPM 1993)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 684))

Included in the following conference series:

Abstract

It is widely accepted that the optimal alignment between a pair of proteins or nucleic acid sequences that minimizes the edit distance may not necessarily reflect the correct biological alignment. Alignments of proteins based on their structures or of DNA sequences based on evolutionary changes are often different from alignments that minimize edit distance. However, in many cases (e.g. when the sequences are close), the edit distance alignment is a good approximation to the biological one. Since, for most sequences, the true alignment is unknown, a method that either assesses the significance of the optimal alignment, or that provides few “close” alternatives to the optimal one, is of great importance.

A suboptimal alignment is an alignment whose score lies within the neighborhood of the optimal score. Enumeration of suboptimal alignments [Wa83, WaBy] is not very practical since there are many such alignments. Other approaches [Zuk, Vi, ViArTANO] that use only partial information about suboptimal alignments are more successful in practice.

We present a method for representing all alignments whose score is within any given delta from the optimal score. It represents a large number of alignments by a compact graph which makes it easy to impose additional biological constraints and select one desirable alignment from this large set. We study the combinatorial nature of suboptimal alignments. We define a set of “canonical” suboptimal alignments, and argue that these are the essential ones since any other suboptimal alignment is a combination of few canonical ones. We then show how to efficiently enumerate suboptimal alignments in order of their score, and count their numbers. Examples are presented to motivate the problem.

Since alignments are essentially (s, t)-paths in a directed acyclic graph with (possibly negative) weights on its edges, our solution gives an extremely simple method to enumerate all K shortest (or longest) paths from s to t in such graphs in increasing order, as well as all (s, t) paths that are within δ of the optimum, for any δ. We compare this solution with known algorithms that find the K-best shortest paths in a graph.

Supported by a Postdoctoral Fellowship from the Program in Mathematics and Molecular Biology of the University of California at Berkeley, under National Science Foundation Grant DMS-9720208

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. S.F Altschul and B. W. Erickson, Optimal Sequence Alignment Using Affine Gap Costs, Bull. Math. Biol. 48: 603–616, 1986.

    Google Scholar 

  2. R. Bellman and R. Kalaba, On Kth Best Policies, J. SIAM, 8(4):582–588, 1960.

    Google Scholar 

  3. S. Clarke, A. Krikorian and J. Rausen, Computing the N best Loopless Paths in a Network, J. Siam, 11 (4): 1096–1102, 1963.

    Google Scholar 

  4. R.F. Doolittle, Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences, University Science Books, 1986.

    Google Scholar 

  5. R.F. Doolittle, edt. Methods in Enzymology, Volume 183: Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, 1990.

    Google Scholar 

  6. S.E. Dreyfus, An Appraisal of Some Shortest-Path Algorithms, Operations Research, 17 (1969), 395–412.

    Google Scholar 

  7. J. Edmonds and R.M. Karp, Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems, JACM, 19(2):248–264, 1972.

    Google Scholar 

  8. W.M. Fitch and T.F. Smith, Optimal Sequence Alignments, Proc. Natl. Acad. Sci. USA 80(1983) 1382–1386.

    Google Scholar 

  9. B. Fox, Calculating the Kth Shortest Paths, INFOR, Canad. J. Oper. Infor. Process., 11:66–70, 1973.

    Google Scholar 

  10. O. Gotoh, An Improved Method for Matching Biological Sequences, J. Mol. Biol., 162:705–708, 1982.

    Google Scholar 

  11. D. Gusfield, K. Balasubramanian and D. Naor, Parametric Optimization of Sequence Alignment, Proceedings of the third annual ACM-SIAM Joint Symposium on Discrete Algorithms, Orlando, Florida, Jan. 1992.

    Google Scholar 

  12. W. Hoffman and R. Pavley, A Method for the Solution of the Nth Best Path Problem, J. ACM 6, 506–514 (1959).

    Google Scholar 

  13. N. Katoh, T. Ibaraki and H. Mine, An Efficient Algorithm for K Shortest Simple Paths, Networks, 12:411–427, 1982.

    Google Scholar 

  14. E. L. Lawler, A Procedure for Computing the K-best Solutions to Discrete Optimization Problems and its Applications to the Shortest Paths Problem, Manag. Sci. 18:401–405, 1972.

    Google Scholar 

  15. E. L. Lawler, Combinatorial Optimization, Networks and Matroids, Holt, Rinehart and Winston, 1976.

    Google Scholar 

  16. A. Perko, Implementation of Algorithms for K Shortest Loopless Paths, Networks 16:149–160, 1986.

    Google Scholar 

  17. M. Pollack, The kth Best Route Through A Network, OR, 9 (4):578–580, 1961.

    Google Scholar 

  18. D. Sankoff and J. Kruskal, Editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983.

    Google Scholar 

  19. M. Schoniger and M.S. Waterman, A Local Algorithm for DNA Sequence Alignment with Inversions, Bull. Math. Bio. 54(4):521–536, 1992.

    Google Scholar 

  20. P. H. Sellers, An Algorithm for the Distance Between Two Finite Sequences, J. Comb. Theory. (A), 16:253–258, 1974.

    Google Scholar 

  21. D.R. Shier, On Algorithms for Finding the K Shortest Paths in a Network, Networks, 9:195–214, 1979.

    Google Scholar 

  22. M. Vingron, Multiple Sequence Alignment and Applications in Molecular Biology, Preprint 91–12, Universitat Heidelberg, 1991.

    Google Scholar 

  23. M. Vingron and P. Argos, Determination of Reliable Regions in Protein Sequence Alignments, Protein Engin., 7: 565–569, 1990.

    Google Scholar 

  24. M.S. Waterman, Sequence Alignments in the Neighborhood of the Optimum with General Application to Dynamic Programming, Proc. Nat, Acad, Sci. USA. 80:3123–3124, 1983.

    Google Scholar 

  25. M.S. Waterman, Parametric and Ensemble Sequence Alignment Algorithms, manuscript.

    Google Scholar 

  26. M.S. Waterman and T.H. Byers, A Dynamic Programming Algorithm to Find All Solutions in a Neighborhood of the Optimum, Math, Biosci. 77:179–188, 1985.

    Google Scholar 

  27. M.S. Waterman and M. Eggert, A New Algorithm for Best Subsequence Alignments With Applications to tRNA-rRNA Comparisons, J. Mol. Biol. 197:723–728, 1987.

    Google Scholar 

  28. M.S. Waterman, M. Eggert and E. Lander, Parametric Sequence Comparisons, PNAS 89:6090–6093, July 1992.

    Google Scholar 

  29. J.Y. Yen, Finding the K Shortest Loopless Paths in a Network, Management Science, 17 (11):712–716, 1971.

    Google Scholar 

  30. M. Zuker, Suboptimal Sequence Alignment in Molecular Biology, Alignment with Error Analysis, J. Mol. Biol., 221:403–420, 1991.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Alberto Apostolico Maxime Crochemore Zvi Galil Udi Manber

Rights and permissions

Reprints and permissions

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Naor, D., Brutlag, D. (1993). On suboptimal alignments of biological sequences. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1993. Lecture Notes in Computer Science, vol 684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029805

Download citation

  • DOI: https://doi.org/10.1007/BFb0029805

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56764-6

  • Online ISBN: 978-3-540-47732-7

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics