[go: up one dir, main page]

Skip to main content

The Shortest Superstring Problem

  • Chapter
  • First Online:
Optimization in Science and Engineering

Abstract

The shortest superstring problem (SSP) is a combinatorial optimization problem which has attracted the interest of many researchers due to its applications in computational molecular biology and in computer science. The SSP is an NP-hard problem, and therefore great effort to develop exact algorithms for it has not been made. On the other hand, several approximation and heuristic algorithms have been implemented indicating the strong effectiveness of the greedy strategies to this problem. Variations of these algorithms can be parallelized providing computational strength in solving real-world instances. Polynomially solvable versions of the problem obtained under specific restrictions to its parameters reveal the boundaries between hard and easy cases. The computational bounds on the approximability of the SSP are a realization of its Max-SNP-hardness, but the weak proved values of them reflect the potential strength of the greedy approximation techniques. The strength of the greedy methods for the SSP is enhanced also by the asymptotic behaviour and the smoothed analysis of the problem in random and real-world instances, respectively. All these issues are presented in this chapter in a concise way covering the whole relevant literature, revealing the knowledge that is already conquered, and paving the path for further development in the study of shortest superstrings. The order of the sections highlights the pass from hardness complexity results for the SSP to efficient algorithms for the problem based on greedy strategies, and to theoretical results that establish the strength of the greedy techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Alexander, K.S.: Shortest common superstrings for strings of random letters. In: Crochemore, M., Gusfield, D. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 807, pp. 164–172. Springer, Berlin (1994)

    Chapter  Google Scholar 

  2. Armen, C., Stein, C.: Improved length bounds for the shortest superstring problem. In: Akl, S., Dehne, F., Sack, J.R., Santoro, N. (eds.) Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 955, pp. 494–505. Springer, Berlin (1995)

    Chapter  Google Scholar 

  3. Armen, C., Stein, C.: Short superstrings and the structure of overlapping strings. J. Comput. Biol. 2(2), 307–332 (1995)

    Article  Google Scholar 

  4. Armen, C., Stein, C.: A \(2\frac{2} {3}\)-approximation algorithm for the shortest superstring problem. In: Hirschberg, D., Myers, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 1075, pp. 87–101. Springer, Berlin (1996)

    Chapter  Google Scholar 

  5. Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and the hardness of approximation problems. J. ACM 45(3), 501–555 (1998)

    Article  MATH  MathSciNet  Google Scholar 

  6. Berger, B., Rompel, J., Shor, P.W.: Efficient NC algorithms for set cover with applications to learning and geometry. J. Comput. Syst. Sci. 49(3), 454–477 (1994)

    Article  MathSciNet  Google Scholar 

  7. Bläser, M.: An 8/13-approximation algorithm for the asymmetric maximum TSP. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’02), pp. 64–73. Society for Industrial and Applied Mathematics, Philadelphia (2002)

    Google Scholar 

  8. Blum, A., Jiang, T., Li, M., Tromp, J., Yannakakis, M.: Linear approximation of shortest superstrings. J. ACM 41, 630–647 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  9. Breslauer, D., Jiang, T., Jiang, Z.: Rotations of periodic strings and short superstrings. J. Algorithm. 24, 340–353 (1997)

    Article  MATH  MathSciNet  Google Scholar 

  10. Chirico, N., Vianelli, A., Belshaw, R.: Why genes overlap in viruses. Proc. R. Soc. B. Biol. Sci. 277(1701), 3809–3817 (2010)

    Article  Google Scholar 

  11. Czumaj, A., Ga̧sieniec, L., Piotrów, M., Rytter, W.: Sequential and parallel approximation of shortest superstrings. J. Algorithm. 23, 74–100 (1997)

    Google Scholar 

  12. Festa, P., Resende, M.: GRASP: An annotated bibliography. In: Ribeiro, C., Hansen, P. (eds.) Essays and Surveys in Metaheuristics. Operations Research/Computer Science, pp. 325–367. Kluwer Academic, Dordecht (2002)

    Chapter  Google Scholar 

  13. Frieze, A., Szpankowski, W.: Greedy algorithms for the shortest common superstring that are asymptotically optimal. Algorithmica 21, 21–36 (1998)

    Article  MATH  MathSciNet  Google Scholar 

  14. Gallant, J.K.: String compression algorithms. Ph.D. thesis, Princeton (1982)

    Google Scholar 

  15. Gallant, J., Maier, D., Storer, J.A.: On finding minimal length superstrings. J. Comput. Syst. Sci. 20(1), 50–58 (1980)

    Article  MATH  MathSciNet  Google Scholar 

  16. Gerver, M.: Three-valued numbers and digraphs. Kvant 1987(2), 32–35 (1987)

    Google Scholar 

  17. Gevezes, T., Pitsoulis, L.: A greedy randomized adaptive search procedure with path relinking for the shortest superstring problem. J. Comb. Optim. (2013) doi: 10.1007/s10878-013-9622-z

    MATH  Google Scholar 

  18. Gingeras, T., Milazzo, J., Sciaky, D., Roberts, R.: Computer programs for the assembly of DNA sequences. Nucleic Acids Res. 7(2), 529–543 (1979)

    Article  Google Scholar 

  19. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic, Norwell (1997)

    Book  MATH  Google Scholar 

  20. Goldberg, M.K., Lim, D.T.: A learning algorithm for the shortest superstring problem. In: Proceedings of the Atlantic Symposium on Computational Biology and Genome Information and Technology, pp. 171–175 (2001)

    Google Scholar 

  21. Goldberg, D., Deb, K., Korb, B.: Messy genetic algorithms: Motivation, analysis, and first results. Complex Syst. 3, 493–530 (1989)

    MATH  MathSciNet  Google Scholar 

  22. Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor (1975)

    Google Scholar 

  23. Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng. 40(9), 1098–1101 (1952)

    Google Scholar 

  24. Ilie, L., Popescu, C.: The shortest common superstring problem and viral genome compression. Fundam. Inform. 73, 153–164 (2006)

    MATH  MathSciNet  Google Scholar 

  25. Ilie, L., Tinta, L., Popescu, C., Hill, K.A.: Viral genome compression. In: Mao, C., Yokomori, T. (eds.) DNA Computing. Lecture Notes in Computer Science, vol. 4287, pp. 111–126. Springer, Berlin (2006)

    Google Scholar 

  26. Jenkyns, T.A.: The greedy travelling salesman’s problem. Networks 9(4), 363–373 (1979)

    Article  MATH  MathSciNet  Google Scholar 

  27. Jiang, T., Li, M.: Approximating shortest superstrings with constraints. Theor. Comput. Sci. 134(2), 473–491 (1994)

    Article  MATH  Google Scholar 

  28. Kaplan, H., Shafrir, N.: The greedy algorithm for shortest superstrings. Inf. Process. Lett. 93, 13–17 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  29. Kaplan, H., Lewenstein, M., Shafrir, N., Sviridenko, M.: Approximation algorithms for asymmetric TSP by decomposing directed regular multigraphs. J. ACM 52, 602–626 (2005)

    Article  MathSciNet  Google Scholar 

  30. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York (1972)

    Chapter  Google Scholar 

  31. Kosaraju, S.R., Park, J.K., Stein, C.: Long tours and short superstrings. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pp. 166–177. IEEE Computer Society, Washington, DC (1994)

    Google Scholar 

  32. Laguna, M., Martí, R.: GRASP and path relinking for 2-layer straight line crossing minimization. INFORMS J. Comput. 11, 44–52 (1999)

    Article  MATH  Google Scholar 

  33. Lesk, A.M.: Computational Molecular Biology. Sources and Methods for Sequence Analysis. Oxford University Press, Oxford (1988)

    Google Scholar 

  34. Li, M.: Towards a DNA Sequencing Theory (Learning a String), vol. 1, pp. 125–134. IEEE Computer Society, Los Alamitos (1990)

    Google Scholar 

  35. López-Rodríguez, D., Mérida-Casermeiro, E.: Shortest common superstring problem with discrete neural networks. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds.) Adaptive and Natural Computing Algorithms. Lecture Notes in Computer Science, vol. 5495, pp. 62–71. Springer, Berlin (2009)

    Chapter  Google Scholar 

  36. Ma, B.: Why greed works for shortest common superstring problem. In: Ferragina, P., Landau, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 5029, pp. 244–254. Springer, Berlin (2008)

    Chapter  Google Scholar 

  37. Maier, D., Storer, J.A.: A note on the complexity of the superstring problem. Technical Report 233, Computer Science Laboratory, Princeton University, Princeton (1977)

    Google Scholar 

  38. Middendorf, M.: More on the complexity of common superstring and supersequence problems. Theor. Comput. Sci. 125(2), 205–228 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  39. Middendorf, M.: Shortest common superstrings and scheduling with coordinated starting times. Theor. Comput. Sci. 191(1–2), 205–214 (1998)

    Article  MATH  MathSciNet  Google Scholar 

  40. Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer programming formulation of traveling salesman problems. J. ACM 7, 326–329 (1960)

    Article  MATH  MathSciNet  Google Scholar 

  41. Ott, S.: Lower bounds for approximating shortest superstrings over an alphabet of size 2. In: Widmayer, P., Neyer, G., Eidenbenz, S. (eds.) Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science, vol. 1665, pp. 55–64. Springer, Berlin (1999)

    Chapter  Google Scholar 

  42. Papadimitriou, C.H., Steiglitz, K.: Combinatorial optimization: algorithms and complexity. Prentice-Hall, Englewood Cliffs (1982)

    MATH  Google Scholar 

  43. Papadimitriou, C.H., Yannakakis, M.: Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43(3), 425–440 (1991)

    Article  MATH  MathSciNet  Google Scholar 

  44. Papadimitriou, C.H., Yannakakis, M.: The traveling salesman problem with distances one and two. Math. Oper. Res. 18(1), 1–11 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  45. Peltola, H., Söderlund, H., Ukkonen, E.: SEQAID: a DNA sequence assembling program based on a mathematical model. Nucleic Acids Res. 12(1), 307–321 (1984)

    Article  Google Scholar 

  46. Pevzner, P.A., Waterman, M.S.: Open Combinatorial Problems in Computational Molecular Biology, p. 158. IEEE Computer Society, Los Alamitos (1995)

    Google Scholar 

  47. Pitsoulis, L., Resende, M.: Greedy randomized adaptive search procedures. In: Pardalos, P., Resende, M. (eds.) Handbook of Applied Optimization, pp. 178–183. Oxford University Press, Oxford (2002)

    Google Scholar 

  48. Plociennik, K.: A probabilistic PTAS for shortest common superstring. In: Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science 2009 (MFCS ’09), pp. 624–635. Springer, Berlin (2009)

    Google Scholar 

  49. Reif, J.H.: Synthesis of Parallel Algorithms, 1st edn. Morgan Kaufmann, San Francisco (1993)

    Google Scholar 

  50. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)

    Article  MathSciNet  Google Scholar 

  51. Shapiro, M.B.: An algorithm for reconstructing protein and RNA sequences. J. ACM 14, 720–731 (1967)

    Article  MATH  Google Scholar 

  52. Spielman, D., Teng, S.H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing (STOC ’01), pp. 296–305. ACM, New York (2001)

    Google Scholar 

  53. Staden, R.: Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Res. 10(15), 4731–4751 (1982)

    Article  Google Scholar 

  54. Storer, J.A.: Data compression: Methods and theory. Computer Science Press, New York (1988)

    Google Scholar 

  55. Storer, J.A., Szymanski, T.G.: The macro model for data compression (extended abstract). In: Proceedings of the Tenth Annual ACM Symposium on Theory of Computing (STOC ’78), pp. 30–39. ACM, New York (1978)

    Google Scholar 

  56. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29, 928–951 (1982)

    Article  MATH  MathSciNet  Google Scholar 

  57. Sweedyk, Z.: A \(2\frac{1} {2}\)-approximation algorithm for shortest superstring. SIAM J. Comput. 29, 954–986 (1999)

    Article  MathSciNet  Google Scholar 

  58. Tarhio, J., Ukkonen, E.: A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci. 57(1), 131–145 (1988)

    Article  MATH  MathSciNet  Google Scholar 

  59. Tarjan, R.E.: Data Structures and Network Algorithms. Society for Industrial and Applied Mathematics, Philadelphia (1983)

    Book  Google Scholar 

  60. Teng, S.H., Yao, F.: Approximating Shortest Superstrings, pp. 158–165. IEEE Computer Society, Los Alamitos (1993)

    Google Scholar 

  61. Timkovskii, V.G.: Complexity of common subsequence and supersequence problems and related problems. Cybern. Syst. Anal. 25, 565–580 (1989)

    Article  MathSciNet  Google Scholar 

  62. Turner, J.S.: Approximation algorithms for the shortest common superstring problem. Inf. Comput. 83, 1–20 (1989)

    Article  MATH  Google Scholar 

  63. Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)

    Article  MATH  Google Scholar 

  64. Vassilevska, V.: Explicit inapproximability bounds for the shortest superstring problem. In: Jedrzejowicz, J., Szepietowski, A. (eds.) Mathematical Foundations of Computer Science 2005. Lecture Notes in Computer Science, vol. 3618, pp. 793–800. Springer, Berlin (2005)

    Chapter  Google Scholar 

  65. Yang, E., Zhang, Z.: The shortest common superstring problem: Average case analysis for both exact and approximate matching. IEEE Trans. Inf. Theory 45(6), 1867–1886 (1999)

    Article  MATH  Google Scholar 

  66. Zaritsky, A., Sipper, M.: Coevolving solutions to the shortest common superstring problem. Biosystems 76(1–3), 209–216 (2004)

    Article  Google Scholar 

  67. Zaritsky, A., Sipper, M.: The preservation of favored building blocks in the struggle for fitness: The puzzle algorithm. Evol. Comput. 8(5), 443–455 (2004)

    Article  Google Scholar 

Download references

Acknowledgements

This research has been funded by the European Union (European Social Fund—ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: Thalis. Investing in knowledge society through the European Social Fund.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Theodoros P. Gevezes .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Gevezes, T.P., Pitsoulis, L.S. (2014). The Shortest Superstring Problem. In: Rassias, T., Floudas, C., Butenko, S. (eds) Optimization in Science and Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0808-0_10

Download citation

Publish with us

Policies and ethics