Abstract
The shortest superstring problem (SSP) is a combinatorial optimization problem which has attracted the interest of many researchers due to its applications in computational molecular biology and in computer science. The SSP is an NP-hard problem, and therefore great effort to develop exact algorithms for it has not been made. On the other hand, several approximation and heuristic algorithms have been implemented indicating the strong effectiveness of the greedy strategies to this problem. Variations of these algorithms can be parallelized providing computational strength in solving real-world instances. Polynomially solvable versions of the problem obtained under specific restrictions to its parameters reveal the boundaries between hard and easy cases. The computational bounds on the approximability of the SSP are a realization of its Max-SNP-hardness, but the weak proved values of them reflect the potential strength of the greedy approximation techniques. The strength of the greedy methods for the SSP is enhanced also by the asymptotic behaviour and the smoothed analysis of the problem in random and real-world instances, respectively. All these issues are presented in this chapter in a concise way covering the whole relevant literature, revealing the knowledge that is already conquered, and paving the path for further development in the study of shortest superstrings. The order of the sections highlights the pass from hardness complexity results for the SSP to efficient algorithms for the problem based on greedy strategies, and to theoretical results that establish the strength of the greedy techniques.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Alexander, K.S.: Shortest common superstrings for strings of random letters. In: Crochemore, M., Gusfield, D. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 807, pp. 164–172. Springer, Berlin (1994)
Armen, C., Stein, C.: Improved length bounds for the shortest superstring problem. In: Akl, S., Dehne, F., Sack, J.R., Santoro, N. (eds.) Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 955, pp. 494–505. Springer, Berlin (1995)
Armen, C., Stein, C.: Short superstrings and the structure of overlapping strings. J. Comput. Biol. 2(2), 307–332 (1995)
Armen, C., Stein, C.: A \(2\frac{2} {3}\)-approximation algorithm for the shortest superstring problem. In: Hirschberg, D., Myers, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 1075, pp. 87–101. Springer, Berlin (1996)
Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and the hardness of approximation problems. J. ACM 45(3), 501–555 (1998)
Berger, B., Rompel, J., Shor, P.W.: Efficient NC algorithms for set cover with applications to learning and geometry. J. Comput. Syst. Sci. 49(3), 454–477 (1994)
Bläser, M.: An 8/13-approximation algorithm for the asymmetric maximum TSP. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’02), pp. 64–73. Society for Industrial and Applied Mathematics, Philadelphia (2002)
Blum, A., Jiang, T., Li, M., Tromp, J., Yannakakis, M.: Linear approximation of shortest superstrings. J. ACM 41, 630–647 (1994)
Breslauer, D., Jiang, T., Jiang, Z.: Rotations of periodic strings and short superstrings. J. Algorithm. 24, 340–353 (1997)
Chirico, N., Vianelli, A., Belshaw, R.: Why genes overlap in viruses. Proc. R. Soc. B. Biol. Sci. 277(1701), 3809–3817 (2010)
Czumaj, A., Ga̧sieniec, L., Piotrów, M., Rytter, W.: Sequential and parallel approximation of shortest superstrings. J. Algorithm. 23, 74–100 (1997)
Festa, P., Resende, M.: GRASP: An annotated bibliography. In: Ribeiro, C., Hansen, P. (eds.) Essays and Surveys in Metaheuristics. Operations Research/Computer Science, pp. 325–367. Kluwer Academic, Dordecht (2002)
Frieze, A., Szpankowski, W.: Greedy algorithms for the shortest common superstring that are asymptotically optimal. Algorithmica 21, 21–36 (1998)
Gallant, J.K.: String compression algorithms. Ph.D. thesis, Princeton (1982)
Gallant, J., Maier, D., Storer, J.A.: On finding minimal length superstrings. J. Comput. Syst. Sci. 20(1), 50–58 (1980)
Gerver, M.: Three-valued numbers and digraphs. Kvant 1987(2), 32–35 (1987)
Gevezes, T., Pitsoulis, L.: A greedy randomized adaptive search procedure with path relinking for the shortest superstring problem. J. Comb. Optim. (2013) doi: 10.1007/s10878-013-9622-z
Gingeras, T., Milazzo, J., Sciaky, D., Roberts, R.: Computer programs for the assembly of DNA sequences. Nucleic Acids Res. 7(2), 529–543 (1979)
Glover, F., Laguna, M.: Tabu Search. Kluwer Academic, Norwell (1997)
Goldberg, M.K., Lim, D.T.: A learning algorithm for the shortest superstring problem. In: Proceedings of the Atlantic Symposium on Computational Biology and Genome Information and Technology, pp. 171–175 (2001)
Goldberg, D., Deb, K., Korb, B.: Messy genetic algorithms: Motivation, analysis, and first results. Complex Syst. 3, 493–530 (1989)
Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor (1975)
Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng. 40(9), 1098–1101 (1952)
Ilie, L., Popescu, C.: The shortest common superstring problem and viral genome compression. Fundam. Inform. 73, 153–164 (2006)
Ilie, L., Tinta, L., Popescu, C., Hill, K.A.: Viral genome compression. In: Mao, C., Yokomori, T. (eds.) DNA Computing. Lecture Notes in Computer Science, vol. 4287, pp. 111–126. Springer, Berlin (2006)
Jenkyns, T.A.: The greedy travelling salesman’s problem. Networks 9(4), 363–373 (1979)
Jiang, T., Li, M.: Approximating shortest superstrings with constraints. Theor. Comput. Sci. 134(2), 473–491 (1994)
Kaplan, H., Shafrir, N.: The greedy algorithm for shortest superstrings. Inf. Process. Lett. 93, 13–17 (2005)
Kaplan, H., Lewenstein, M., Shafrir, N., Sviridenko, M.: Approximation algorithms for asymmetric TSP by decomposing directed regular multigraphs. J. ACM 52, 602–626 (2005)
Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York (1972)
Kosaraju, S.R., Park, J.K., Stein, C.: Long tours and short superstrings. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pp. 166–177. IEEE Computer Society, Washington, DC (1994)
Laguna, M., Martí, R.: GRASP and path relinking for 2-layer straight line crossing minimization. INFORMS J. Comput. 11, 44–52 (1999)
Lesk, A.M.: Computational Molecular Biology. Sources and Methods for Sequence Analysis. Oxford University Press, Oxford (1988)
Li, M.: Towards a DNA Sequencing Theory (Learning a String), vol. 1, pp. 125–134. IEEE Computer Society, Los Alamitos (1990)
López-Rodríguez, D., Mérida-Casermeiro, E.: Shortest common superstring problem with discrete neural networks. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds.) Adaptive and Natural Computing Algorithms. Lecture Notes in Computer Science, vol. 5495, pp. 62–71. Springer, Berlin (2009)
Ma, B.: Why greed works for shortest common superstring problem. In: Ferragina, P., Landau, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 5029, pp. 244–254. Springer, Berlin (2008)
Maier, D., Storer, J.A.: A note on the complexity of the superstring problem. Technical Report 233, Computer Science Laboratory, Princeton University, Princeton (1977)
Middendorf, M.: More on the complexity of common superstring and supersequence problems. Theor. Comput. Sci. 125(2), 205–228 (1994)
Middendorf, M.: Shortest common superstrings and scheduling with coordinated starting times. Theor. Comput. Sci. 191(1–2), 205–214 (1998)
Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer programming formulation of traveling salesman problems. J. ACM 7, 326–329 (1960)
Ott, S.: Lower bounds for approximating shortest superstrings over an alphabet of size 2. In: Widmayer, P., Neyer, G., Eidenbenz, S. (eds.) Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science, vol. 1665, pp. 55–64. Springer, Berlin (1999)
Papadimitriou, C.H., Steiglitz, K.: Combinatorial optimization: algorithms and complexity. Prentice-Hall, Englewood Cliffs (1982)
Papadimitriou, C.H., Yannakakis, M.: Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43(3), 425–440 (1991)
Papadimitriou, C.H., Yannakakis, M.: The traveling salesman problem with distances one and two. Math. Oper. Res. 18(1), 1–11 (1993)
Peltola, H., Söderlund, H., Ukkonen, E.: SEQAID: a DNA sequence assembling program based on a mathematical model. Nucleic Acids Res. 12(1), 307–321 (1984)
Pevzner, P.A., Waterman, M.S.: Open Combinatorial Problems in Computational Molecular Biology, p. 158. IEEE Computer Society, Los Alamitos (1995)
Pitsoulis, L., Resende, M.: Greedy randomized adaptive search procedures. In: Pardalos, P., Resende, M. (eds.) Handbook of Applied Optimization, pp. 178–183. Oxford University Press, Oxford (2002)
Plociennik, K.: A probabilistic PTAS for shortest common superstring. In: Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science 2009 (MFCS ’09), pp. 624–635. Springer, Berlin (2009)
Reif, J.H.: Synthesis of Parallel Algorithms, 1st edn. Morgan Kaufmann, San Francisco (1993)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
Shapiro, M.B.: An algorithm for reconstructing protein and RNA sequences. J. ACM 14, 720–731 (1967)
Spielman, D., Teng, S.H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing (STOC ’01), pp. 296–305. ACM, New York (2001)
Staden, R.: Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Res. 10(15), 4731–4751 (1982)
Storer, J.A.: Data compression: Methods and theory. Computer Science Press, New York (1988)
Storer, J.A., Szymanski, T.G.: The macro model for data compression (extended abstract). In: Proceedings of the Tenth Annual ACM Symposium on Theory of Computing (STOC ’78), pp. 30–39. ACM, New York (1978)
Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29, 928–951 (1982)
Sweedyk, Z.: A \(2\frac{1} {2}\)-approximation algorithm for shortest superstring. SIAM J. Comput. 29, 954–986 (1999)
Tarhio, J., Ukkonen, E.: A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci. 57(1), 131–145 (1988)
Tarjan, R.E.: Data Structures and Network Algorithms. Society for Industrial and Applied Mathematics, Philadelphia (1983)
Teng, S.H., Yao, F.: Approximating Shortest Superstrings, pp. 158–165. IEEE Computer Society, Los Alamitos (1993)
Timkovskii, V.G.: Complexity of common subsequence and supersequence problems and related problems. Cybern. Syst. Anal. 25, 565–580 (1989)
Turner, J.S.: Approximation algorithms for the shortest common superstring problem. Inf. Comput. 83, 1–20 (1989)
Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Vassilevska, V.: Explicit inapproximability bounds for the shortest superstring problem. In: Jedrzejowicz, J., Szepietowski, A. (eds.) Mathematical Foundations of Computer Science 2005. Lecture Notes in Computer Science, vol. 3618, pp. 793–800. Springer, Berlin (2005)
Yang, E., Zhang, Z.: The shortest common superstring problem: Average case analysis for both exact and approximate matching. IEEE Trans. Inf. Theory 45(6), 1867–1886 (1999)
Zaritsky, A., Sipper, M.: Coevolving solutions to the shortest common superstring problem. Biosystems 76(1–3), 209–216 (2004)
Zaritsky, A., Sipper, M.: The preservation of favored building blocks in the struggle for fitness: The puzzle algorithm. Evol. Comput. 8(5), 443–455 (2004)
Acknowledgements
This research has been funded by the European Union (European Social Fund—ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: Thalis. Investing in knowledge society through the European Social Fund.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Gevezes, T.P., Pitsoulis, L.S. (2014). The Shortest Superstring Problem. In: Rassias, T., Floudas, C., Butenko, S. (eds) Optimization in Science and Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0808-0_10
Download citation
DOI: https://doi.org/10.1007/978-1-4939-0808-0_10
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0807-3
Online ISBN: 978-1-4939-0808-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)