Abstract
We revisit the exact shortest unique substring (SUS) finding problem, and propose its approximate version where mismatches are allowed, due to its applications in subfields such as computational biology. We design a generic in-place framework that fits to solve both the exact and approximate k-mismatch SUS finding, using the minimum 2n memory words plus n bytes space, where n is the input string size. By using the in-place framework, we can find the exact and approximate k-mismatch SUS for every string position using a total of O(n) and \(O(n^2)\) time, respectively, regardless of the value of k. Our framework does not involve any compressed or succinct data structures and thus is practical and easy to implement.
Authors are listed in alphabetical order. Due to the page limit, a run example for Sect. 5 and all missing proofs and pseudocode can be found in the arXiv version of this paper.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
It is our arbitrary decision to resolve the ties by picking the rightmost choice. Our solution can also be easily modified to find the leftmost choice.
- 2.
- 3.
In actual run, \(t_i^{-1}\) saves the largest number in that list, as we will see more clearly later.
- 4.
For each \(j' < j\), if \(I_{j'}\) covers i, \(I_j\) would also cover i; in such a case, \(B[j] + 1 \ge B[j'] + 1\). For each \(j' \in [{\texttt {pred}}[i], i-1]\), \(I_{j'}\) is longer than \(I_i\).
- 5.
Intuitively, \({\texttt {pred}}\) defines the shortcuts so that we can skip some intervals in \(I_{<i}\) to compute \(t_i\).
References
Adaş, B., Bayraktar, E., Faro, S., Moustafa, I.E., Külekci, M.O.: Nucleotide sequence alignment and compression via shortest unique substring. In: Ortuño, F., Rojas, I. (eds.) IWBBIO 2015, Part II. LNCS, vol. 9044, pp. 363–374. Springer, Heidelberg (2015)
Derrien, T., Estell, J., Marco Sola, S., Knowles, D.G., Raineri, E., Guig, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012)
Haubold, B., Pierstorff, N., Möller, F., Wiehe, T.: Genome comparison without alignment using shortest unique substrings. BMC Bioinform. 6, 123 (2005)
Hu, X., Pei, J., Tao, Y.: Shortest Unique Queries on Strings. In: Moura, E., Crochemore, M. (eds.) SPIRE 2014. LNCS, vol. 8799, pp. 161–172. Springer, Heidelberg (2014)
İleri, A.M., Külekci, M.O., Xu, B.: A simple yet time-optimal and linear-space algorithm for shortest unique substring queries. Theor. Comput. Sci. 562, 621–633 (2015). Also in CPM 2014
Nong, G.: Practical linear-time \(O(1)\)-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. (TOIS) 31(3), 15:1–15:15 (2013)
Pei, J., Wu, W.C.H., Yeh, M.Y.: On shortest unique substring queries. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 937–948 (2013)
Tsuruta, K., Inenaga, S., Bannai, H., Takeda, M.: Shortest unique substrings queries in optimal time. In: Geffert, V., Preneel, B., Rovan, B., Štuller, J., Tjoa, A.M. (eds.) SOFSEM 2014. LNCS, vol. 8327, pp. 503–513. Springer, Heidelberg (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hon, WK., Thankachan, S.V., Xu, B. (2015). An In-place Framework for Exact and Approximate Shortest Unique Substring Queries. In: Elbassioni, K., Makino, K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science(), vol 9472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48971-0_63
Download citation
DOI: https://doi.org/10.1007/978-3-662-48971-0_63
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48970-3
Online ISBN: 978-3-662-48971-0
eBook Packages: Computer ScienceComputer Science (R0)