An In-place Framework for Exact and Approximate Shortest Unique Substring Queries

Wing-Kai Hon¹⁵,
Sharma V. Thankachan¹⁶ &
Bojian Xu¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9472))

Included in the following conference series:

International Symposium on Algorithms and Computation

1174 Accesses
5 Citations

Abstract

We revisit the exact shortest unique substring (SUS) finding problem, and propose its approximate version where mismatches are allowed, due to its applications in subfields such as computational biology. We design a generic in-place framework that fits to solve both the exact and approximate k-mismatch SUS finding, using the minimum 2n memory words plus n bytes space, where n is the input string size. By using the in-place framework, we can find the exact and approximate k-mismatch SUS for every string position using a total of O(n) and $O(n^2)$ time, respectively, regardless of the value of k. Our framework does not involve any compressed or succinct data structures and thus is practical and easy to implement.

Authors are listed in alphabetical order. Due to the page limit, a run example for Sect. 5 and all missing proofs and pseudocode can be found in the arXiv version of this paper.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Shortest Unique Substring Query Revisited

Space-Efficient Approximate String Matching Allowing Inversions in Fast Average Time

Shortest Unique Substrings Queries in Optimal Time

Notes

1.
It is our arbitrary decision to resolve the ties by picking the rightmost choice. Our solution can also be easily modified to find the leftmost choice.
2.
Note that the work of [4] studies a different problem and its computation is of the query-answer model, and thus is not comparable with [5, 7, 8] and ours.
3.
In actual run, $t_i^{-1}$ saves the largest number in that list, as we will see more clearly later.
4.
For each $j' < j$, if $I_{j'}$ covers i, $I_j$ would also cover i; in such a case, $B[j] + 1 \ge B[j'] + 1$. For each $j' \in [{\texttt {pred}}[i], i-1]$, $I_{j'}$ is longer than $I_i$.
5.
Intuitively, ${\texttt {pred}}$ defines the shortcuts so that we can skip some intervals in $I_{<i}$ to compute $t_i$.

References

Adaş, B., Bayraktar, E., Faro, S., Moustafa, I.E., Külekci, M.O.: Nucleotide sequence alignment and compression via shortest unique substring. In: Ortuño, F., Rojas, I. (eds.) IWBBIO 2015, Part II. LNCS, vol. 9044, pp. 363–374. Springer, Heidelberg (2015)
Google Scholar
Derrien, T., Estell, J., Marco Sola, S., Knowles, D.G., Raineri, E., Guig, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012)
Article Google Scholar
Haubold, B., Pierstorff, N., Möller, F., Wiehe, T.: Genome comparison without alignment using shortest unique substrings. BMC Bioinform. 6, 123 (2005)
Article Google Scholar
Hu, X., Pei, J., Tao, Y.: Shortest Unique Queries on Strings. In: Moura, E., Crochemore, M. (eds.) SPIRE 2014. LNCS, vol. 8799, pp. 161–172. Springer, Heidelberg (2014)
Google Scholar
İleri, A.M., Külekci, M.O., Xu, B.: A simple yet time-optimal and linear-space algorithm for shortest unique substring queries. Theor. Comput. Sci. 562, 621–633 (2015). Also in CPM 2014
Article MathSciNet MATH Google Scholar
Nong, G.: Practical linear-time $O(1)$-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. (TOIS) 31(3), 15:1–15:15 (2013)
Article MathSciNet Google Scholar
Pei, J., Wu, W.C.H., Yeh, M.Y.: On shortest unique substring queries. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 937–948 (2013)
Google Scholar
Tsuruta, K., Inenaga, S., Bannai, H., Takeda, M.: Shortest unique substrings queries in optimal time. In: Geffert, V., Preneel, B., Rovan, B., Štuller, J., Tjoa, A.M. (eds.) SOFSEM 2014. LNCS, vol. 8327, pp. 503–513. Springer, Heidelberg (2014)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Wing-Kai Hon
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, USA
Sharma V. Thankachan
Department of Computer Science, Eastern Washington University, Cheney, USA
Bojian Xu

Authors

Wing-Kai Hon
View author publications
You can also search for this author in PubMed Google Scholar
Sharma V. Thankachan
View author publications
You can also search for this author in PubMed Google Scholar
Bojian Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bojian Xu .

Editor information

Editors and Affiliations

Masdar Institute, Abu Dhabi, United Arab Emirates
Khaled Elbassioni
Kyoto University, Kyoto, Japan
Kazuhisa Makino

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hon, WK., Thankachan, S.V., Xu, B. (2015). An In-place Framework for Exact and Approximate Shortest Unique Substring Queries. In: Elbassioni, K., Makino, K. (eds) Algorithms and Computation. ISAAC 2015. Lecture Notes in Computer Science(), vol 9472. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48971-0_63

Download citation

DOI: https://doi.org/10.1007/978-3-662-48971-0_63
Published: 27 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48970-3
Online ISBN: 978-3-662-48971-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

An In-place Framework for Exact and Approximate Shortest Unique Substring Queries

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Shortest Unique Substring Query Revisited

Space-Efficient Approximate String Matching Allowing Inversions in Fast Average Time

Shortest Unique Substrings Queries in Optimal Time

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

An In-place Framework for Exact and Approximate Shortest Unique Substring Queries

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Shortest Unique Substring Query Revisited

Space-Efficient Approximate String Matching Allowing Inversions in Fast Average Time

Shortest Unique Substrings Queries in Optimal Time

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation