Learning hash index based on a shallow autoencoder

382 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

The hash index plays an important role in improving query efficiency in databases. Because traditional hash algorithms cannot use the original data distribution, there is often a high collision rate in large-scale datasets. Additionally, some traditional hashing algorithms are data-dependent and cannot be accelerated in parallel. The learned index provides a new method for index design in data management systems. The key idea of the learned index is to consider the index as a model that can be learned. In this paper, we propose a learning hash algorithm based on a shallow autoencoder that can make full use of the original data characteristics and take advantage of the parallelism of matrix operations. Therefore, compared with traditional hash functions, the proposed method has a lower collision rate and higher efficiency. Finally, we verify the effectiveness of the proposed method through a series of experiments on synthetic datasets and real datasets. Experimental results show that the proposed hash algorithm has considerable advantages in reducing the collision rate and computing time while improving the retrieval efficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Binary Variational Autoencoder for Hashing

Autoencoder-based unsupervised clustering and hashing

Article 19 August 2020

Self-supervised Bernoulli Autoencoders for Semi-supervised Hashing

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

References

Dittrich J, Nix J, Schon̈ C (2021) The next 50 years in database indexing or: the case for automatically generated index structures. Proc VLDB Endow 15(3):527–540. https://doi.org/10.14778/3494124.3494136
Article Google Scholar
Hu D, Chen Z, Wu J, Sun J, Chen H (2021) Persistent memory hash indexes: an experimental evaluation. Proc VLDB Endow 14(5):785–798. https://doi.org/10.14778/3446095.3446101
Article Google Scholar
Pagh R, Rodler FF (2004) Cuckoo hashing. J Algorithms 51(2):122–144. https://doi.org/10.1016/j.jalgor.2003.12.002
Article MathSciNet MATH Google Scholar
Dietzfelbinger M, Karlin A, Mehlhorn K, Meyer auf der Heide F, Rohnert H, Tarjan RE (1994) Dynamic perfect hashing: upper and lower bounds. SIAM J Comput 23(4):738–761. https://doi.org/10.1137/S0097539791194094
Article MathSciNet MATH Google Scholar
Richter S, Alvarez V, Dittrich J (2015) A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB 9(3):96–107. https://doi.org/10.14778/2850583.2850585
Article Google Scholar
Rivest RL (1992) The MD5 message-digest algorithm. RFC 1321:1–21. https://doi.org/10.17487/RFC1321
Article Google Scholar
GelB P, Klus S, Schuster I, Schütte C (2021) Feature space approximation for kernel-based supervised learning. Knowl-Based Syst 221:106935. https://doi.org/10.1016/j.knosys.2021.106935
Article Google Scholar
Zheng Z, Zheng L, Yang Y (2018) A discriminatively learned cnn embedding for person reidentification. ACM Trans Multimedia Comput Commun Appl 14(1):13–11320. https://doi.org/10.1145/3159171
Article MathSciNet Google Scholar
Zhao F, Huang Y, Wang L, Tan T (2015) Deep semantic ranking based hashing for multi-label image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1556–1564
Cao Y, Qi H, Gui J, Li S, Li K (2019) General distributed hash learning on image descriptors for k-nearest neighbor search. IEEE Signal Process Lett 26(5):750–754. https://doi.org/10.1109/LSP.2019.2907777
Article Google Scholar
Zhang B, Qian J, Xie X, Xin Y, Dong Y (2021) Capsnet-based supervised hashing. Appl Intell 51(8):5912–5926. https://doi.org/10.1007/s10489-020-02180-7
Article Google Scholar
Xu X, Lai Z-H, Chen Yd (2020) Relaxed locality preserving supervised discrete hashing. IEEE Trans Big Data:1–1. https://doi.org/10.1109/TBDATA.2020.3027379
Kraska T, Beutel A, Chi EH, Dean J, Polyzotis N (2018) The case for learned index structures. In: Proceedings of the 2018 international conference on management of data, pp 489–504. https://doi.org/10.1145/3183713.3196909
Kipf A, Marcus R, van Renen A, Stoian M, Kemper A, Kraska T, Neumann T (2020) Radixspline: a single-pass learned index. In: Proceedings of the third international workshop on exploiting artificial intelligence techniques for data management, pp 1–5. https://doi.org/10.1145/3401071.3401659
Ferragina P, Vinciguerra G (2020) The pgm-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc VLDB Endow 13(8):1162–1175. https://doi.org/10.14778/3389133.3389135
Article Google Scholar
Tang C, Wang Y, Dong Z, Hu G, Wang Z, Wang M, Chen H (2020) Xindex: a scalable learned index for multicore data storage. In: Proceedings of the 25th ACM SIGPLAN symposium on principles and practice of parallel programming, pp 308–320. https://doi.org/10.1145/3332466.3374547
Galakatos A, Markovitch M, Binnig C, Fonseca R, Kraska T (2019) Fiting-tree: a data-aware index structure. In: Proceedings of the 2019 international conference on management of data, pp 1189–1206. https://doi.org/10.1145/3299869.3319860
Ding J, Minhas UF, Yu J, Wang C, Do J, Li Y, Zhang H, Chandramouli B, Gehrke J, Kossmann D., Lomet D, Kraska T (2020) Alex: an updatable adaptive learned index. In: Proceedings of the 2020 ACM SIGMOD international conference on management of data, pp 969–984. https://doi.org/10.1145/3318464.3389711
Li P, Hua Y, Zuo P, Jia J (2019) A scalable learned index scheme in storage systems. arXiv:1905.06256
Li X, Li J, Wang X (2019) Aslm: adaptive single layer model for learned index. In: DASFAA 2019 international workshops: BDMS, BDQM, and GDMA, pp 80–95. https://doi.org/10.1007/978-3-030-18590-9_6
Wu J, Zhang Y, Chen S, Wang J, Chen Y, Xing C (2021) Updatable learned index with precise positions. Proc VLDB Endow 14(8):1276–1288. https://doi.org/10.14778/3457390.3457393
Article Google Scholar
Lu B, Ding J, Lo E, Minhas UF, Wang T (2021) Apex: a high-performance learned index on persistent memory. Proc VLDB Endow 15(3):597–610. https://doi.org/10.14778/3494124.3494141
Article Google Scholar
Vitter JS, Chen W-C (1987) The Design and Analysis of Coalesced Hashing. Oxford University Press Inc., Oxford
Google Scholar
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual acm symposium on theory of computing, pp 604–613. https://doi.org/10.1145/276698.276876
Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380. https://doi.org/10.1109/34.895972
Article Google Scholar
Guo J, Zhang S, Li J (2016) Hash learning with convolutional neural networks for semantic based image retrieval. In: Advances in knowledge discovery and data mining, pp 227–238. https://doi.org/10.1007/978-3-319-31753-3_19
Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Proceedings of the twenty-second annual conference on neural information processing systems (NIPS 2008), pp 1753–1760
Song J, He T, Gao L, Xu X, Hanjalic A, Shen HT (2018) Binary generative adversarial networks for image retrieval. In: Proceedings of the AAAI conference on artificial intelligence, pp 394–401
Chafik S, El Yacoubi MA, Daoudi I, El Ouardi H (2019) Unsupervised deep neuron-per-neuron hashing. Appl Intell 49(6):2218–2232. https://doi.org/10.1007/s10489-018-1353-5
Article Google Scholar
Wang J, Xu S, Zheng F, Lu K, Song J, Shao L (2021) Learning efficient hash codes for fast graph-based data similarity retrieval. IEEE Trans Image Process 30:6321–6334. https://doi.org/10.1109/TIP.2021.3093387
Article MathSciNet Google Scholar
Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: ICML ’08
Reichenbach L (2018) Learned index structures: an evalution of their performance in key-value storage solutions. PhD thesis, Universität Hamburg
Ni J, Li J, McAuley J (2019) Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 188–197. https://doi.org/10.18653/v1/D19-1018
Sedgewick R (2001) Part 3: searching algorithms. Algorithms in C, pp 231–234. Pearson Education, Boston
Google Scholar
Aho AV, Lam MS, Sethi R, Ullman JD (2007) Compilers: principles, techniques, and tools, Pearson Education, India, Bengaluru
Ritchie DM, Kernighan BW, Lesk ME (1988) The C programming language. Prentice Hall, Englewood Cliffs
Google Scholar
Preiss BR (1999) Data structure and algorithms: with object-oriented design patterns in Java. Wiley, Hoboken
Google Scholar
Fowler G, Noll LC, Vo K-P, Eastlake DE, Hansen T (2022) The FNV non-cryptographic hash algorithm. Internet-draft, Internet Engineering Task Force. Work in Progress. https://datatracker.ietf.org/doc/draft-eastlake-fnv/18/

Download references

Acknowledgements

We thank the editor and anonymous reviewers for their valuable comments and feedbacks. This work was supported by Guangxi Natural Science Foundations(No. 2018GXNSFDA281049), Science and Technology Major Project of Guangxi Province (No. AA19046004), National Natural Science Foundation of China (Nos. 62062027, U1811264, 62167002 and 61862013), Guangxi Key Laboratory of Trusted Software (No. kx202203) and the Innovation Project of GUET Graduate Education (No. 2022YCXS079).

Author information

Authors and Affiliations

Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Jinji Road, Guilin, 541004, Guangxi, China
Yuming Lin, Zhengguo Huang & You Li

Authors

Yuming Lin
View author publications
You can also search for this author in PubMed Google Scholar
Zhengguo Huang
View author publications
You can also search for this author in PubMed Google Scholar
You Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to You Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Lin, Y., Huang, Z. & Li, Y. Learning hash index based on a shallow autoencoder. Appl Intell 53, 14999–15010 (2023). https://doi.org/10.1007/s10489-022-04274-w

Download citation

Accepted: 17 October 2022
Published: 07 November 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04274-w

Learning hash index based on a shallow autoencoder

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Binary Variational Autoencoder for Hashing

Autoencoder-based unsupervised clustering and hashing

Self-supervised Bernoulli Autoencoders for Semi-supervised Hashing

Data Availability

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Learning hash index based on a shallow autoencoder

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Binary Variational Autoencoder for Hashing

Autoencoder-based unsupervised clustering and hashing

Self-supervised Bernoulli Autoencoders for Semi-supervised Hashing

Explore related subjects

Data Availability

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now