Abstract
In-memory hash tables provide fast access to large numbers of strings, with less space overhead than sorted structures such as tries and binary trees. If chains are used for collision resolution, hash tables scale well, particularly if the pattern of access to the stored strings is skew. However, typical implementations of string hash tables, with lists of nodes, are not cache-efficient. In this paper we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. In all cases, the new structures give substantial savings in space at no cost in time. In the best case, the overhead space required for pointers is reduced by a factor of around 50, to less than two bits per string (with total space required, including 5.68 megabytes of strings, falling from 20.42 megabytes to 5.81 megabytes), while access times are also reduced.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading (1974)
Baer, J., Chen, T.: Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers 44(5), 609–623 (1995)
Callahan, D., Kennedy, K., Porterfield, A.: Software prefetching. In: Proc. ASPLOS Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 40–52. ACM Press, New York (1991)
Chilimbi, T.M., Hill, M.D., Larus, J.R.: Cache-conscious structure layout. In: Proc. ACM SIGPLAN conf. on Programming Language Design and Implementation, pp. 1–12. ACM Press, New York (1999)
Collins, J., Sair, S., Calder, B., Tullsen, D.M.: Pointer cache assisted prefetching. In: Proc. Annual ACM/IEEE MICRO Int. Symp. on Microarchitecture, pp. 62–73. IEEE Computer Society Press, Los Alamitos (2002)
Fu, J.W.C., Patel, J.H., Janssens, B.L.: Stride directed prefetching in scalar processors. SIGMICRO Newsletter 23(1-2), 102–110 (1992)
Halatsis, C., Philokyprou, G.: Pseudochaining in hash tables. Communications of the ACM 21(7), 554–557 (1978)
Harman, D.: Overview of the second text retrieval conference (TREC-2). In: Information Processing & Management, pp. 271–289. Pergamon Press, Inc., Oxford (1995)
Heileman, G.L., Luo, W.: How caching affects hashing. In: Proc. ALENEX Workshop on Algorithm Engineering and Experiments (January 2005)
Heinz, S., Zobel, J., Williams, H.E.: Self-adjusting trees in practice for large text collections. Software—Practice and Experience 31(10), 925–939 (2001)
Karlsson, M., Dahlgren, F., Stenstrom, P.: A prefetching technique for irregular accesses to linked data structures. In: Proc. Symp. on High-Performance Computer Architecture, January 2000, pp. 206–217 (2000)
Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, 2nd edn., vol. 3. Addison-Wesley Longman, Amsterdam (1998)
Larson, P.: Performance analysis of linear hashing with partial expansions. ACM Transactions on Database Systems 7(4), 566–587 (1982)
Lebeck, A.R.: Cache conscious programming in undergraduate computer science. In: Proc. SIGCSE Technical Symp. on Computer Science Education, pp. 247–251. ACM Press, New York (1999)
McCabe, J.: On serial files with relocatable records. Operations Research 13, 609–618 (1965)
Munro, J.I., Celis, P.: Techniques for collision resolution in hash tables with open addressing. In: Proc. ACM Fall Joint Computer Conf., pp. 601–610. IEEE Computer Society Press, Los Alamitos (1986)
Peterson, W.W.: Open addressing. IBM J. Research & Development 1, 130–146 (1957)
Ramakrishna, M.V., Zobel, J.: Performance in practice of string hashing functions. In: Proc. DASFAA Symp. on Databases Systems for Advanced Applications, vol. 6, pp. 215–224. World Scientific, Singapore (1997)
Rathi, A., Lu, H., Hedrick, G.E.: Performance comparison of extendible hashing and linear hashing techniques. In: Proc. ACM SIGSMALL/PC Symp. on Small Systems, pp. 178–185. ACM Press, New York (1990)
Rosenberg, A.L., Stockmeyer, L.J.: Hashing schemes for extendible arrays. Jour. of the ACM 24(2), 199–221 (1977)
Roth, A., Sohi, G.S.: Effective jump-pointer prefetching for linked data structures. In: Proc. Int. Symp. on Computer Architecture, pp. 111–121. IEEE Computer Society Press, Los Alamitos (1999)
Sarwate, D.V.: A note on universal classes of hash functions. Information Processing Letters 10(1), 41–45 (1980)
Sinha, R., Ring, D., Zobel, J.: Cache-efficient string sorting using copying. In: submission
Sinha, R., Zobel, J.: Cache-conscious sorting of large sets of strings with dynamic tries. ACM Jour. of Exp. Algorithmics 9 (2005)
Vitter, J.S.: Analysis of the search performance of coalesced hashing. Jour. of the ACM 30(2), 231–258 (1983)
Yang, C., Lebeck, A.R., Tseng, H., Lee, C.: Tolerating memory latency through push prefetching for pointer-intensive applications. ACM Trans. Architecture Code Optimisation 1(4), 445–475 (2004)
Zobel, J., Williams, H.E., Heinz, S.: In-memory hash tables for accumulating text vocabularies. Information Processing Letters 80(6), 271–277 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Askitis, N., Zobel, J. (2005). Cache-Conscious Collision Resolution in String Hash Tables. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_11
Download citation
DOI: https://doi.org/10.1007/11575832_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29740-6
Online ISBN: 978-3-540-32241-2
eBook Packages: Computer ScienceComputer Science (R0)