[go: up one dir, main page]

skip to main content
10.1145/3308558.3313553acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

A Hybrid BitFunnel and Partitioned Elias-Fano Inverted Index

Published: 13 May 2019 Publication History

Abstract

Search engines encounter a time vs. space trade-off: search responsiveness (i.e., a short query response time) comes at the cost of increased index storage. We propose a hybrid method which uses both (a) the recently published mapping-matrix-style index BitFunnel (BF) for search efficiency, and (b) the state-of-the-art Partitioned Elias-Fano (PEF) inverted-index compression method. We use this proposed hybrid method to minimize time while satisfying a fixed space constraint, and to minimize space while satisfying a fixed time constraint. Each document is stored using either BF or PEF, and we use a local search strategy to find an approximately optimal BF-PEF partition. Since performing full experiments on each candidate BF-PEF partition is impractically slow, we use a regression model to predict the time and space costs resulting from candidate partitions (space accuracy 97.6%; time accuracy 95.2%). Compared with a hybrid mathematical index (Ottaviano et al., 2015), the time cost is reduced by up to 47% without significantly exceeding its size. Compared with three mathematical encoding methods, the hybrid BF-PEF index allows performing list intersection between around 16% to 76% faster (without significantly increasing the index size). Compared with BF, the index size is reduced by 45% while maintaining an intersection time comparable to that of BF.

References

[1]
Rakesh Agrawal, Behzad Golshan, and Evangelos E. Papalexakis. 2015. A Study of Distinctiveness in Web Results of Two Search Engines. In Proc. WWW. 267-273.
[2]
Vo Ngoc Anh and Alistair Moffat. 2005. Inverted Index Compression Using Word-Aligned Binary Codes. Inf. Retr. 8(2005), 151-166.
[3]
Vo Ngoc Anh and Alistair Moffat. 2010. Index Compression Using 64-bit Words. Softw., Pract. Exper. 40, 2 (2010), 131-147.
[4]
Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors. Commun. ACM 13, 7 (1970), 422-426.
[5]
Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (2001), 5-32.
[6]
Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack. 2010. Information Retrieval-Implementing and Evaluating Search Engines. MIT Press.
[7]
Berkant Barla Cambazoglu and Ricardo A. Baeza-Yates. 2015. Scalability Challenges in Web Search Engines. Morgan & Claypool Publishers.
[8]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms(3 ed.). MIT Press.
[9]
J. Shane Culpepper and Alistair Moffat. 2010. Efficient set intersection for inverted indexing. ACM Trans. Inf. Syst. 29, 1 (2010), 1:1-1:25.
[10]
Jay L. Devore. 2011. Probability and Statistics for Engineering and the Sciences (8 ed.). Cengage Learning.
[11]
Harris Drucker, Christopher J. C. Burges, Linda Kaufman, Alexander J. Smola, and Vladimir Vapnik. 1996. Support Vector Regression Machines. In Advances in Neural Information Processing Systems. 155-161.
[12]
Peter Elias. 1974. Efficient Storage and Retrieval by Content and Address of Static Files. J. ACM 21, 2 (1974), 246-260.
[13]
Robert Mario Fano. 1971. On the Number of Bits Required to Implement an Associative Memory. Massachusetts Institute of Technology, Project MAC.
[14]
Jerome H. Friedman. 2001. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 29, 5 (2001), 1189-1232.
[15]
Bob Goodwin, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, and Yuxiong He. 2017. BitFunnel: Revisiting Signatures for Search. In Proc. SIGIR. 605-614.
[16]
Andrew Kane and Frank Wm. Tompa. 2014. Skewed Partial Bitvectors for List Intersection. In Proc. SIGIR. 263-272.
[17]
Daniel Lemire, Owen Kaser, and Kamel Aouiche. 2010. Sorting Improves Word-aligned Bitmap Indexes. Data Knowl. Eng. 69, 1 (2010), 3-28.
[18]
Xiaozhu Liu and Zhiyong Peng. 2010. An Efficient Random Access Inverted Index for Information Retrieval. In Proc. WWW. 1153-1154.
[19]
Xinyu Liu, Zhaohua Zhang, Boran Hou, Rebecca J. Stones, Gang Wang, and Xiaoguang Liu. 2018. Index Compression for BitFunnel Query Processing. In Proc SIGIR. 921-924.
[20]
Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri. 2007. Mining query logs to optimize index partitioning in parallel web search engines. In Proceedings of the 2nd International Conference on Scalable Information Systems. 43-51.
[21]
Alistair Moffat and Lang Stuiver. 2000. Binary Interpolative Coding for Effective Index Compression. Inf. Retr. 3, 1 (2000), 25-47.
[22]
Giuseppe Ottaviano, Nicola Tonellotto, and Rossano Venturini. 2015. Optimal Space-time Tradeoffs for Inverted Indexes. In Proc. WSDM. 47-56.
[23]
Giuseppe Ottaviano and Rossano Venturini. 2014. Partitioned Elias-Fano Indexes. In Proc. SIGIR. 273-282.
[24]
Knut Magne Risvik, Trishul M. Chilimbi, Henry Tan, Karthik Kalyanaraman, and Chris Anderson. 2013. Maguro, a System for Indexing and Searching over Very Large Text Collections. In Proc. WSDM. 727-736.
[25]
David Salomon. 2007. Variable-length Codes for Data Compression. Springer London.
[26]
Prabhakant Sinha and Andris A. Zoltners. 1979. The Multiple-Choice Knapsack Problem. Operations Research 27, 3 (1979), 503-515.
[27]
Alexander A. Stepanov, Anil R. Gangolli, Daniel E. Rose, Ryan J. Ernst, and Paramjit S. Oberoi. 2011. SIMD-based Decoding of Posting Lists. In Proc. CIKM. 317-326.
[28]
Xiujun Wang, Yusheng Ji, Zhe Dang, Xiao Zheng, and Baohua Zhao. 2015. Improved Weighted Bloom Filter and Space Lower Bound Analysis of Algorithms for Approximated Membership Querying. In Proc. DASFAA. 346-362.
[29]
Hao Yan, Shuai Ding, and Torsten Suel. 2009. Inverted Index Compression and Query Processing with Optimized Document Ordering. In Proc. WWW. 401-410.
[30]
Marcin Zukowski, Sándor He´man, Niels Nes, and Peter A. Boncz. 2006. Super-Scalar RAM-CPU Cache Compression. In Proc. ICDE.

Cited By

View all
  • (2024)A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queriesMicroprocessors and Microsystems10.1016/j.micpro.2023.104992105(104992)Online publication date: Mar-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. BitFunnel
  2. Partitioned Elias-Fano
  3. compression
  4. hybrid index
  5. query processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19
WWW '19: The Web Conference
May 13 - 17, 2019
CA, San Francisco, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)1
Reflects downloads up to 23 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queriesMicroprocessors and Microsystems10.1016/j.micpro.2023.104992105(104992)Online publication date: Mar-2024

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media