Abstract
The development of Information Retrieval (IR) techniques depends heavily on empirical studies over real-world data collections. Unfortunately, such data sets are often unavailable to researchers because of privacy concerns. In fact, the lack of publicly available industry data sets has become a serious bottleneck for IR research. To address this problem, we propose to bridge the gap between academic research and industry data sets through a privacy-preserving evaluation platform. The novelty of the platform lies in its “data-centric” mechanism: the data sit on a secure server, and the IR algorithms to be evaluated are uploaded to that server. The platform runs the code of the algorithms and returns only the evaluation results. Preliminary experiments reveal new observations and insights about state-of-the-art retrieval models, demonstrating the value of an industry data set.
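The “data-centric” mechanism can be illustrated with a minimal sketch. This is not the platform's actual interface: the names `evaluate_on_server`, `average_precision`, and `term_overlap`, the in-memory data layout, and the choice of mean average precision (MAP) as the reported metric are all assumptions made for illustration. The point is only that the uploaded scoring function runs next to the private data, and the caller receives aggregate numbers rather than documents or rankings.

```python
from typing import Callable, Dict, List, Set

# Hypothetical type: an uploaded retrieval function maps
# (query text, document text) to a relevance score.
Scorer = Callable[[str, str], float]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list of document IDs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def evaluate_on_server(
    scorer: Scorer,
    docs: Dict[str, str],         # private collection, stays on the server
    queries: Dict[str, str],      # private queries, stay on the server
    qrels: Dict[str, Set[str]],   # private relevance judgments
) -> Dict[str, float]:
    """Run the uploaded scorer over the private data (assumed layout)
    and return only an aggregate metric, never documents or rankings."""
    ap_values = []
    for qid, query in queries.items():
        # Rank all documents by the uploaded function's score, best first.
        ranking = sorted(docs, key=lambda d: scorer(query, docs[d]), reverse=True)
        ap_values.append(average_precision(ranking, qrels.get(qid, set())))
    mean_ap = sum(ap_values) / len(ap_values) if ap_values else 0.0
    return {"MAP": mean_ap}


def term_overlap(query: str, doc: str) -> float:
    """Toy stand-in for an uploaded retrieval model: counts document
    tokens that also occur in the query."""
    query_terms = set(query.lower().split())
    return float(sum(1 for token in doc.lower().split() if token in query_terms))
```

In this sketch a researcher would submit something like `term_overlap` and get back a dictionary such as `{"MAP": ...}`. A real deployment would also need to sandbox the uploaded code so that it cannot leak the data through other channels, which this sketch does not attempt.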
Acknowledgments
This research was supported by the U.S. National Science Foundation under IIS-1423002.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yang, P., Zhou, M., Chang, Y., Zhai, C., Fang, H. (2017). Towards Privacy-Preserving Evaluation for Information Retrieval Models Over Industry Data Sets. In: Sung, WK., et al. Information Retrieval Technology. AIRS 2017. Lecture Notes in Computer Science, vol 10648. Springer, Cham. https://doi.org/10.1007/978-3-319-70145-5_16
DOI: https://doi.org/10.1007/978-3-319-70145-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70144-8
Online ISBN: 978-3-319-70145-5
eBook Packages: Computer Science, Computer Science (R0)