Abstract
Understanding users’ latent intents behind search queries is critical for search engines. Hence, there has been increasing attention to mining the intents of search queries effectively by analyzing search engine query logs. However, we observe that the information richness of query logs has not been fully utilized so far, and this underuse heavily limits the performance of existing methods. In this paper, we tackle the problem of query intent mining by taking full advantage of the information richness of query logs from a multi-dimensional perspective. Specifically, we capture the latent relations between search queries via three different dimensions: the URL dimension, the session dimension and the term dimension. We first propose the Result-Oriented Framework (ROF), which is easy to implement and significantly improves both the precision and the recall of query intent mining. We further propose the Topic-Oriented Framework (TOF) to significantly reduce the online time and memory consumption of query intent mining. TOF employs the Query Log Topic Model (QLTM), which derives latent topics from the query log to integrate the information of the three dimensions in a principled way. The latent topics are considered low-dimensional descriptions of the query relations and serve as the basis of efficient online query intent mining. We conduct extensive experiments on a query log from a major commercial search engine. Experimental results show that the two frameworks significantly outperform the state-of-the-art methods with respect to a variety of metrics.
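As a rough illustration of the three dimensions named in the abstract, the sketch below links queries that share a clicked URL, a session, or a term. It is not the ROF or TOF method, whose details the abstract does not give; the toy log, its record layout (session id, query, clicked URL), and the function name `related_pairs` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy log, not taken from the paper.
# Each record: (session_id, query, clicked_url).
LOG = [
    ("s1", "jaguar price", "cars.example/jaguar"),
    ("s1", "jaguar dealer", "cars.example/dealers"),
    ("s2", "jaguar habitat", "zoo.example/jaguar"),
    ("s3", "luxury car cost", "cars.example/jaguar"),
]


def related_pairs(log):
    """Return {dimension: set of query pairs} for the URL, session and term dimensions.

    Two queries are paired in a dimension when they share a clicked URL,
    a session id, or a whitespace-separated term, respectively.
    """
    groups = {
        "url": defaultdict(set),
        "session": defaultdict(set),
        "term": defaultdict(set),
    }
    for session_id, query, url in log:
        groups["url"][url].add(query)
        groups["session"][session_id].add(query)
        for term in query.split():
            groups["term"][term].add(query)

    pairs = {}
    for dimension, buckets in groups.items():
        found = set()
        for queries in buckets.values():
            # Every pair of distinct queries in the same bucket is related.
            for a, b in combinations(sorted(queries), 2):
                found.add(frozenset((a, b)))
        pairs[dimension] = found
    return pairs


PAIRS = related_pairs(LOG)
```

On this toy log, the URL dimension links "jaguar price" with "luxury car cost" even though the two queries share no term, while the term dimension links "jaguar price" with "jaguar habitat" even though their intents likely differ. This is one way to see why combining the dimensions could carry more information than any single one.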
Notes
The work was done when the first author was visiting Yahoo Labs.
About this article
Cite this article
Jiang, D., Leung, K.WT. & Ng, W. Query intent mining with multiple dimensions of web search data. World Wide Web 19, 475–497 (2016). https://doi.org/10.1007/s11280-015-0336-2
DOI: https://doi.org/10.1007/s11280-015-0336-2