Abstract
This paper presents a novel statistical method for factor analysis of binary and count data that is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to fit the model with a temperature-controlled version of the Expectation Maximization algorithm, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
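To make the fitting procedure concrete, the following is a minimal NumPy sketch of the aspect model underlying Probabilistic Latent Semantic Analysis, P(d, w) = Σ_z P(z) P(d|z) P(w|z), fitted with a tempered E-step in which the posteriors are raised to an inverse-temperature power β. The function name plsa_tempered_em, the random initialization, and the use of a fixed β are illustrative assumptions, not taken from the paper; the paper's tempered EM additionally anneals β and controls it on held-out data, which this sketch omits.

```python
import numpy as np

def plsa_tempered_em(counts, n_topics, beta=0.8, n_iter=100, seed=0):
    """Fit the aspect model P(d, w) = sum_z P(z) P(d|z) P(w|z) to a
    document-word count matrix with a tempered E-step (beta = 1 recovers
    standard EM; beta < 1 smooths the posteriors). Illustrative sketch,
    not the paper's full annealing schedule.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random normalized initialization of the model parameters.
    p_z = np.full(n_topics, 1.0 / n_topics)        # P(z)
    p_d_z = rng.random((n_topics, n_docs))         # P(d|z)
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))        # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Tempered E-step: q(z|d,w) proportional to [P(z) P(d|z) P(w|z)]^beta.
        joint = (p_z[:, None, None] * p_d_z[:, :, None]
                 * p_w_z[:, None, :]) ** beta      # shape (z, d, w)
        joint /= joint.sum(axis=0, keepdims=True) + 1e-12

        # M-step: re-estimate parameters from expected counts n(d,w) q(z|d,w).
        weighted = counts[None, :, :] * joint      # shape (z, d, w)
        p_d_z = weighted.sum(axis=2)
        p_d_z /= p_d_z.sum(axis=1, keepdims=True)
        p_w_z = weighted.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()

    return p_z, p_d_z, p_w_z
```

With β = 1 the update reduces to standard EM for the latent class model; β < 1 flattens the posteriors in the E-step, which acts as a regularizer against overfitting and is the mechanism behind the tempered variant discussed in the paper.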
Cite this article
Hofmann, T. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning 42, 177–196 (2001). https://doi.org/10.1023/A:1007617005950