Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis

  • Conference paper
Semantics, Web and Mining (EWMF 2005, KDO 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4289)

Abstract

We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. Until now, methods based on eigenvector decomposition, such as latent semantic indexing (LSI) or principal component analysis (PCA), have been known to be useful only for extracting symmetric relations between terms. We give a precise mathematical criterion for distinguishing between four kinds of relations for a given pair of terms in a given collection: unrelated (car – fruit), symmetrically related (car – automobile), asymmetrically related with the first term being more specific than the second (banana – fruit), and asymmetrically related in the other direction (fruit – banana). We give theoretical evidence for the soundness of our criterion by showing that, in a simplified mathematical model, it produces the intuitively correct result. We applied our scheme to the reconstruction of a selected part of the Open Directory Project (ODP) hierarchy, with promising results.
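
As background for the symmetric case mentioned in the abstract, the following is a minimal sketch of how an LSI/PCA-style term similarity can be computed from the top eigenvectors of a term co-occurrence matrix. It is not the criterion proposed in the paper, which the abstract does not state. The toy corpus, the function names (`gram`, `top_eigenpairs`, `rank_k_similarity`), and the use of power iteration with deflation are all assumptions made for illustration.

```python
import math

def gram(a):
    """Term-term co-occurrence matrix S = A * A^T for term-by-document rows a."""
    return [[sum(x * y for x, y in zip(r, c)) for c in a] for r in a]

def matvec(s, v):
    """Multiply the square matrix s by the vector v."""
    return [sum(sij * vj for sij, vj in zip(row, v)) for row in s]

def top_eigenpairs(s, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration followed by deflation (illustrative only)."""
    n = len(s)
    s = [row[:] for row in s]  # work on a copy, which is deflated step by step
    pairs = []
    for _pair in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed, non-uniform start vector
        for _step in range(iters):
            w = matvec(s, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # the remaining matrix is numerically zero
                return pairs
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(s, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate: subtract the component just found
            for j in range(n):
                s[i][j] -= lam * v[i] * v[j]
    return pairs

def rank_k_similarity(a, k):
    """Rank-k approximation of A * A^T built from its top-k eigenpairs."""
    n = len(a)
    sim = [[0.0] * n for _ in range(n)]
    for lam, v in top_eigenpairs(gram(a), k):
        for i in range(n):
            for j in range(n):
                sim[i][j] += lam * v[i] * v[j]
    return sim

# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["car", "automobile", "banana", "fruit"]
A = [
    [1, 1, 0, 0, 0],  # car
    [1, 1, 0, 0, 0],  # automobile
    [0, 0, 1, 0, 0],  # banana
    [0, 0, 1, 1, 1],  # fruit
]
sim = rank_k_similarity(A, 2)
for term, row in zip(terms, sim):
    print(term, [round(x, 3) for x in row])
```

Because the resulting matrix is symmetric by construction, a score of this kind rates banana – fruit exactly as it rates fruit – banana. It can separate related from unrelated pairs, but it cannot on its own say which term is the more specific one, which is the limitation the paper's criterion is meant to address.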

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bast, H., Dupret, G., Majumdar, D., Piwowarski, B. (2006). Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis. In: Ackermann, M., et al. Semantics, Web and Mining. EWMF 2005, KDO 2005. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_7

  • DOI: https://doi.org/10.1007/11908678_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47697-9

  • Online ISBN: 978-3-540-47698-6

  • eBook Packages: Computer Science, Computer Science (R0)
