Abstract
Information retrieval is described in terms of predictive text mining. The methods can be considered variations of similarity-based nearest-neighbor methods. Both keyword search and full-document matching are examined. Several ways of measuring similarity are considered, including cosine similarity. Classical information retrieval has evolved from retrieving documents stored in databases to retrieving web- or intranet-based documents. These documents have richer representations, with links among documents. Link analysis for ranking the similarity of documents is described. Some performance issues in computing similarity are considered, including the specification of inverted lists for indexing documents.
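As a rough illustration of how inverted lists and cosine similarity fit together, here is a minimal sketch, assuming whitespace tokenization and a plain tf-idf weighting with idf = log(N / df). The chapter's own weighting scheme may differ, and the helper names `tokenize`, `build_index`, and `search` are hypothetical. The point is that the query only touches the posting lists of its own terms, so documents sharing no terms with the query are never scored.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems also strip punctuation, stem, etc.
    return text.lower().split()


def build_index(docs):
    """Build an inverted list: term -> [(doc_id, tf-idf weight), ...]."""
    n = len(docs)
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)   # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}         # assumed idf variant
    index = defaultdict(list)
    norms = [0.0] * n
    for doc_id, c in enumerate(counts):
        for term, tf in c.items():
            w = tf * idf[term]
            index[term].append((doc_id, w))
            norms[doc_id] += w * w
    return index, idf, [math.sqrt(x) for x in norms]


def search(query, index, idf, norms, k=3):
    """Rank documents by cosine similarity, visiting only the query terms' postings."""
    q_w = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_w.values()))
    scores = defaultdict(float)
    for term, qw in q_w.items():
        for doc_id, dw in index[term]:
            scores[doc_id] += qw * dw                   # accumulate the dot product
    # A positive dot product implies both norms are non-zero, so the division is safe.
    ranked = [(d, s / (q_norm * norms[d])) for d, s in scores.items() if s > 0]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog chased the cat", "stock prices rose sharply"]
    index, idf, norms = build_index(docs)
    print(search("cat mat", index, idf, norms))
```

With the sample documents, document 0 ranks first because it contains both query terms, and document 2 is never scored since none of its terms appear in the query.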
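For the link-analysis side, one widely known method is PageRank (Page, Brin, Motwani, and Winograd). The following is a minimal power-iteration sketch of it, not necessarily the exact formulation used in the chapter. The damping factor 0.85, the convergence tolerance, and the convention of spreading the rank of pages without out-links evenly over all pages are assumptions, and `pagerank` is a hypothetical helper name.

```python
def pagerank(links, damping=0.85, iters=100, tol=1e-10):
    """Power iteration for PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> rank; ranks sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Assumed convention: rank held by pages with no out-links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)  # split rank over out-links
                for q in outs:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In this toy graph, page C collects links from A, B, and D and ends up with the highest rank, while D, which nothing links to, keeps only the baseline teleportation share.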
Copyright information
© 2010 Springer-Verlag London Limited
About this chapter
Cite this chapter
Weiss, S.M., Indurkhya, N., Zhang, T. (2010). Information Retrieval and Text Mining. In: Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84996-226-1_4
DOI: https://doi.org/10.1007/978-1-84996-226-1_4
Publisher Name: Springer, London
Print ISBN: 978-1-84996-225-4
Online ISBN: 978-1-84996-226-1
eBook Packages: Computer Science, Computer Science (R0)