Information Retrieval and Text Mining

Sholom M. Weiss,
Nitin Indurkhya &
Tong Zhang

Part of the book series: Texts in Computer Science (TCS)


Abstract

Information retrieval is described in terms of predictive text mining. The methods can be viewed as variations of similarity-based nearest-neighbor methods. Both keyword search and full-document matching are examined. Several measures of similarity are considered, including cosine similarity. Classical information retrieval has evolved from retrieving documents stored in databases to retrieving web- or intranet-based documents. These documents have richer representations, with links among documents. Link analysis for ranking the similarity of documents is described. Some performance issues in computing similarity are considered, including the specification of inverted lists for indexing documents.
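
The combination of cosine similarity and inverted lists named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names `build_index` and `search`, the whitespace tokenizer, the raw term-frequency weights, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index and per-document vector norms.

    Assumption: documents are weighted by raw term frequency and
    tokenized by lowercasing and splitting on whitespace.
    """
    index = defaultdict(dict)  # term -> {doc_id: term frequency}
    norms = {}                 # doc_id -> Euclidean norm of its tf vector
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        for term, tf in counts.items():
            index[term][doc_id] = tf
        norms[doc_id] = math.sqrt(sum(tf * tf for tf in counts.values()))
    return index, norms


def search(query, index, norms):
    """Rank documents by cosine similarity to the query.

    Only the inverted lists of the query terms are visited, so documents
    sharing no term with the query are never scored.
    """
    q_counts = Counter(query.lower().split())
    q_norm = math.sqrt(sum(tf * tf for tf in q_counts.values()))
    dots = defaultdict(float)  # doc_id -> dot product with the query
    for term, q_tf in q_counts.items():
        for doc_id, tf in index.get(term, {}).items():
            dots[doc_id] += q_tf * tf
    scores = {d: dot / (q_norm * norms[d]) for d, dot in dots.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy collection, invented for this example.
docs = {
    "d1": "text mining methods",
    "d2": "information retrieval and text mining",
    "d3": "link analysis for web documents",
}
index, norms = build_index(docs)
results = search("text mining", index, norms)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Here the query acts as a short document and the top-ranked results are its nearest neighbors under cosine similarity. A practical system would typically replace raw term frequencies with a weighting such as tf-idf, which this sketch omits.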



Author information

Authors and Affiliations

T.J. Watson Research Center, IBM Corporation, Kitchawan Road 1101, Yorktown Heights, 10598, NY, USA
Sholom M. Weiss
School of Computer Science & Engg., University of New South Wales, Sydney, 2052, NSW, Australia
Nitin Indurkhya
Dept. Statistics, Hill Center, Rutgers University, Piscataway, 08854-8019, NJ, USA
Tong Zhang


Corresponding author

Correspondence to Sholom M. Weiss.


About this chapter

Cite this chapter

Weiss, S.M., Indurkhya, N., Zhang, T. (2010). Information Retrieval and Text Mining. In: Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84996-226-1_4


DOI: https://doi.org/10.1007/978-1-84996-226-1_4
Publisher Name: Springer, London
Print ISBN: 978-1-84996-225-4
Online ISBN: 978-1-84996-226-1
eBook Packages: Computer Science, Computer Science (R0)
