Abstract
The volume of text data has grown rapidly in recent years, driven by advances in web-based technologies that enable the publication of an overwhelming amount of data. Much of the world's knowledge recorded as text is stored not only in articles and books but also in blogs, tweets, and web pages. This paper gives an overview of general text data mining techniques, based on text retrieval models, that are applicable to any natural-language text. The techniques target problems that require minimal or no human effort. They can be used in many applications and allow the main topics of a document to be discovered at different levels of granularity.
Acknowledgements
This work was supported by Portuguese funds through the Center of Naval Research (CINAV), Portuguese Naval Academy, Portugal.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Correia, A., Gonçalves, A. (2017). Topics Discovery in Text Mining. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Costanzo, S. (eds) Recent Advances in Information Systems and Technologies. WorldCIST 2017. Advances in Intelligent Systems and Computing, vol 569. Springer, Cham. https://doi.org/10.1007/978-3-319-56535-4_25
DOI: https://doi.org/10.1007/978-3-319-56535-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56534-7
Online ISBN: 978-3-319-56535-4
eBook Packages: Engineering, Engineering (R0)