Abstract
Text categorization is a supervised learning task that assigns labels to documents using a classifier trained on a set of labelled documents. Applying text classification to label reports and complaints in economic and health-related fields can greatly speed up their processing and thereby reduce the time needed to act on them. In this work, we aim to classify complaints into the four main economic activities defined by the Portuguese Economic and Food Safety Authority. We evaluate the classification performance of nine algorithms (Complement Naïve Bayes, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, AdaBoost and Logistic Regression) at different levels of text preprocessing. Results reveal high accuracy, around 85%. The linear classifiers (Support Vector Machine and Logistic Regression) also achieved higher F1-measure values than the other classifiers, in addition to high accuracy. We conclude that these algorithms are better suited to the selected data, and that text classification methods can ease the processing of complaints and reports, which in turn leads to swifter action by the authorities in charge. Relying on text classification of reports and complaints can therefore have a positive influence on economic crime prevention and on public health, in this case by means of food-related inspections.
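The abstract reports both accuracy and F1-measure. The following is a minimal sketch of why the two can diverge when classes are imbalanced. It assumes macro-averaged F1 (the abstract does not state the averaging scheme), and the class labels and predictions are hypothetical toy data, not the authors' dataset or results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: class "A" dominates and the classifier always predicts it.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

In this toy case the accuracy looks high while the macro F1 is much lower, because the minority class is never predicted. This is one reason to compare classifiers on F1 as well as accuracy.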
Acknowledgements
This work is supported by project IA.SAE, funded by Fundação para a Ciência e a Tecnologia (FCT) through program INCoDe.2030. This research was partially supported by LIACC - Artificial Intelligence and Computer Science Laboratory of the University of Porto (FCT/UID/CEC/00027/2020).
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Magalhães, G., Faria, B.M., Reis, L.P., Cardoso, H.L., Caldeira, C., Oliveira, A. (2020). Automating Complaints Processing in the Food and Economic Sector: A Classification Approach. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S., Orovic, I., Moreira, F. (eds) Trends and Innovations in Information Systems and Technologies. WorldCIST 2020. Advances in Intelligent Systems and Computing, vol 1160. Springer, Cham. https://doi.org/10.1007/978-3-030-45691-7_41
DOI: https://doi.org/10.1007/978-3-030-45691-7_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45690-0
Online ISBN: 978-3-030-45691-7
eBook Packages: Intelligent Technologies and Robotics (R0)