Abstract
Text classification is the process of automatically tagging a textual document with the most relevant set of labels. This work aims to automatically map an input document to multiple tags based on its vocabulary features. To achieve this goal, a large dataset was constructed from various Arabic news portals. The dataset contains over 290k multi-tagged articles and will be made freely available to the Arabic computational linguistics research community. To examine this dataset, we tested both shallow learning and deep learning multi-labeling approaches. Performance was evaluated with a custom accuracy metric designed for the multi-labeling task, together with the Hamming loss metric. First, we used classifiers compatible with multi-labeling tasks, such as Logistic Regression and XGBoost, by wrapping each in a OneVsRest classifier. XGBoost gave higher accuracy, scoring 91.3
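To make the evaluation setup concrete, the following minimal Python sketch shows how multi-tagged documents can be viewed as one binary target per label (the view a one-vs-rest wrapper trains on) and how the Hamming loss is computed over those targets. The label names, the toy predictions, and the helper names are hypothetical. The abstract does not define the custom accuracy metric, so the Jaccard-style score below is only a common example-based stand-in, not the authors' metric.

```python
from typing import List, Set

# Hypothetical label space and toy data, for illustration only.
LABEL_SPACE = ["politics", "sports", "economy", "culture"]
TRUE_TAGS = [{"politics", "economy"}, {"sports"}, {"culture"}]
PRED_TAGS = [{"politics"}, {"sports"}, {"culture", "economy"}]


def to_indicator(tags: Set[str], label_space: List[str]) -> List[int]:
    """One-vs-rest view: one binary target per label in the label space."""
    return [1 if label in tags else 0 for label in label_space]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (document, label) slots where prediction and truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def jaccard_accuracy(true_tags: List[Set[str]], pred_tags: List[Set[str]]) -> float:
    """Mean |intersection| / |union| per document (assumed stand-in metric)."""
    scores = []
    for truth, pred in zip(true_tags, pred_tags):
        union = truth | pred
        # Two empty tag sets agree perfectly, so score them as 1.0.
        scores.append(len(truth & pred) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


Y_TRUE = [to_indicator(tags, LABEL_SPACE) for tags in TRUE_TAGS]
Y_PRED = [to_indicator(tags, LABEL_SPACE) for tags in PRED_TAGS]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Jaccard accuracy:", jaccard_accuracy(TRUE_TAGS, PRED_TAGS))
```

In this toy case two of the twelve label slots are wrong, so a lower Hamming loss and a higher example-based accuracy both indicate better tagging; the two metrics are complementary rather than interchangeable.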
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rifai, H.E., Al Qadi, L., Elnagar, A. (2021). Arabic Multi-label Text Classification of News Articles. In: Hassanien, AE., Chang, KC., Mincong, T. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2021. Advances in Intelligent Systems and Computing, vol 1339. Springer, Cham. https://doi.org/10.1007/978-3-030-69717-4_41
DOI: https://doi.org/10.1007/978-3-030-69717-4_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69716-7
Online ISBN: 978-3-030-69717-4
eBook Packages: Intelligent Technologies and Robotics (R0)