Arabic Multi-label Text Classification of News Articles

  • Conference paper
Advanced Machine Learning Technologies and Applications (AMLTA 2021)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1339)


Abstract

Text classification is the process of automatically tagging a textual document with the most relevant set of labels. This work aims to automatically map an input document, based on its vocabulary features, to multiple tags. To achieve this goal, a large dataset has been constructed from various Arabic news portals. The dataset has over 290k multi-tagged articles and shall be made freely available to the research community on Arabic computational linguistics. To examine this dataset, we tested both shallow learning and deep learning multi-labeling approaches. A custom accuracy metric, designed for the multi-labeling task, has been developed for performance evaluation and is used alongside the Hamming loss metric. First, we used classifiers compatible with multi-labeling tasks, such as Logistic Regression and XGBoost, by wrapping each in a OneVsRest classifier. XGBoost gave higher accuracy, scoring 91.3
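As an illustration of the Hamming loss metric named in the abstract, the following is a minimal sketch in plain Python. It uses the standard definition of Hamming loss (the fraction of document-label slots on which prediction and ground truth disagree); it is not the custom accuracy metric developed in the paper, which the abstract does not define. The function name `hamming_loss`, the label set, and the toy documents are all hypothetical and are not drawn from the dataset.

```python
from typing import Sequence, Set


def hamming_loss(
    y_true: Sequence[Set[str]],
    y_pred: Sequence[Set[str]],
    labels: Sequence[str],
) -> float:
    """Fraction of (document, label) slots where truth and prediction differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true or not labels:
        raise ValueError("need at least one document and one label")

    label_set = set(labels)
    wrong = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        # Symmetric difference = tags that were missed + tags wrongly added.
        wrong += len((true_tags ^ pred_tags) & label_set)
    return wrong / (len(y_true) * len(labels))


# Hypothetical label set and toy data, for illustration only.
LABELS = ["politics", "sports", "economy", "culture"]
truth = [{"politics", "economy"}, {"sports"}, {"culture"}]
predicted = [{"politics"}, {"sports", "culture"}, {"culture"}]

loss = hamming_loss(truth, predicted, LABELS)
print(f"Hamming loss: {loss:.4f}")
```

In this toy example, two of the twelve document-label slots are wrong (one missed tag and one spurious tag), so the loss is 2/12. Lower values are better, and a loss of 0 means every tag set was predicted exactly.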

Author information

Corresponding author

Correspondence to Ashraf Elnagar.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Rifai, H.E., Al Qadi, L., Elnagar, A. (2021). Arabic Multi-label Text Classification of News Articles. In: Hassanien, AE., Chang, KC., Mincong, T. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2021. Advances in Intelligent Systems and Computing, vol 1339. Springer, Cham. https://doi.org/10.1007/978-3-030-69717-4_41
