
A Semi-supervised Corpus Annotation for Saudi Sentiment Analysis Using Twitter

  • Conference paper
  • Advances in Brain Inspired Cognitive Systems (BICS 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10989)


Abstract

In the literature, limited work has been conducted to develop sentiment resources for the Saudi dialect. The lack of resources such as dialectal lexicons and corpora is one of the major bottlenecks to the successful development of Arabic sentiment analysis models. In this paper, a semi-supervised approach is presented to construct an annotated sentiment corpus for the Saudi dialect using Twitter. The approach is primarily based on a set of lexicons built using word embedding techniques such as word2vec. A large corpus extracted from Twitter is annotated using these lexicons and then manually reviewed to exclude incorrectly annotated tweets; the resulting corpus is publicly available. For corpus validation, state-of-the-art classification algorithms (such as Logistic Regression, Support Vector Machine, and Naive Bayes) are applied and evaluated. Simulation results demonstrate that Naive Bayes outperformed the other classifiers, achieving an accuracy of up to 91%.
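The abstract describes the pipeline only at a high level: build lexicons with word embeddings, use them to label tweets, then validate the labelled corpus with classifiers. The following is a minimal, self-contained sketch of that general idea. It is not the authors' implementation. It rests on several assumptions: seed lexicons are expanded by a single pass with a cosine-similarity threshold, a tweet is labelled by comparing counts of positive and negative lexicon hits, ties are set aside (for example, for manual review), and validation uses a multinomial Naive Bayes with add-one smoothing. The toy vectors, English stand-in tokens, threshold value and every function name are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy 3-d vectors; in practice these would come from a word2vec
# model trained on Saudi tweets. English tokens stand in for dialect words.
EMBEDDINGS = {
    "good":  [0.90, 0.10, 0.00],
    "great": [0.85, 0.15, 0.05],
    "bad":   [-0.90, 0.10, 0.00],
    "awful": [-0.80, 0.20, 0.10],
    "table": [0.00, 0.10, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_lexicon(seeds, embeddings, threshold=0.8):
    """Single pass: add each word whose similarity to any seed reaches the threshold."""
    lexicon = set(seeds)
    for word, vec in embeddings.items():
        if word in lexicon:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in seeds if s in embeddings):
            lexicon.add(word)
    return lexicon

def annotate(tokens, pos_lex, neg_lex):
    """Label by majority of lexicon hits; return None when there is no clear winner."""
    pos = sum(t in pos_lex for t in tokens)
    neg = sum(t in neg_lex for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None

def train_nb(docs, labels):
    """Multinomial Naive Bayes: class document counts, per-class token counts, vocabulary."""
    priors = Counter(labels)
    token_counts = {c: Counter() for c in priors}
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
    vocab = {t for tokens in docs for t in tokens}
    return priors, token_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    priors, token_counts, vocab = model
    n_docs = sum(priors.values())

    def log_posterior(c):
        total = sum(token_counts[c].values())
        score = math.log(priors[c] / n_docs)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[c][t] + 1) / (total + len(vocab)))
        return score

    return max(priors, key=log_posterior)

POS_LEX = expand_lexicon({"good"}, EMBEDDINGS)  # {"good", "great"}
NEG_LEX = expand_lexicon({"bad"}, EMBEDDINGS)   # {"bad", "awful"}

TWEETS = [["great", "day"], ["awful", "service"], ["good", "but", "bad"], ["table", "chair"]]
labelled = [(t, annotate(t, POS_LEX, NEG_LEX)) for t in TWEETS]
# Unlabelled tweets (None) are set aside, e.g. for manual review.
train = [(t, y) for t, y in labelled if y is not None]
MODEL = train_nb([t for t, _ in train], [y for _, y in train])
```

In this sketch, the lexicon step only needs the embedding space, and the classifier only needs the labelled output. As a result, either part could be replaced (for example, by a library word2vec model or a Logistic Regression/SVM classifier) without changing the other.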


Notes

  1. Please contact aaq@cs.stir.ac.uk or ahu@stir.ac.uk to access the dataset.


Author information

Corresponding author: Ahsan Adeel.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Alqarafi, A., Adeel, A., Hawalah, A., Swingler, K., Hussain, A. (2018). A Semi-supervised Corpus Annotation for Saudi Sentiment Analysis Using Twitter. In: Ren, J., et al. Advances in Brain Inspired Cognitive Systems. BICS 2018. Lecture Notes in Computer Science, vol. 10989. Springer, Cham. https://doi.org/10.1007/978-3-030-00563-4_57


  • DOI: https://doi.org/10.1007/978-3-030-00563-4_57


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00562-7

  • Online ISBN: 978-3-030-00563-4

  • eBook Packages: Computer Science, Computer Science (R0)
