Tunisian Dialect Sentiment Analysis

This paper enhances Tunisian sentiment analysis by introducing named entity tagging as a preprocessing task. It investigates the impact of named entity tagging combined with other preprocessing steps such as removing non-sentimental content, stemming, and stopword removal. Three datasets from Tunisian social media were used to test different N-gram schemes, feature selection methods, and supervised models (Naive Bayes and SVM), both with and without named entity tagging. Results showed that SVM outperformed Naive Bayes on the large TSAC dataset, while Naive Bayes did better on the medium and small datasets, and that combining named entity tagging with negation handling and stemming achieved the best performance.

Uploaded by

Yassir Matrane

Synthesis on ‘Tunisian Dialect Sentiment Analysis: A Natural Language Processing-based Approach’

In this paper, the authors improve the performance of Tunisian sentiment analysis (SA) over previous work on the Tunisian dialect. They introduce named entity (NE) tagging as a preprocessing task and investigate how it affects sentiment classification performance when combined with other preprocessing tasks.

These preprocessing tasks fall into two groups. The first removes non-sentimental content such as URLs, usernames, dates, digits, hashtag symbols and punctuation. The second is stemming, performed both with light stemming and with the Farasa stemmer; Farasa was chosen because it yields fewer segmentation errors than existing Arabic stemmers. For stopword removal, the authors used a list of 1661 MSA stopwords provided by the KACST NLP group. Common emojis were taken into account by replacing each emoji with its corresponding label. Negation was handled by replacing negation words such as ليس, لا, ماهمش, مانيش, ماكمش and مفملش with the tag ‘NegWord’.
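The cleaning, emoji-labelling, stopword and negation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The emoji labels (`EmojiJoy` and so on) and the three-word stopword set are hypothetical stand-ins for the paper's label set and the 1661-word KACST list. Only a subset of the negation words quoted above is used. Stemming (light stemming or Farasa, an external tool) is omitted.

```python
import re

# Hypothetical emoji-to-label map; the paper's actual label set is not specified here.
EMOJI_LABELS = {"😂": "EmojiJoy", "😡": "EmojiAnger", "😢": "EmojiSad"}

# Subset of the negation words quoted in the text, all mapped to the tag 'NegWord'.
NEGATION_WORDS = {"ليس", "ماهمش", "مانيش", "ماكمش"}

# Hypothetical three-word stand-in for the 1661-entry KACST MSA stopword list.
STOPWORDS = {"في", "من", "على"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USERNAME_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")        # \d also matches Arabic-Indic digits
PUNCT_RE = re.compile(r"[^\w\s]")    # any remaining non-word, non-space symbol


def preprocess(text):
    """Return the cleaned token list for one post (sketch)."""
    text = URL_RE.sub(" ", text)
    text = USERNAME_RE.sub(" ", text)
    # Replace emojis with labels before punctuation stripping would delete them.
    for emoji, label in EMOJI_LABELS.items():
        text = text.replace(emoji, f" {label} ")
    text = DIGIT_RE.sub(" ", text)   # removes digits, and with them numeric dates
    text = PUNCT_RE.sub(" ", text)   # removes punctuation and hashtag symbols (#)
    tokens = []
    for token in text.split():
        if token in NEGATION_WORDS:
            tokens.append("NegWord")
        elif token not in STOPWORDS:
            tokens.append(token)
    return tokens
```

For example, `preprocess("@user مانيش فرحان 😂 https://t.co/x #تونس 2017!")` returns `['NegWord', 'فرحان', 'EmojiJoy', 'تونس']`. The username, URL, digits and symbols are dropped, the hashtag word is kept, and the negation word becomes `NegWord`.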

After preprocessing, three N-gram schemes (unigrams, bigrams and trigrams) were adopted. These capture information about local word order and reduce the training time of the supervised methods. In addition, term frequency was used to reduce the feature size according to predefined frequency thresholds. For the lexicon-based model, unigrams and a combination of unigrams and bigrams were used in order to cover both the single-word and the compound entries of the lexicon.
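N-gram extraction with a term-frequency threshold can be sketched as below. This is an illustrative reconstruction, not the paper's code. Several details are assumptions because the text does not give them: the function names `build_vocabulary` and `vectorize`, the default `min_freq=2`, and modelling a scheme as a tuple of N values such as `(1,)` or `(1, 2)`. The actual thresholds, and whether the schemes are cumulative, are unknown.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, n_values=(1, 2, 3), min_freq=2):
    """Keep only n-grams whose corpus frequency reaches min_freq (hypothetical threshold)."""
    counts = Counter()
    for tokens in docs:
        for n in n_values:
            counts.update(ngrams(tokens, n))
    return sorted(gram for gram, freq in counts.items() if freq >= min_freq)


def vectorize(tokens, vocab, n_values=(1, 2, 3)):
    """Term-frequency vector of one document over the pruned vocabulary."""
    index = {gram: i for i, gram in enumerate(vocab)}
    vector = [0] * len(vocab)
    for n in n_values:
        for gram in ngrams(tokens, n):
            if gram in index:
                vector[index[gram]] += 1
    return vector
```

With `docs = [["a", "b", "c"], ["a", "b", "d"]]`, `build_vocabulary(docs)` keeps only `["a", "a b", "b"]`. The threshold drops `c`, `d` and both trigrams because each occurs only once, which is how the threshold shrinks the feature space.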

Named entities were extracted using the NER system proposed in ‘Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media’. The resulting named entities were then classified as positive or negative so that they could be tagged accordingly in the preprocessing step.
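Once the NER system has produced entities and each one has been assigned a polarity, tagging reduces to replacing entity mentions in the token stream. The sketch below assumes a precomputed entity-to-polarity dictionary. The placeholder entities and the tag names `PosEntity`/`NegEntity` are hypothetical. Greedy longest-match replacement is one reasonable way to handle multi-word entities, not necessarily the authors' choice.

```python
# Hypothetical entity -> polarity mapping; in the paper, entities come from the
# NER system and are then classified as positive or negative.
ENTITY_POLARITY = {"Alpha FC": "pos", "Beta Corp": "neg"}
TAGS = {"pos": "PosEntity", "neg": "NegEntity"}  # hypothetical tag names


def tag_entities(tokens, entity_polarity, max_len=3):
    """Replace known entity mentions (up to max_len tokens) with polarity tags."""
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word entities win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in entity_polarity:
                tagged.append(TAGS[entity_polarity[span]])
                i += n
                break
        else:  # no entity starts at position i
            tagged.append(tokens[i])
            i += 1
    return tagged
```

For instance, `tag_entities("I love Alpha FC not Beta Corp".split(), ENTITY_POLARITY)` yields `['I', 'love', 'PosEntity', 'not', 'NegEntity']`. Downstream features then see the entity's polarity rather than its surface form.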

For the supervised models, NB and SVM were used; SVM was chosen in particular because it handles high-dimensional feature vectors effectively. For the lexicon-based model, a straightforward sum method (‘Polarity analysis of non figurative tweets: Tw-StAR participation on DEFT 2017’) was used to determine the polarity of a tweet. A manually built Tunisian sentiment lexicon of 5382 entries was used with non-stemmed data. For stemmed or light-stemmed input, this lexicon was extended to include the stemmed/light-stemmed variants of its words and phrases, which increased its size to 14345 single and compound entries.
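The lexicon-based model and the lexicon extension described above can be sketched as follows. This is a hedged reconstruction, not the Tw-StAR code, and several details are assumptions: the +1/-1 scores, the neutral label for a zero sum, and the toy plural-stripping stemmer in the usage example. Matching is greedy and bigram-first, so a compound entry is counted once rather than together with its parts. English placeholder words stand in for Tunisian lexicon entries.

```python
def extend_lexicon(lexicon, stem):
    """Add the stemmed variant of every entry, keeping the original entries."""
    extended = dict(lexicon)
    for phrase, score in lexicon.items():
        variant = " ".join(stem(word) for word in phrase.split())
        extended.setdefault(variant, score)  # never overwrite an existing entry
    return extended


def lexicon_polarity(tokens, lexicon):
    """Sum the scores of matched unigram/bigram entries; the sign gives the polarity."""
    total, i = 0, 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in lexicon:
            total += lexicon[bigram]       # compound entry, consumes two tokens
            i += 2
        else:
            total += lexicon.get(tokens[i], 0)
            i += 1
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

With `LEXICON = {"good": 1, "bad": -1, "not good": -1}`, `lexicon_polarity("the food is not good".split(), LEXICON)` returns `"negative"`. The compound `not good` is matched once, and the `good` inside it is not counted again. Likewise, `extend_lexicon({"great deals": 1}, stem)` with a stemmer that strips a final `s` adds `"great deal"` alongside the original entry.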

The experiments used three publicly available datasets, in reduced form, whose content was harvested from Tunisian and MSA social media platforms: the Tunisian Election Corpus (3043 entries), the Tunisian Sentiment Analysis Corpus (TSAC, 7366 entries) and the Tunisian Arabic Corpus (746 entries). The experiments, run with different combinations of preprocessing tasks, were compared against the systems of (‘Tunisian dialect and MSA datasets for sentiment analysis’), (‘Sentiment analysis of Tunisian dialect: Linguistic resources and experiments’) and (‘Tunisian Arabic customer’s reviews processing and analysis for an internet supervision system’).

According to the results, SVM always outperforms NB on large datasets such as TSAC, whereas NB performs better on medium and small datasets. The lexicon-based results listed in Tables 3, 5 and 7 highlight the role of NEs in improving SA performance. In addition, combining NEs with negation and stemming achieved the best performance.