2020 International Conference on Intelligent Engineering and Management (ICIEM)

Survey Paper: Sentiment Analysis for Major Government Decisions
Tarun Anand (anandtarun53619@gmail.com)
Vikrant Singh (vikrant1r2@gmail.com)
Bharat Bali (bharatbali98@gmail.com)
Biswa Mohan Sahoo (biswamohans@gmail.com)
Basu Dev Shivhare (basuiimt@gmail.com)
Amar Deep Gupta (amardeep.ip@gmail.com)

All authors: Computer Science Engineering, Amity University, Greater Noida, India

Abstract— The internet hosts many social networking sites on which people express their views on a topic every day, and these views are a useful source for sentiment analysis. In this paper, we discuss the general procedure of sentiment analysis and how we use Python and its libraries to perform sentiment analysis on data retrieved from Twitter, so that we can gauge what the public thinks of the last ten government decisions, such as Demonetization, the Goods and Services Tax (GST), Bharat Interface for Money (BHIM) and the Citizenship Amendment Act (CAA).

Keywords—Opinion mining, Sentiment analysis, Machine learning, Twitter, Python, Natural Language Processing.

I. INTRODUCTION

Sentiment analysis is the systematic process of deriving conclusive opinions by analysing the emotions expressed in a piece of text [1]. There is a rising need to analyse data for information about the opinions and sentiments of the general public in various fields. Applications of sentiment analysis today include the following [2].

Commerce: In marketing, many companies can use this research to collect their customers' opinions about their products and services and review them. Twitter is therefore a useful source of data for analysis to determine customer satisfaction.

Sports events: People often share their views about sports personalities, matches and teams on the internet. Sentiment analysis can turn these views into insights on the players, teams, matches, etc. [2]

Politics: On Twitter, the majority of tweets are about politics, so politicians want to make use of it to connect with the general population. The general public express their feelings about government policies and decisions on the platforms provided by social media. Hence, opinion mining on politics-related content can be used to understand the public view of that content, thereby helping politicians.

People on the internet express their opinions, views and insights on numerous topics on social networking sites and other sites. These sites can be mined for data, which acts as a source for sentiment analysis.

II. RELATED WORK

Wagh and Punde [3] introduce TextBlob, which is built on top of NLTK and Pattern. Its main advantage is that it is easy to learn and offers many features, such as sentiment analysis, POS tagging and noun phrase extraction. It is our primary library for NLP tasks.

Sentiment analysis algorithms have been classified into supervised and unsupervised categories [4].

Barnaghi et al. [5] used a WEKA package to perform stemming, whereas in our paper we perform lemmatization with the help of TextBlob.

K. Arun et al. [6] performed sentiment analysis on demonetization tweets from Twitter to show how people reacted to it. They created a bar graph representing the various emotions expressed by people in society. Demonetization took place on 8 November 2016 and applied to the 500 and 1000 rupee notes. They grouped their findings under the labels good, work, silence and flourishing.

978-1-7281-4097-1/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Cornell University Library. Downloaded on August 15,2020 at 07:57:49 UTC from IEEE Xplore. Restrictions apply.
Dr. Kavita Pabreja [7] performed sentiment analysis of GST using Twitter data. Her analysis of people's emotions shows acceptance of GST, but with strongly mixed feelings. Goods and services are divided into five tax slabs: 0%, 5%, 12%, 18% and 28%. Her work analysed how people are reacting and grouped the reactions under anger, anticipation, disgust, fear, sadness, joy and surprise.

III. METHODOLOGY

The general flow of steps in performing sentiment analysis is given in Figure 1. There are four main modules in sentiment analysis: data collection, pre-processing, classification and analysis of output.

Figure 1: General steps in sentiment analysis

A. Input:
First, we choose a topic, which has to be a single keyword, so that we can collect tweets related to it for sentiment analysis.

B. Extracting tweets:
After choosing the topic, we need to extract relevant tweets from Twitter. The tweets can be collected through Twitter's API using the Python library Tweepy. Access is requested on Twitter's website, which then provides the keys necessary for retrieving tweets.

C. Pre-processing of tweets:
Pre-processing the data is a crucial task because the fetched data contains noise, inconsistencies, useless symbols, syntactical errors, etc., which need to be filtered out in order to capture the sentiment accurately and in the correct context. This in turn helps tweet classification. The steps used must share a single objective: to make the data convenient for the machine, so that the workload of feature extraction is reduced and the process becomes smoother and more efficient.

Some of the most typical procedures carried out for pre-processing of tweets are:

1.) Deletion of redundant tweets:
Retweets must be eliminated because they add no value to the sentiment analysis.

2.) Tokenization:
Tokenization is the process of dissecting the text and separating it into individual words. It is a crucial task in NLP generally, but it is even more significant in sentiment analysis because sentiment information is frequently dispersed through the text and expressed in unusual ways.

3.) Eliminating stop-words:
Stop words occur frequently and repeatedly in text because they connect parts of sentences. They can be removed from the words in the text as long as they do not alter the sentiment of the entire text. This can be done by checking every word in the text against a dictionary of stop words (WEKA), such as “and”, “or”, “still”, “also”, “able”, “the”, “which”, etc., and discarding all matches. [5]

4.) Symbols incorporated in Twitter:
Tweets often contain extra symbols like “@” or “#” as well as URLs. The word following “#”, termed a “hashtag”, plays an important role when we use Twitter data because it gives valuable information about the tweet being analysed. URLs can be discarded completely because they add no sentiment value to the tweet. The word following “@” is a username, which can be discarded too because it also plays no important part in sentiment analysis. This can be done quite easily with a regex that matches these symbols.

5.) Stemming and Lemmatization:
Stemming and lemmatization work in a very similar way, which often leads to confusion between them. Both reduce a word to a base form. Stemming reduces the word to its “root” form, whereas lemmatization reduces the word to its “lemma”. The difference is that lemmatization also considers the part of speech (POS), taking into account the context in which the word was used, whereas stemming does not.

6.) Expansion of slang and abbreviations.

7.) Spelling correction.

D. Sentiment Classification:

Sentiment classification is done using one of two approaches: supervised learning or unsupervised learning. For a Twitter dataset, it is done through supervised learning.

Supervised learning, as the name suggests, involves the presence of a supervisor acting as a trainer. It is the process of inferring, from input-output pairs, a function that maps an input to a
particular output. It draws its inferences from a labelled training data set.

Furthermore, for sentiment analysis there are three supervised learning algorithms: Naïve Bayes, Bayesian logistic regression and the Maximum Entropy Classifier [2].

Figure 2: Sentiment Analysis Algorithm

• Naïve Bayes: It is a probabilistic classifier with a strong conditional independence assumption that is optimal for classifying classes with highly dependent features. Adherence to the sentiment classes is calculated using Bayes' theorem.

• Bayesian logistic regression: This model fits a logistic regression for text categorisation while shrinking coefficients and selecting features at the same time. It uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data [12].

• Maximum Entropy Classifier: This classifier makes no assumptions about the relationships between features; it always tries to maximise the entropy of the system by estimating the conditional distribution of the class labels.

Unsupervised learning, the second approach mentioned above, is the training of a machine using data that is neither classified nor labelled, allowing the algorithm to act on that data without guidance. Here the task of the machine is to group unsorted data according to similarities, patterns and differences without any prior training on the data. It has three subparts: Lexicon-Based Reasoning, Dictionary-Based Reasoning and Corpus-Based Reasoning.

Lexicon-Based Reasoning: There are two main approaches to the automatic extraction of sentiment from text: the lexicon-based (dictionary-based) approach and the text classification approach. In the first approach, the semantic orientation (SO) of a document is calculated by adding up the SO of the words and phrases in the document. In the second approach, a classifier is built from labelled instances of texts or sentences. The lexicon-based approach works well across domains and can easily be improved with additional sources of knowledge for sentiment classification. We follow the first approach in this work. [9]

Dictionary-Based Reasoning: An opinion dictionary is built from words obtained from web pages related to a specific domain. To do so, the authors of [10] associate candidate opinion words, seed words and the domain using the AcroDefMI3 and TrueSkill methods. This dictionary-based approach is compared with the SentiWordNet lexical resource. Experimental results show the suitability of the approach for several domains and for rare opinion words. [10]

Corpus-Based Reasoning: There has recently been a growing demand for annotated corpora from researchers addressing various problems in linguistics and computational linguistics. Linguists are using structurally annotated corpora to study various linguistic phenomena [11].

E. Feature Extraction:
Raw data on its own is of little use, so data mining techniques are needed to extract features from it. Features are the crucial pieces of information that help in reaching a solution.

Not all words in a text carry equal significance, so the words that are helpful for sentiment analysis need to be identified and selected, and the unnecessary words eliminated.

• Unigram features: Only a single word is considered at a time, and a decision is then made on whether it can be a feature.
• N-gram features: More than one word is considered at a time.
• External lexicon: A list of words with predefined sentiments (either positive or negative) is used.

Part-of-speech tagging is also used in feature selection techniques.

IV. TWITTER SENTIMENT ANALYSIS WITH PYTHON

Python:
We favour Python because sentiment analysis involves a great deal of data handling. Python is a solid choice for data pre-processing, and it provides a large set of machine learning libraries, so we recommend using it for analysis.
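
The pre-processing and feature steps described in Section III can be sketched in plain Python. The following is only a minimal illustration and not the code used in this survey: the function names, the regular expressions, the "RT @" retweet convention and the very small stop-word list are assumptions made for this example.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real run would use a fuller dictionary.
STOP_WORDS = {"and", "or", "still", "also", "able", "the", "which", "is", "a", "of", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # URLs carry no sentiment
MENTION_RE = re.compile(r"@\w+")                # "@username" is discarded
TOKEN_RE = re.compile(r"[a-z0-9']+")            # simple word tokenizer


def is_retweet(tweet):
    """Treat tweets starting with 'RT @' as retweets (assumed convention)."""
    return tweet.lstrip().startswith("RT @")


def clean_tweet(tweet):
    """Remove URLs and usernames; keep the hashtag word but drop the '#' sign."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split lower-cased text into individual words."""
    return TOKEN_RE.findall(text.lower())


def remove_stop_words(tokens):
    """Discard every token found in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the n-gram features (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(tweets):
    """Drop retweets and duplicates, then clean, tokenize and filter each tweet."""
    seen = set()
    result = []
    for tweet in tweets:
        if is_retweet(tweet):
            continue
        cleaned = clean_tweet(tweet)
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        result.append(remove_stop_words(tokenize(cleaned)))
    return result
```

Calling `preprocess` on a list of raw tweet strings returns one token list per retained tweet, and `ngrams(tokens, 1)` or `ngrams(tokens, 2)` then gives unigram or bigram features. Stemming, lemmatization, slang expansion and spelling correction are left out of this sketch.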

Tweepy:

Tweepy is an open-source Python library, hosted on GitHub, that lets Python communicate with the Twitter platform through the Twitter API and extract tweets. The Twitter API gives us access to most of Twitter's functionality; we have used it for tweet retrieval.
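
As a rough sketch of tweet retrieval, the snippet below assumes Tweepy 4.x, where `OAuth1UserHandler`, `API.search_tweets` and `Cursor` are available; older releases name the search method differently. The helper names `build_query` and `fetch_tweets` are hypothetical, the four credential arguments are placeholders for the keys issued by Twitter, and whether the search endpoint is reachable depends on the access level of the account. This is not the code used in this survey.

```python
def build_query(keyword, exclude_retweets=True):
    """Build a search query for a single-keyword topic."""
    query = keyword.strip()
    if exclude_retweets:
        # Standard search operator that leaves retweets out of the results.
        query += " -filter:retweets"
    return query


def fetch_tweets(keyword, limit, consumer_key, consumer_secret,
                 access_token, access_token_secret):
    """Return up to `limit` tweet texts for `keyword` (needs valid credentials)."""
    import tweepy  # imported here so build_query works without Tweepy installed

    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_token_secret
    )
    api = tweepy.API(auth, wait_on_rate_limit=True)
    cursor = tweepy.Cursor(
        api.search_tweets,
        q=build_query(keyword),
        lang="en",
        tweet_mode="extended",  # return the full, untruncated tweet text
    )
    return [status.full_text for status in cursor.items(limit)]
```

Excluding retweets in the query is one way to implement the "deletion of redundant tweets" step at collection time rather than afterwards.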

TextBlob:

TextBlob is another extremely powerful NLP library for Python. TextBlob is built upon NLTK and provides an easy-to-use interface to the NLTK library. TextBlob provides an API for performing general NLP tasks.
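
One way to turn TextBlob's output into sentiment classes is sketched below. `TextBlob(text).sentiment.polarity` is a float between -1.0 and 1.0; the three-way labelling, the `threshold` parameter and the function names are assumptions for illustration, since the paper does not state how polarity values were mapped to classes.

```python
def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def classify_tweet(text, threshold=0.0):
    """Label a (pre-processed) tweet using TextBlob's default analyzer."""
    from textblob import TextBlob  # imported here so the helper above has no dependency

    return label_from_polarity(TextBlob(text).sentiment.polarity, threshold)
```

A non-zero `threshold` widens the neutral band, which changes how many tweets are reported as neutral, so the chosen value should be reported alongside any result.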

Matplotlib:
A picture is worth a thousand words, and with Python's matplotlib library it fortunately takes far fewer than a thousand words of code to plot one. The matplotlib library can be used in Python scripts for plotting two-dimensional plots like bar charts, pie charts, etc.

We have used it for plotting our result.
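
A bar chart of sentiment counts could be produced along the following lines. This is a sketch under assumptions: the label names, the function names and the output path are illustrative, and the input is taken to be one label per tweet.

```python
from collections import Counter

LABELS = ["positive", "neutral", "negative"]


def count_labels(labels):
    """Count how many tweets fall into each sentiment class, in LABELS order."""
    counts = Counter(labels)
    return [counts.get(name, 0) for name in LABELS]


def plot_sentiment(labels, title, path):
    """Save a bar chart of sentiment counts to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(LABELS, count_labels(labels))
    ax.set_title(title)
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of tweets")
    fig.savefig(path)
    plt.close(fig)
```

For example, `plot_sentiment(labels, "GST", "gst.png")` would write one chart per government decision.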

V. SURVEY RESULT

We performed sentiment analysis on the Goods and Services Tax (GST), the Citizenship Amendment Act (CAA), Demonetization and Bharat Interface for Money (BHIM). All of these government decisions affected everyone in our country in a major way, so we collected the data and found that people were neutral about most decisions on the internet. This may reflect a fear of speaking out, as people who do so may be publicly criticised. This survey portrays not only the advancement of technology but also how this technology (sentiment analysis through machine learning) can be used to analyse human behaviour on the internet, because what people say in public among peers and friends can contradict what they say on the internet.

VI. CONCLUSION

Twitter sentiment analysis falls under the category of text and opinion mining. It is highly useful for politicians, businesses and sports organisations. It focuses on analysing the sentiments of tweets and feeding the data to a machine learning model in order to train it and then check its accuracy, so that the model can be used later according to the results. It comprises steps such as data collection, text pre-processing, sentiment detection, sentiment classification, and training and testing the model. However, it still lacks diversity in the data. Many analysers do not produce fully accurate results as the number of classes increases. Likewise, it has not yet been tested how accurate the model will be for topics other than the one under consideration. Hence, sentiment analysis has bright scope for development in the future. Sentiment analysis can be used to minimise losses among companies and to help the government make better decisions. Moreover, at the rate at which AI is progressing, there may be new advances. It is also highly beneficial for predicting human behaviour by recognising patterns.

References:

[1] Kaur, H., & Mangat, V. (2017, February). A survey of sentiment analysis techniques. In 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) (pp. 921-925). IEEE.

[2] Gupta, B., Negi, M., Vishwakarma, K., Rawat, G., Badhani, P., & Tech, B. (2017). Study of Twitter sentiment analysis using machine learning algorithms on Python. International Journal of Computer Applications, 165(9), 29-34.

[3] Wagh, R., & Punde, P. (2018, March). Survey on sentiment analysis using twitter dataset. In 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA) (pp. 208-211). IEEE.

[4] Zimbra, D., Ghiassi, M., & Lee, S. (2016, January). Brand-related Twitter sentiment analysis using feature engineering and the dynamic architecture for artificial neural networks. In 2016 49th Hawaii International Conference on System Sciences (HICSS) (pp. 1930-1938). IEEE.

[5] Barnaghi, P., Ghaffari, P., & Breslin, J. G. (2016, March). Opinion mining and sentiment polarity on twitter and correlation between events and sentiment. In 2016 IEEE Second International Conference on Big Data Computing Service and Applications (Big Data Service) (pp. 52-57). IEEE.

[6] Arun, K., Srinagesh, A., & Ramesh, M. (2017). Twitter sentiment analysis on demonetization tweets in India using R language. International Journal of Computer Engineering In Research Trends, 4(6), 252-258.

[7] Pabreja, K. (2018). Sentiment analysis on GST tweets. International Journal of Advanced Research in Computer Science, 9(2).

[8] Mamgain, N., Mehta, E., Mittal, A., & Bhatt, G. (2016, March). Sentiment analysis of top colleges in India using Twitter data. In 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT) (pp. 525-530). IEEE.

[9] Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.
[10] Cruz, L., Ochoa, J., Roche, M., & Poncelet, P. (2016, September). Dictionary-Based Sentiment Analysis applied to specific domain using a Web Mining approach. In 3rd. Annual Internacional Symposium on Information Management and Big Data (p. 80).

[11] Brill, E. D. (1993). A corpus-based approach to language learning. IRCS Technical Reports Series, 191.

[12] Genkin, A., Lewis, D. D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3), 291-304.
