Sentiment Analysis Blog Series, Part 2
Hello again! In this blog, I aim to provide some insight into the
world of Aspect-Based Sentiment Analysis.
Hmm, what is this aspect-based analysis?
Consider this sentence: "The content of the application is
great but the design is poor!"
In this sentence, we can notice that the person
is positive about the content but is negative about the
design. Hence, we can say that, if we consider the two
aspects — content and design, the sentiment of the former
is positive whereas, for the latter, it is negative.
This comes in particularly handy when one wants to analyze the
keywords on which the sentiment is based. We can
extract these aspects using the SpaCy module.
Following this step, we can find the sentiment of each aspect,
or of the extracted bigram, using the TextBlob module.
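Before diving into the libraries, the core idea can be sketched in plain Python. The lexicon, the clause split on "but", and the helper below are hand-made toys for illustration only, not how SpaCy or TextBlob work internally:

```python
# Toy opinion lexicon -- an illustrative stand-in, not a real resource.
LEXICON = {'great': 1.0, 'poor': -1.0}

def toy_aspect_sentiment(sentence, aspects):
    """Give each aspect the polarity of the first opinion word in its clause."""
    results = {}
    # naively split the sentence into clauses on the contrast word "but"
    for clause in sentence.lower().rstrip('!.').split(' but '):
        for aspect in aspects:
            if aspect in clause:
                words = clause.split()
                # first lexicon word in the same clause decides the score
                score = next((LEXICON[w] for w in words if w in LEXICON), 0.0)
                results[aspect] = score
    return results

print(toy_aspect_sentiment(
    'The content of the application is great but the design is poor!',
    ['content', 'design']))
# → {'content': 1.0, 'design': -1.0}
```

Crude as it is, this captures the goal: one sentence, two aspects, two opposite sentiment scores.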
Analysis using SpaCy
SpaCy is an open-source library and is a pure blessing to
the field of Natural Language Processing. While NLTK is
predominantly used for research, SpaCy is widely used for
production software.
Installing SpaCy:
pip install spacy
python -m spacy download en_core_web_sm
en_core_web_sm is an English pipeline that is trained on
web text.
For more details on this pipeline, refer
to: https://spacy.io/models
Loading the libraries:
import spacy
nlp = spacy.load("en_core_web_sm")
Our next step is to extract the aspects. We do this by
extracting the nouns from the given sentences, as
illustrated in the following code snippet:
corpus = ['This American movie was so good!',
          'The French restaurant down the street is amazing!',
          'The application provided a great studying experience',
          'The Chinese food served by the restaurant was bad!']

aspect_list = []
for sentence in corpus:
    doc = nlp(sentence)
    aspect_word = " "
    # extracting the named entities
    named_entities = {ent.root.idx: ent for ent in doc.ents}
    entity_positions = sorted(named_entities.keys())
    if entity_positions:
        aspect_word += named_entities[entity_positions[0]].text
    # extracting the noun acting as nominal subject
    for token in doc:
        for child in token.children:
            if child.dep_ == "nsubj" and child.pos_ == "NOUN":
                aspect_word += " " + child.text
    aspect_list.append(aspect_word)
The output for the above code is as follows:
[' American movie', ' French restaurant', ' application', ' Chinese food']
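In words, the rule above is: take the first named entity (if any) and append every noun that acts as a nominal subject (nsubj). A minimal sketch of that combination logic, run on hand-made stand-ins for SpaCy's output so it needs no model:

```python
def build_aspect(entity_texts, parsed_tokens):
    """Combine the first entity with every nsubj noun.

    entity_texts: list of entity strings (stand-in for doc.ents)
    parsed_tokens: (word, dependency label, POS tag) triples
                   (stand-in for SpaCy's parse)
    """
    aspect = entity_texts[0] if entity_texts else ''
    for word, dep, pos in parsed_tokens:
        if dep == 'nsubj' and pos == 'NOUN':
            aspect += ' ' + word
    return aspect.strip()

print(build_aspect(['French'],
                   [('restaurant', 'nsubj', 'NOUN'),
                    ('amazing', 'acomp', 'ADJ')]))
# → 'French restaurant'
```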
Now, we need to extract the words that describe these
aspects; these are the words from which the sentiment
scores are calculated.
We do this by extracting the adjectives that describe
each aspect.
desc_list = []
for sentence in corpus:
    doc = nlp(sentence)
    desc_word = ""
    for token in doc:
        for child in token.children:
            if child.pos_ == 'ADJ':
                desc_word = child.text
    desc_list.append(desc_word)
The output for the above code is as follows:
['good', 'amazing', 'great', 'bad']
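One caveat: because desc_word is overwritten on every match, only the last adjective in each sentence survives. If a sentence has several adjectives, a list comprehension keeps them all. The pre-tagged (word, POS) pairs below are hand-made stand-ins so the sketch runs without a model:

```python
# Stand-in for SpaCy output: (token text, POS tag) pairs for one sentence.
tagged = [('The', 'DET'), ('big', 'ADJ'), ('friendly', 'ADJ'), ('dog', 'NOUN')]

# Collect every adjective instead of keeping only the last one.
adjectives = [word for word, pos in tagged if pos == 'ADJ']
print(adjectives)  # → ['big', 'friendly']
```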
Perfect!
Now, let's create a data frame with all these inputs using the
Pandas library.
import pandas as pd

df = pd.DataFrame(zip(corpus, aspect_list, desc_list),
                  columns=['Sentence', 'Aspect', 'Description'])
The data frame is as follows:
Looks neat!
Our next task is to calculate the sentiment scores using the
TextBlob library.
Analysis using TextBlob
Installing TextBlob:
pip install textblob
Calculating the sentiments:
from textblob import TextBlob as tb

Polarity = []
Subjectivity = []
for i in range(len(df)):
    Polarity.append(tb(df['Description'][i]).sentiment.polarity)
    Subjectivity.append(tb(df['Description'][i]).sentiment.subjectivity)

df['Polarity'] = Polarity
df['Subjectivity'] = Subjectivity
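As a side note, the index loop can be replaced with DataFrame.apply, which maps a scoring function over the column directly. A toy dictionary stands in for TextBlob here so the sketch runs without that library:

```python
import pandas as pd

# Toy polarity scores -- stand-ins for TextBlob's output, not real values.
toy_polarity = {'good': 0.7, 'amazing': 0.6, 'great': 0.8, 'bad': -0.7}

df_demo = pd.DataFrame({'Description': ['good', 'amazing', 'great', 'bad']})
# apply() maps the scorer over every row of the column in one call
df_demo['Polarity'] = df_demo['Description'].apply(
    lambda w: toy_polarity.get(w, 0.0))
print(df_demo['Polarity'].tolist())  # → [0.7, 0.6, 0.8, -0.7]
```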
Here, we calculate two scores, namely polarity and
subjectivity.
Polarity: a score in the range [-1, 1] that captures the
orientation of the sentiment. The more positive the sentiment,
the closer the polarity is to 1; the more negative, the closer
it is to -1.
Subjectivity: a score in the range [0, 1]. Text can be
objective, relating to fact-based information, or subjective,
relating to personal opinions. The greater the subjectivity
score, the more opinion-based the given text is.
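To make the two scores concrete, here is a tiny helper that turns a (polarity, subjectivity) pair into a readable label; the 0 and 0.5 cut-offs are my own illustrative choices, not TextBlob conventions:

```python
def interpret(polarity, subjectivity):
    """Label a (polarity, subjectivity) pair with illustrative thresholds."""
    tone = 'positive' if polarity > 0 else 'negative' if polarity < 0 else 'neutral'
    basis = 'opinion-based' if subjectivity > 0.5 else 'fact-based'
    return f'{tone} and mostly {basis}'

print(interpret(0.8125, 0.65))  # → 'positive and mostly opinion-based'
print(interpret(-0.7, 0.2))     # → 'negative and mostly fact-based'
```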
The final data frame is as follows:
Now, we can get a clear analysis of the sentiment scores!
One can do all of these operations in a single pass using
SpaCyTextBlob, a TextBlob sentiment component that plugs
into the SpaCy pipeline.
Installation:
pip install spacy==3.1
pip install spacytextblob
python -m spacy download en_core_web_sm
We now add this component to the existing pipeline:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")
Following this, we can extract the polarity and subjectivity
scores in a single step:
text = 'This book is very interesting!'
doc = nlp(text)

print('Polarity:', doc._.polarity)
print('Subjectivity:', doc._.subjectivity)
print('Assessments:', doc._.assessments)
The output for this code snippet is as follows:
Polarity: 0.8125
Subjectivity: 0.65
Assessments: [(['very', 'interesting', '!'], 0.8125, 0.65, None)]
Wow! That looks very impressive and easy!