BERT Driven Sentiment Classification With PyTorch
import warnings
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))
[nltk_data] Downloading package stopwords to /usr/share/nltk_data…
[nltk_data] Package stopwords is already up-to-date!
1 Read Data
[3]: df=pd.read_csv("/kaggle/input/apps-reviews/apps_reviews.csv")
[4]: wordcloud_mask = np.array(Image.open("/kaggle/input/wordcloud-mask-collection/twitter.png"))
[5]: df.head()
reviewCreatedVersion at \
0 4.17.0.3 2020-04-05 22:25:57
1 4.17.0.3 2020-04-04 13:40:01
2 4.17.0.3 2020-04-01 16:18:13
3 4.17.0.2 2020-03-12 08:17:34
4 4.17.0.2 2020-03-14 17:41:01
replyContent repliedAt \
0 According to our TOS, and the term you have ag… 2020-04-05 15:10:24
1 It sounds like you logged in with a different … 2020-04-05 15:11:35
2 This sounds odd! We are not aware of any issue… 2020-04-02 16:05:56
3 We do offer this option as part of the Advance… 2020-03-15 06:20:13
4 We're sorry you feel this way! 90% of the app … 2020-03-15 23:45:51
sortOrder appId
0 most_relevant com.anydo
1 most_relevant com.anydo
2 most_relevant com.anydo
3 most_relevant com.anydo
4 most_relevant com.anydo
2 Data Exploration
[6]: df.shape
[6]: (15746, 11)
[7]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15746 entries, 0 to 15745
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 userName 15746 non-null object
1 userImage 15746 non-null object
2 content 15746 non-null object
3 score 15746 non-null int64
4 thumbsUpCount 15746 non-null int64
5 reviewCreatedVersion 13533 non-null object
6 at 15746 non-null object
7 replyContent 7367 non-null object
8 repliedAt 7367 non-null object
9 sortOrder 15746 non-null object
10 appId 15746 non-null object
dtypes: int64(2), object(9)
memory usage: 1.3+ MB
[8]: df.isnull().sum()
[8]: userName 0
userImage 0
content 0
score 0
thumbsUpCount 0
reviewCreatedVersion 2213
at 0
replyContent 8379
repliedAt 8379
sortOrder 0
appId 0
dtype: int64
[9]: df.drop(columns=["userName","userImage","thumbsUpCount","reviewCreatedVersion",
                      "at","repliedAt","sortOrder","appId"], inplace=True)
[10]: df.head()
replyContent
0 According to our TOS, and the term you have ag…
1 It sounds like you logged in with a different …
2 This sounds odd! We are not aware of any issue…
3 We do offer this option as part of the Advance…
4 We're sorry you feel this way! 90% of the app …
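The cell that builds the text column (cell [11]) is missing from the export. Judging by cell [15] below, where the review body is followed by the developer reply, it presumably concatenates content and replyContent; a sketch under that assumption:

df["text"] = (df["content"].fillna("") + " " + df["replyContent"].fillna("")).str.strip()  # assumed construction of the text column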
[12]: df.head()
replyContent \
0 According to our TOS, and the term you have ag…
1 It sounds like you logged in with a different …
2 This sounds odd! We are not aware of any issue…
3 We do offer this option as part of the Advance…
4 We're sorry you feel this way! 90% of the app …
text
0 Update: After getting a response from the deve…
1 Used it for a fair amount of time without any …
2 Your app sucks now!!!!! Used to be good but no…
3 It seems OK, but very basic. Recurring tasks n…
4 Absolutely worthless. This app runs a prohibit…
[13]: df.drop(columns=["content","replyContent"],axis=1,inplace=True)
[14]: df.head()
[15]: df["text"][0]
[15]: "Update: After getting a response from the developer I would change my rating to
0 stars if possible. These guys hide behind confusing and opaque terms and
refuse to budge at all. I'm so annoyed that my money has been lost to them!
Really terrible customer experience. Original: Be very careful when signing up
for a free trial of this app. If you happen to go over they automatically charge
you for a full years subscription and refuse to refund. Terrible customer
experience and the app is just OK. According to our TOS, and the term you have
agreed to upon creating your free trial, after the 7 days, there is an automatic
charge for the plan, unless you cancel prior to the renewal date. As you did not
cancel the subscription in time, you were charged per this agreement."
[16]: df["score"].value_counts()
[16]: score
3 5042
5 2900
4 2776
1 2566
2 2462
Name: count, dtype: int64
[17]: plt.figure(figsize=(10,8))
sns.countplot(x="score",data=df)
plt.xlabel("Review Score")
plt.title("Review Score Count")
plt.show()
2.0.1 The score distribution is imbalanced (score 3 dominates), but that’s okay. We’re going to convert the dataset
into negative, neutral and positive sentiment:
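The cell defining categorize_sentiment is not shown in the export; a plausible definition, assuming the common split of scores 1–2 as negative, 3 as neutral and 4–5 as positive (consistent with the heads below, where score 1 maps to Negative):

def categorize_sentiment(score):
    # assumed mapping: 1-2 -> Negative, 3 -> Neutral, 4-5 -> Positive
    if score <= 2:
        return "Negative"
    elif score == 3:
        return "Neutral"
    return "Positive"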
df['sentiment'] = df['score'].apply(categorize_sentiment)
[19]: df.head()
[19]: score text sentiment
0 1 Update: After getting a response from the deve… Negative
1 1 Used it for a fair amount of time without any … Negative
2 1 Your app sucks now!!!!! Used to be good but no… Negative
3 1 It seems OK, but very basic. Recurring tasks n… Negative
4 1 Absolutely worthless. This app runs a prohibit… Negative
[20]: plt.figure(figsize=(10,8))
      sns.countplot(x="sentiment", data=df, palette="gnuplot")
      plt.xlabel("Sentiment")
      plt.title("Sentiment Count")
      plt.show()
4 Clean Data
[21]: def clean_text(text):
          if isinstance(text, str):
              text = BeautifulSoup(text, 'html.parser').get_text()   # strip any HTML
              text = text.lower()                                    # lowercase (the cleaned output below is all lowercase)
              text = re.sub(r"[^a-zA-Z]", " ", text)                 # keep letters only
              text = text.translate(str.maketrans("", "", string.punctuation))
              emoji_pattern = re.compile("["
                                         u"\U0001F600-\U0001F64F"
                                         u"\U0001F300-\U0001F5FF"
                                         u"\U0001F680-\U0001F6FF"
                                         u"\U0001F1E0-\U0001F1FF"
                                         u"\U00002702-\U000027B0"
                                         u"\U000024C2-\U0001F251"
                                         "]+", flags=re.UNICODE)
              text = emoji_pattern.sub(r'', text)
              stop_words = set(stopwords.words('english'))
              tokens = word_tokenize(text)
              tokens = [word for word in tokens if word not in stop_words]
              return " ".join(tokens)                                # join the filtered tokens back into one string
          else:
              return ""
[22]: df["text"]=df["text"].apply(clean_text)
[23]: df.head()
[24]: df.drop(columns=["score"],axis=1,inplace=True)
7 Neutral Text Length
[27]: neutral_text=df[df["sentiment"]=="Neutral"]["text"].str.len()
plt.figure(figsize=(10,8))
plt.hist(neutral_text, bins=20,label='Neutral Text Length',color="blue")
plt.title("Neutral Text Length")
plt.show()
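The cells computing positive_text and negative_text (sections 5 and 6, used in the combined histogram below) did not survive the export; a sketch mirroring the Neutral cell above:

positive_text = df[df["sentiment"] == "Positive"]["text"].str.len()
negative_text = df[df["sentiment"] == "Negative"]["text"].str.len()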
8 Together
[28]: plt.figure(figsize=(10,8))
      plt.hist(neutral_text, bins=20, label='Neutral Text Length', color="blue")
      plt.hist(positive_text, bins=20, label='Positive Text Length', color="green")
      plt.hist(negative_text, bins=20, label='Negative Text Length', color="red")
      plt.legend()
      plt.title("Negative vs Positive vs Neutral Text Length")
      plt.show()
9 Negative Data WordCloud
[29]: plt.figure(figsize=(15, 15))
      # the WordCloud generation lines were cut in the export; reconstructed by analogy with the Positive/Neutral cells below
      negative_text = " ".join(df[df["sentiment"] == "Negative"]['text'].values.tolist())
      wordcloud = WordCloud(width=800, height=800, stopwords=STOPWORDS, background_color='black',
                            max_words=800, colormap="CMRmap", mask=wordcloud_mask).generate(negative_text)
      plt.imshow(wordcloud, interpolation='bilinear')
      plt.axis('off')
      plt.show()
10 Positive Data Wordcloud
[30]: plt.figure(figsize=(15,15))
      positive_wordcloud = df[df["sentiment"]=="Positive"]
      positive_text = " ".join(positive_wordcloud['text'].values.tolist())
      wordcloud = WordCloud(width=800, height=800, stopwords=STOPWORDS, background_color='black',
                            max_words=800, colormap="CMRmap", mask=wordcloud_mask).generate(positive_text)
      plt.imshow(wordcloud, interpolation='bilinear')
      plt.axis('off')
      plt.show()
11 Neutral Data Wordcloud
[31]: plt.figure(figsize=(15,15))
      neutral_wordcloud = df[df["sentiment"]=="Neutral"]
      neutral_text = " ".join(neutral_wordcloud['text'].values.tolist())
      wordcloud = WordCloud(width=800, height=800, stopwords=STOPWORDS, background_color='black',
                            max_words=800, colormap="CMRmap", mask=wordcloud_mask).generate(neutral_text)
      plt.imshow(wordcloud, interpolation='bilinear')
      plt.axis('off')
      plt.show()
12 ALL Data Wordcloud
[32]: plt.figure(figsize=(15,15))
      all_text = " ".join(df['text'].values.tolist())
      wordcloud = WordCloud(width=800, height=800, stopwords=STOPWORDS, background_color='orange',
                            max_words=800, colormap="ocean", mask=wordcloud_mask).generate(all_text)
      plt.imshow(wordcloud, interpolation='bilinear')
      plt.axis('off')
      plt.show()
13 30 Most Common Words From All Text
[33]: from itertools import chain
from collections import Counter
data_set =df["text"].str.split()
all_words = list(chain.from_iterable(data_set))
counter = Counter(all_words)
common_words = counter.most_common(30)
df_common_words = pd.DataFrame(common_words, columns=['Word', 'Count'])
[34]: colors = ["darkviolet", "chocolate", "mediumslateblue", "darkgreen",
                "orangered", "mediumblue", "peru", "mediumspringgreen"]
plt.figure(figsize=(12, 6))
sns.barplot(x='Count', y='Word', data=df_common_words, palette=colors)
plt.title('30 Most Common Words')
plt.xlabel('Count')
plt.ylabel('Word')
plt.show()
15 Most Common Words From Positive Text
[35]: positive_text = df[df["sentiment"] == "Positive"]
data_set = positive_text["text"].str.split()
all_words = [word for sublist in data_set for word in sublist]
counter = Counter(all_words)
common_words = counter.most_common(30)
df_common_words = pd.DataFrame(common_words, columns=['Word', 'Count'])
plt.figure(figsize=(12, 8))
sns.barplot(x='Count', y='Word', data=df_common_words,palette="Set2")
plt.title('30 Most Common Words Positive')
plt.xlabel('Count Positive Word')
plt.ylabel('Positive Word')
plt.show()
16 Most Common Words From Neutral Text
[36]: neutral_text = df[df["sentiment"] == "Neutral"]
data_set = neutral_text["text"].str.split()
all_words = [word for sublist in data_set for word in sublist]
counter = Counter(all_words)
common_words = counter.most_common(30)
df_common_words = pd.DataFrame(common_words, columns=['Word', 'Count'])
plt.figure(figsize=(12, 8))
sns.barplot(x='Count', y='Word', data=df_common_words,palette="Accent")
plt.title('30 Most Common Words Neutral')
plt.xlabel('Count Neutral Word')
plt.ylabel('Neutral Word')
plt.show()
[37]: df.head()
[39]: sentiment_map={"Negative":0,"Positive":1,"Neutral":2}
df["sentiment"]=df["sentiment"].replace(sentiment_map)
[40]: df.head()
[40]: text sentiment
0 update getting response developer would change… 0
1 used fair amount time without problems suddenl… 0
2 app sucks used good update physically open clo… 0
3 seems ok basic recurring tasks need work actua… 0
4 absolutely worthless app runs prohibitively cl… 0
[41]: df['text_length'] = df['text'].str.len()   # assumed reconstruction; the line creating text_length was cut in the export
      average_text_length = df['text_length'].mean()
[42]: df.drop(columns=["text_length"],axis=1,inplace=True)
sample_text = ("update getting response developer would change rating stars possible guys hide behind "
               "confusing opaque terms refuse budge annoyed money lost really terrible customer experience "
               "original careful signing free trial app happen go automatically charge full years subscription "
               "refuse refund terrible customer experience app ok according tos term agreed upon creating free "
               "trial days automatic charge plan unless cancel prior renewal date cancel subscription time "
               "charged per agreement")
19.0.1 Some basic operations can convert the text to tokens and tokens to unique
integers (ids):
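The cells that load the BERT tokenizer and produce tokens and token_ids are not visible in the export; a sketch assuming a Hugging Face BertTokenizer (the exact checkpoint name is an assumption):

from transformers import BertTokenizer

bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')   # assumed checkpoint
tokens = bert_tokenizer.tokenize(sample_text)                         # text -> WordPiece tokens
token_ids = bert_tokenizer.convert_tokens_to_ids(tokens)              # tokens -> vocabulary ids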
print("=======================================================================================
print(f'Sentence: {sample_text}')
print("=======================================================================================
print(f'Tokens: {tokens}')
print("=======================================================================================
print(f'Token IDs: {token_ids}')
print("=======================================================================================
================================================================================
================
Sentence: update getting response developer would change rating stars possible
guys hide behind confusing opaque terms refuse budge annoyed money lost really
terrible customer experience original careful signing free trial app happen go
automatically charge full years subscription refuse refund terrible customer
experience app ok according tos term agreed upon creating free trial days
automatic charge plan unless cancel prior renewal date cancel subscription time
charged per agreement
================================================================================
================
Tokens: ['update', 'getting', 'response', 'developer', 'would', 'change',
'rating', 'stars', 'possible', 'guys', 'hide', 'behind', 'confusing', 'opaque',
'terms', 'refuse', 'budge', 'annoyed', 'money', 'lost', 'really', 'terrible',
'customer', 'experience', 'original', 'careful', 'signing', 'free', 'trial',
'app', 'happen', 'go', 'automatically', 'charge', 'full', 'years',
'subscription', 'refuse', 'ref', '##und', 'terrible', 'customer', 'experience',
'app', 'ok', 'according', 'to', '##s', 'term', 'agreed', 'upon', 'creating',
'free', 'trial', 'days', 'automatic', 'charge', 'plan', 'unless', 'cancel',
'prior', 'renewal', 'date', 'cancel', 'subscription', 'time', 'charged', 'per',
'agreement']
================================================================================
================
Token IDs: [10651, 2893, 3433, 9722, 2052, 2689, 5790, 3340, 2825, 4364, 5342,
2369, 16801, 28670, 3408, 10214, 24981, 11654, 2769, 2439, 2428, 6659, 8013,
3325, 2434, 6176, 6608, 2489, 3979, 10439, 4148, 2175, 8073, 3715, 2440, 2086,
15002, 10214, 25416, 8630, 6659, 8013, 3325, 10439, 7929, 2429, 2000, 2015,
2744, 3530, 2588, 4526, 2489, 3979, 2420, 6882, 3715, 2933, 4983, 17542, 3188,
14524, 3058, 17542, 15002, 2051, 5338, 2566, 3820]
================================================================================
================
[CLS] - we must add this token to the start of each sentence, so BERT knows we’re doing classification
[48]: bert_tokenizer.cls_token, bert_tokenizer.cls_token_id
[48]: ('[CLS]', 101)
[49]: bert_tokenizer.pad_token, bert_tokenizer.pad_token_id
[49]: ('[PAD]', 0)
19.0.4 BERT understands tokens that were in the training set. Everything else can
be encoded using the [UNK] (unknown) token:
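The cell that produced the warning below is missing from the export; a sketch, assuming encode_plus was called on sample_text with max_length=32 (the sequence length shown further down) and the deprecated pad_to_max_length flag that triggers the FutureWarning:

encoding = bert_tokenizer.encode_plus(
    sample_text,
    max_length=32,                  # assumed from the 32-token tensors shown below
    add_special_tokens=True,        # adds [CLS] and [SEP]
    return_token_type_ids=False,
    pad_to_max_length=True,         # deprecated flag; this is what triggers the warning below
    return_attention_mask=True,
    return_tensors='pt',            # return PyTorch tensors
)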
length. Defaulting to 'longest_first' truncation strategy. If you encode pairs
of sequences (GLUE-style) with the tokenizer you can select this strategy more
precisely by providing a specific strategy to `truncation`.
/opt/conda/lib/python3.10/site-
packages/transformers/tokenization_utils_base.py:2834: FutureWarning: The
`pad_to_max_length` argument is deprecated and will be removed in a future
version, use `padding=True` or `padding='longest'` to pad to the longest
sequence in the batch, or use `padding='max_length'` to pad to a max length. In
this case, you can give a specific length with `max_length` (e.g.
`max_length=45`) or leave max_length to None to pad to the maximal input size of
the model (e.g. 512 for Bert).
warnings.warn(
[52]: encoding.keys()
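The cell that printed the tensors below is not shown; a sketch consistent with the output (the sequence length, then the padded/truncated input_ids and attention_mask):

print(len(encoding['input_ids'][0]))
print("=" * 95)
print(encoding['input_ids'][0])
print(len(encoding['attention_mask'][0]))
print("=" * 95)
print(encoding['attention_mask'][0])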
32
================================================================================
===================
tensor([ 101, 10651, 2893, 3433, 9722, 2052, 2689, 5790, 3340, 2825,
4364, 5342, 2369, 16801, 28670, 3408, 10214, 24981, 11654, 2769,
2439, 2428, 6659, 8013, 3325, 2434, 6176, 6608, 2489, 3979,
10439, 102])
32
================================================================================
===================
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1])
22 We can invert the tokenization to have a look at the special tokens:
[55]: bert_tokenizer.convert_ids_to_tokens(encoding['input_ids'][0])
[55]: ['[CLS]',
'update',
'getting',
'response',
'developer',
'would',
'change',
'rating',
'stars',
'possible',
'guys',
'hide',
'behind',
'confusing',
'opaque',
'terms',
'refuse',
'budge',
'annoyed',
'money',
'lost',
'really',
'terrible',
'customer',
'experience',
'original',
'careful',
'signing',
'free',
'trial',
'app',
'[SEP]']
token_lens = []
for txt in df.text:
    tokens = bert_tokenizer.encode(txt, max_length=512)
    token_lens.append(len(tokens))
25 Plot the Distribution Using histplot
[58]: plt.figure(figsize=(10, 8))
sns.histplot(token_lens, bins=30, kde=True, color='blue')
plt.title('Distribution of Token Lengths', fontsize=16)
plt.xlabel('Token Length', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.show()
/opt/conda/lib/python3.10/site-packages/seaborn/_oldcore.py:1119: FutureWarning:
use_inf_as_na option is deprecated and will be removed in a future version.
Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
26 Calculate min, max, and average
[59]: import statistics
min_length = min(token_lens)
max_length = max(token_lens)
avg_length = statistics.mean(token_lens)
print("==============================================================================")
print(f"Minimum Token Length: {min_length}")
print("==============================================================================")
print(f"Maximum Token Length: {max_length}")
print("==============================================================================")
print(f"Average Token Length: {avg_length:.2f}")
print("==============================================================================")
==============================================================================
Minimum Token Length: 2
==============================================================================
Maximum Token Length: 257
==============================================================================
Average Token Length: 21.42
==============================================================================
27 Maxlen==170
[60]: max_len=170
[61]: df.head()
[62]: class GPReviewDataset(Dataset):
          def __init__(self, reviews, targets, tokenizer, max_len):
              self.reviews = reviews
              self.targets = targets
              self.tokenizer = tokenizer
              self.max_len = max_len

          def __len__(self):
              return len(self.reviews)

          def __getitem__(self, item):
              review = str(self.reviews[item])
              target = self.targets[item]
              encoding = self.tokenizer.encode_plus(
                  review,
                  add_special_tokens=True,
                  max_length=self.max_len,
                  padding='max_length',          # ensure padding to max_len
                  return_token_type_ids=False,
                  return_attention_mask=True,
                  truncation=True,               # make sure long reviews are truncated to max_len
                  return_tensors='pt',           # return as PyTorch tensors
              )
              return {
                  'review_text': review,
                  'input_ids': encoding['input_ids'].flatten(),            # flatten to remove the extra batch dimension
                  'attention_mask': encoding['attention_mask'].flatten(),  # tail of this dict reconstructed (cut in the export)
                  'targets': torch.tensor(target, dtype=torch.long)
              }
The tokenizer is doing most of the heavy lifting for us. We also return the review
texts, so it’ll be easier to evaluate the predictions from our model. Let’s split the
data:
[63]: from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)
df_val, df_test = train_test_split(df_test, test_size=0.5, random_state=42)
28.1 We also need to create a couple of data loaders. Here’s a helper function
to do it:
[65]: def create_data_loader(df, tokenizer, max_len, batch_size):
ds = GPReviewDataset(
reviews=df.text.to_numpy(),
targets=df.sentiment.to_numpy(),
tokenizer=tokenizer,
max_len=max_len
)
return DataLoader(
ds,
batch_size=batch_size,
num_workers=4
)
[66]: BATCH_SIZE = 16
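The cell that actually builds the three loaders is not visible in the export; presumably something along these lines (the loader names are taken from their later usage, val_data_loader is an assumption):

train_data_loader = create_data_loader(df_train, bert_tokenizer, max_len, BATCH_SIZE)
val_data_loader = create_data_loader(df_val, bert_tokenizer, max_len, BATCH_SIZE)
test_data_loader = create_data_loader(df_test, bert_tokenizer, max_len, BATCH_SIZE)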
28.1.1 Let’s have a look at an example batch from our training data loader:
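The cell that pulls the example batch (the data dict used below) is not shown; a minimal sketch:

data = next(iter(train_data_loader))   # one batch of BATCH_SIZE examples
data.keys()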
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning:
os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX
is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning:
os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX
is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
[68]: print("##############################################################################")
print(data['input_ids'].shape)
print("##############################################################################")
print(data['attention_mask'].shape)
print("##############################################################################")
print(data['targets'].shape)
print("##############################################################################")
##############################################################################
torch.Size([16, 170])
##############################################################################
torch.Size([16, 170])
##############################################################################
torch.Size([16])
##############################################################################
[70]: print(encoding)
{'input_ids': tensor([[ 101, 10651, 2893, 3433, 9722, 2052, 2689, 5790,
3340, 2825,
4364, 5342, 2369, 16801, 28670, 3408, 10214, 24981, 11654, 2769,
2439, 2428, 6659, 8013, 3325, 2434, 6176, 6608, 2489, 3979,
10439, 102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1]])}
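The cell that loads the base BERT model is not visible in the export; presumably something like the following (checkpoint name assumed to match the tokenizer):

from transformers import BertModel

bert_model = BertModel.from_pretrained('bert-base-uncased')   # assumed checkpoint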
[71]: output = bert_model(input_ids=encoding['input_ids'], attention_mask=encoding['attention_mask'])
      last_hidden_state = output.last_hidden_state
The last_hidden_state is a sequence of hidden states of the last layer of the model. Obtaining
the pooled_output is done by applying the BertPooler on last_hidden_state:
[72]: last_hidden_state.shape
[72]: torch.Size([1, 32, 768])
29.0.1 We have a hidden state for each of our 32 tokens (the length of our example sequence). But why 768? This is the hidden size of the BERT-base encoder, i.e. the dimensionality of each token’s hidden state. We can verify that by checking the config:
[73]: bert_model.config.hidden_size
[73]: 768
29.0.2 You can think of the pooled_output as a summary of the content, according to BERT, though you might be able to do better. Let’s look at the shape of the output:
class SentimentClassifier(nn.Module):
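    # The rest of this class, along with the cells that instantiate the model and the device,
    # was cut in the export. The sketch below is consistent with how the model is used later
    # (raw logits fed to CrossEntropyLoss, softmax applied separately); the checkpoint name and
    # the dropout rate are assumptions, not the author's code.
    def __init__(self, n_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')   # assumed checkpoint
        self.drop = nn.Dropout(p=0.3)                                # assumed dropout rate
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output                        # summary of the [CLS] token
        return self.out(self.drop(pooled_output))                    # raw logits; softmax is applied outside

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SentimentClassifier(n_classes=3).to(device)                  # assumed instantiation: 3 sentiment classes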
31 We’ll move the example batch of our training data to the GPU:
[77]: input_ids = data['input_ids'].to(device)
attention_mask = data['attention_mask'].to(device)
torch.Size([16, 170])
torch.Size([16, 170])
31.0.1 To get the predicted probabilities from our trained model, we’ll apply the
softmax function to the outputs:
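The cell applying the softmax is not shown; a minimal sketch, assuming F is torch.nn.functional:

import torch.nn.functional as F

F.softmax(model(input_ids, attention_mask), dim=1)   # class probabilities for the example batch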
To reproduce the training procedure from the BERT paper, we’ll use the AdamW optimizer provided by Hugging Face. It implements decoupled weight decay, matching the setup used in the original paper. We’ll also use a linear learning-rate scheduler with no warmup steps:
[79]: from transformers import AdamW, get_linear_schedule_with_warmup
EPOCHS = 10
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
total_steps = len(train_data_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
optimizer,
num_warmup_steps=0,
num_training_steps=total_steps
)
loss_fn = nn.CrossEntropyLoss().to(device)
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:591:
FutureWarning: This implementation of AdamW is deprecated and will be removed in
a future version. Use the PyTorch implementation torch.optim.AdamW instead, or
set `no_deprecation_warning=True` to disable this warning
warnings.warn(
31.0.2 How do we come up with all hyperparameters? The BERT authors have some recommendations for fine-tuning:
• Batch size: 16, 32
• Learning rate (Adam): 5e-5, 3e-5, 2e-5
• Number of epochs: 2, 3, 4
We’re going to ignore the number of epochs recommendation but stick with the rest. Note that increasing the batch size reduces the training time significantly, but gives you lower accuracy.
Let’s continue with writing a helper function for training our model for one epoch:
[80]: def train_epoch(
          model,
          data_loader,
          loss_fn,
          optimizer,
          device,
          scheduler,
          n_examples
      ):
          model = model.train()
          losses = []
          correct_predictions = 0
          for d in data_loader:                                      # batch loop reconstructed (cut in the export)
              input_ids = d["input_ids"].to(device)
              attention_mask = d["attention_mask"].to(device)
              targets = d["targets"].to(device)
              outputs = model(
                  input_ids=input_ids,
                  attention_mask=attention_mask
              )
              _, preds = torch.max(outputs, dim=1)
              loss = loss_fn(outputs, targets)
              correct_predictions += torch.sum(preds == targets)
              losses.append(loss.item())
              loss.backward()
              nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
              optimizer.step()
              scheduler.step()
              optimizer.zero_grad()
          return correct_predictions.double() / n_examples, np.mean(losses)
Training the model should look familiar, except for two things. The scheduler gets
called every time a batch is fed to the model. We’re avoiding exploding gradients by
clipping the gradients of the model using clip_grad_norm_.
[81]: def eval_model(model, data_loader, loss_fn, device, n_examples):
          model = model.eval()
          losses = []
          correct_predictions = 0
          with torch.no_grad():
              for d in tqdm(data_loader, desc="Evaluating", leave=False):
                  input_ids = d["input_ids"].to(device)
                  attention_mask = d["attention_mask"].to(device)
                  targets = d["targets"].to(device)
                  outputs = model(
                      input_ids=input_ids,
                      attention_mask=attention_mask
                  )
                  _, preds = torch.max(outputs, dim=1)
                  loss = loss_fn(outputs, targets)                   # tail of the loop reconstructed (cut in the export)
                  correct_predictions += torch.sum(preds == targets)
                  losses.append(loss.item())
          return correct_predictions.double() / n_examples, np.mean(losses)
31.0.3 Using those two, we can write our training loop. We’ll also store the training
history:
[82]: EPOCHS = 20
      history = defaultdict(list)
      best_accuracy = 0
      for epoch in range(EPOCHS):                                    # epoch loop reconstructed (cut in the export)
          print(f'Epoch {epoch + 1}/{EPOCHS}')
          print('-' * 10)
          train_acc, train_loss = train_epoch(model, train_data_loader, loss_fn, optimizer, device, scheduler, len(df_train))
          print(f'Train loss: {train_loss:.4f} | Train accuracy: {train_acc:.4f}')
          val_acc, val_loss = eval_model(model, val_data_loader, loss_fn, device, len(df_val))
          print(f'Val loss: {val_loss:.4f} | Val accuracy: {val_acc:.4f}')
          history['train_acc'].append(train_acc)
          history['train_loss'].append(train_loss)
          history['val_acc'].append(val_acc)
          history['val_loss'].append(val_loss)
          if val_acc > best_accuracy:
              torch.save(model.state_dict(), 'best_model_state.bin') # checkpoint filename assumed
              best_accuracy = val_acc
      print("Training complete!")
      print(f"Best Validation Accuracy: {best_accuracy:.4f}")
Epoch 1/20
----------
Train loss: 0.9913 | Train accuracy: 0.5059
Epoch 2/20
----------
Epoch 3/20
----------
Epoch 4/20
----------
Epoch 5/20
----------
Epoch 6/20
----------
Val loss: 0.9282 | Val accuracy: 0.6451
Epoch 7/20
----------
Epoch 8/20
----------
Epoch 9/20
----------
Epoch 10/20
----------
Epoch 11/20
----------
Val loss: 1.0273 | Val accuracy: 0.6514
Epoch 12/20
----------
Epoch 13/20
----------
Epoch 14/20
----------
Epoch 15/20
----------
Epoch 16/20
----------
Epoch 17/20
----------
Epoch 18/20
----------
Epoch 19/20
----------
Epoch 20/20
----------
Training complete!
Best Validation Accuracy: 0.6559
[83]: def plot_training_metrics(history):
          epochs = range(1, len(history['train_loss']) + 1)
          # plotting lines reconstructed (the originals were cut in the export)
          plt.plot(epochs, history['train_loss'], label='train loss')
          plt.plot(epochs, history['val_loss'], label='val loss')
          plt.xlabel('Epoch')
          plt.legend()
          plt.tight_layout()
          plt.savefig('training_metrics.png', dpi=300, bbox_inches='tight')
          plt.show()

      plot_training_metrics(history)
32.1 Evaluation
So how good is our model on predicting sentiment? Let’s start by calculating the accuracy on
the test data:
[84]: test_acc, _ = eval_model(
model,
test_data_loader,
loss_fn,
device,
len(df_test)
)
test_acc.item()
[84]: 0.6565079365079365
32.1.1 The test accuracy is on par with the validation accuracy. Our model seems to generalize well.
We’ll define a helper function to get the predictions from our model:
[85]: def get_predictions(model, data_loader):
          model = model.eval()
          review_texts = []
          predictions = []
          prediction_probs = []
          real_values = []
          with torch.no_grad():
              for d in data_loader:
                  texts = d["review_text"]
                  input_ids = d["input_ids"].to(device)
                  attention_mask = d["attention_mask"].to(device)
                  targets = d["targets"].to(device)
                  outputs = model(
                      input_ids=input_ids,
                      attention_mask=attention_mask
                  )
                  _, preds = torch.max(outputs, dim=1)
                  probs = F.softmax(outputs, dim=1)                  # predicted class probabilities (line cut in the export)
                  review_texts.extend(texts)
                  predictions.extend(preds)
                  prediction_probs.extend(probs)
                  real_values.extend(targets)
          predictions = torch.stack(predictions).cpu()
          prediction_probs = torch.stack(prediction_probs).cpu()
          real_values = torch.stack(real_values).cpu()
          return review_texts, predictions, prediction_probs, real_values
This is similar to the evaluation function, except that we’re storing the texts of the reviews and the predicted probabilities:
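The cell that actually runs get_predictions on the test loader and defines class_names is not visible; a sketch consistent with the variable names used below (the class order follows the sentiment_map from cell [39]):

class_names = ['Negative', 'Positive', 'Neutral']   # assumed order, matching sentiment_map 0/1/2
y_review_texts, y_pred, y_pred_probs, y_test = get_predictions(model, test_data_loader)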
33 Classification Report
[87]: from sklearn.metrics import (
confusion_matrix,
classification_report,
roc_auc_score,
roc_curve,
accuracy_score,
)
print(classification_report(y_test, y_pred, target_names=class_names))
34 Confusion Matrix
[88]: cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10,8))
sns.heatmap(cm, annot=True, fmt='d', cmap='hsv', xticklabels=class_names,␣
↪yticklabels=class_names)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show()
35 Roc Curve
[89]: plt.figure(figsize=(10,8))
      print(f"Skipping {class_name} because only one class is present in y_test.")
36 Precision Recall Curve
[90]: from sklearn.metrics import precision_recall_curve, average_precision_score
      plt.figure(figsize=(10,8))
      # the per-class loop was cut in the export; reconstructed to mirror the ROC sketch above (uses y_test_bin)
      for i, class_name in enumerate(class_names):
          if len(np.unique(y_test_bin[:, i])) > 1:
              precision, recall, _ = precision_recall_curve(y_test_bin[:, i], y_pred_probs[:, i])
              ap = average_precision_score(y_test_bin[:, i], y_pred_probs[:, i])
              plt.plot(recall, precision, label=f"{class_name} (AP = {ap:.2f})")
          else:
              print(f"Skipping {class_name} because only one class is present in y_test.")
      plt.xlabel("Recall")
      plt.ylabel("Precision")
      plt.title("Precision-Recall Curve for Multiclass Classification")
      plt.legend(loc="lower left")
      plt.grid()
      plt.show()
37 Roc Auc Score
[91]: roc_auc = roc_auc_score(y_test, y_pred_probs, multi_class='ovr')
      plt.plot([])
      plt.text(0, 0, f'ROC AUC Score: {roc_auc:.4f}', fontsize=16, ha='center', va='center', color="indigo")
      plt.axis('off')
      plt.xlim(-1, 1)
      plt.ylim(-1, 1)
      plt.show()
38 Log Loss
[92]: from sklearn.metrics import log_loss
      classes = [0, 1, 2]
      logarithm_loss = log_loss(y_test, y_pred_probs, labels=classes)   # computation line reconstructed (cut in the export)
      plt.plot([])
      plt.text(0, 0, f'Log Loss: {logarithm_loss:.4f}', fontsize=16, ha='center', va='center', color="black")
      plt.axis('off')
      plt.xlim(-1, 1)
      plt.ylim(-1, 1)
      plt.show()
39 Kappa Score
[93]: from sklearn.metrics import cohen_kappa_score
      kappa = cohen_kappa_score(y_test, y_pred)
      plt.plot([])
      plt.text(0, 0, f'Cohen Kappa Score: {kappa:.4f}', fontsize=16, ha='center', va='center', color="orangered")
      plt.axis('off')
      plt.show()
40 matthews_corrcoef
[94]: from sklearn.metrics import matthews_corrcoef
      mcc = matthews_corrcoef(y_test, y_pred)
      plt.plot([])
      plt.text(0, 0, f'Matthews Corrcoef: {mcc:.4f}', fontsize=16, ha='center', va='center')   # display line reconstructed by analogy with the cells above
      plt.axis('off')
      plt.show()
41 Model Prediction
[95]: idx = 2
review_text = y_review_texts[idx]
true_sentiment = y_test[idx]
pred_df = pd.DataFrame({'class_names': class_names,'values': y_pred_probs[idx]})
==============================================================================
doesnt thing gliched everytime open hi apologies issue app tasks
community driven hobby project mine popular suggestions added time
offered free without advertising first report issue kind glitches look
like thanks steve
==============================================================================
True sentiment: Negative
==============================================================================
[97]: plt.figure(figsize=(10,8))
sns.barplot(x='values', y='class_names', data=pred_df, orient='h')
plt.ylabel('sentiment')
plt.xlabel('probability')
plt.title("Model Prediction Probability")
plt.show()
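# The opening lines of this final prediction cell were cut in the export; a sketch of what they
# likely were, taking the raw sentence from the printed output below (variable names assumed):
review_text = ("Every day brings a new opportunity to overcome challenges, learn from experiences, "
               "and grow stronger, as we continuously work towards becoming the best version of ourselves, "
               "embracing both successes and setbacks as stepping stones toward a brighter future")
encoded_review = bert_tokenizer.encode_plus(
    review_text,
    max_length=max_len,
    add_special_tokens=True,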
pad_to_max_length=True,
return_attention_mask=True,
return_tensors='pt',
)
/opt/conda/lib/python3.10/site-
packages/transformers/tokenization_utils_base.py:2834: FutureWarning: The
`pad_to_max_length` argument is deprecated and will be removed in a future
version, use `padding=True` or `padding='longest'` to pad to the longest
sequence in the batch, or use `padding='max_length'` to pad to a max length. In
this case, you can give a specific length with `max_length` (e.g.
`max_length=45`) or leave max_length to None to pad to the maximal input size of
the model (e.g. 512 for Bert).
warnings.warn(
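The cell that runs the model on this encoded review and prints the result is also missing; a sketch consistent with the output below (names assumed):

input_ids = encoded_review['input_ids'].to(device)
attention_mask = encoded_review['attention_mask'].to(device)
output = model(input_ids, attention_mask)
_, prediction = torch.max(output, dim=1)   # index of the most probable class
print("=" * 95)
print(f'Review text: {review_text}')
print("=" * 95)
print(f'Sentiment : {class_names[prediction]}')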
================================================================================
==============
Review text: Every day brings a new opportunity to overcome challenges, learn
from experiences, and grow stronger, as we continuously work towards becoming
the best version of ourselves, embracing both successes and setbacks as stepping
stones toward a brighter future
================================================================================
==============
Sentiment : Positive
================================================================================
==============