Mini Project Report
Title: Tweet Sentiment Classification using NLP and VADER
Name: Naitik
Roll Number: [Your Roll Number]
Department: Computer Engineering
Institution: [Your College Name]
Guide: [Guide/Supervisor's Name]
Date: April 2025
Certificate
This is to certify that the mini project titled "Tweet Sentiment Classification using NLP and
VADER" has been carried out by Naitik under my guidance and supervision. This work is a
record of the student’s own efforts and has not been submitted elsewhere.
(Signature & Stamp)
Guide/Supervisor Name
Department of Computer Engineering
Acknowledgment
I would like to express my sincere thanks to my guide [Guide’s Name], for their valuable
guidance, consistent support, and encouragement throughout this project. I would also like
to thank my department and peers who contributed to this project directly or indirectly.
Abstract
This project presents a sentiment classification approach for tweets related to data science
using Natural Language Processing (NLP) and the VADER sentiment analyzer from the
NLTK library. The dataset, sourced from Kaggle, contains over 33,000 tweets. The
objective was to classify tweets as "positive" or "negative", excluding neutral
ones. The tweets were cleaned and scored with VADER, and the resulting sentiment
trends over time were visualized using Plotly.
Table of Contents
1. Title Page
2. Certificate Page
3. Acknowledgment
4. Abstract
5. Table of Contents
6. List of Figures and Tables
7. Introduction
8. Literature Review
9. Methodology
10. Implementation
11. Results and Discussion
12. Conclusion and Future Work
13. References
14. Appendix
List of Figures and Tables
Figure 1: Sample Tweet Data
Figure 2: Sentiment Over Time Plot
Introduction
Background
With the rise of social media, understanding public sentiment through platforms like
Twitter has become important.
Problem Statement
To classify tweets into positive and negative sentiment classes using NLP techniques.
Objectives
- Clean raw Twitter data
- Analyze sentiment using VADER
- Visualize sentiment trends
Scope
This project is limited to English tweets and focuses only on binary sentiment classification.
Literature Review
Several tools and techniques exist for sentiment analysis, including TextBlob, VADER, and
machine learning models. VADER is a lexicon- and rule-based analyzer designed for social
media text; it accounts for cues such as capitalization, punctuation, emoticons, and slang.
Studies suggest that combining rule-based sentiment analysis with domain-specific
dictionaries can improve performance.
References:
- Hutto, C.J., & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment
Analysis of Social Media Text.
- NLTK Documentation
Methodology
Tools and Technologies Used
- Python
- Pandas
- NLTK (VADER)
- Plotly
System Design
1. Data collection from Kaggle
2. Text cleaning and preprocessing
3. Sentiment analysis with VADER
4. Visualization with Plotly
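The text cleaning step listed above is not spelled out in this report. The following is a
minimal sketch of what such a step could look like, assuming that URLs and @mentions are
removed and that the "#" symbol is dropped while the hashtag word is kept. The function
name clean_tweet and the exact rules are illustrative assumptions, not the project's
actual code. Case and punctuation are deliberately left untouched, because VADER uses both
as intensity cues.

```python
import re

# Illustrative patterns (assumptions, not the project's actual rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Strip URLs and @mentions, drop '#' symbols, and collapse whitespace."""
    if not isinstance(text, str):
        # Missing values (e.g. NaN from Pandas) become an empty string.
        return ""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()
```

With a Pandas DataFrame, this could be applied as df["clean"] = df["tweet"].apply(clean_tweet),
where the column name "tweet" is an assumption about the dataset.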
Architecture Diagram
Raw Dataset → Data Cleaning → Sentiment Scoring → Visualization
Implementation
Data Collection
The data was loaded from the CSV file using Pandas. df.info() reported 33,590 records and 36
columns.
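A minimal sketch of the loading step is shown here. The helper name load_tweets and the
file name in the usage comment are hypothetical; the report does not state the actual
file name.

```python
import pandas as pd


def load_tweets(csv_source):
    """Read the tweet CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(csv_source)


# Usage (hypothetical file name):
#   df = load_tweets("data_science_tweets.csv")
#   df.info()        # prints column names, dtypes, and non-null counts
#   print(df.shape)  # (number of rows, number of columns)
```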
Sentiment Analysis
The VADER SentimentIntensityAnalyzer from NLTK was used to compute a compound sentiment
score for each tweet, which was then used to classify the tweet as positive or negative.
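The scoring step could be sketched as follows. SentimentIntensityAnalyzer and its
polarity_scores method are part of NLTK; the helper names build_analyzer and
compound_score are assumptions introduced for illustration. The NLTK imports sit inside
build_analyzer so that the scoring helper can be exercised without the lexicon installed.

```python
def build_analyzer():
    """Create a VADER analyzer, downloading the lexicon if it is missing."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def compound_score(analyzer, text):
    """Return VADER's compound score, a single value between -1 and 1."""
    # polarity_scores returns a dict with keys 'neg', 'neu', 'pos', 'compound'.
    return analyzer.polarity_scores(text)["compound"]


# Usage:
#   analyzer = build_analyzer()
#   score = compound_score(analyzer, "I love data science!")
```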
Categorization
Tweets were categorized as positive or negative based on their compound score using a
custom function; neutral tweets were excluded.
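The report does not list the custom function or its cut-off values. The sketch below
assumes the thresholds conventionally suggested for VADER (a compound score of at least
0.05 is positive, at most -0.05 is negative, and anything in between is neutral). The
function name categorize and the use of None for neutral tweets are illustrative choices.

```python
# Assumed cut-offs: the conventional VADER thresholds, not confirmed by the report.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def categorize(compound):
    """Map a compound score to 'positive', 'negative', or None (neutral)."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return None  # neutral tweets are excluded from the analysis
```

Rows labelled None can then be dropped, for example with
df = df.dropna(subset=["sentiment"]), where "sentiment" is an assumed column name.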
Visualization
Positive and negative sentiment data was plotted over time using Plotly.
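One way to produce such a plot is to count tweets per day and per sentiment label, then
draw one line per label. The sketch below assumes columns named "date" and "sentiment";
these names, the helper names, and the choice of a line chart are assumptions rather than
the project's actual code.

```python
import pandas as pd


def daily_sentiment_counts(df, date_col="date", label_col="sentiment"):
    """Count tweets per calendar day and sentiment label."""
    out = df.dropna(subset=[label_col]).copy()  # drop unlabelled (neutral) rows
    out[date_col] = pd.to_datetime(out[date_col]).dt.date
    return out.groupby([date_col, label_col]).size().reset_index(name="count")


def plot_sentiment_over_time(counts, date_col="date", label_col="sentiment"):
    """Draw one line per sentiment label using Plotly Express."""
    import plotly.express as px  # imported here so counting works without Plotly

    return px.line(counts, x=date_col, y="count", color=label_col,
                   title="Sentiment over time")


# Usage:
#   counts = daily_sentiment_counts(df)
#   plot_sentiment_over_time(counts).show()
```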
Results and Discussion
- The classifier was able to label tweets with decent accuracy based on compound score.
- The visualization showed spikes in sentiment around specific dates.
- Limitations: the project did not include a neutral class or sarcasm detection.
Conclusion and Future Work
This project demonstrated effective tweet classification using rule-based sentiment
analysis. Future work could involve:
- Adding sarcasm detection
- Training custom ML models
- Including multilingual support
References
- Hutto, C.J., & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment
Analysis of Social Media Text.
- NLTK Documentation
- Kaggle Dataset: https://www.kaggle.com/ruchi798/data-science-tweets
Appendix
Full cleaned dataset sample
Additional charts or plots