NLP-2 - Problem Statement

This document discusses building sequential NLP classification models to analyze customer sentiments from reviews and detect sarcasm in news headlines. It describes IMDB and news dataset details and provides tasks to preprocess data, design models using LSTM, and evaluate performance.

Uploaded by

ernkjha

AIML MODULE PROJECT
This file is meant for personal use by swatikumari295295@gmail.com only.
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorised use or distribution prohibited.
Sharing or publishing the contents in part or full is liable for legal action.

Sequential NLP
TOTAL SCORE: 60

General Instructions:
1. Submission of all the parts is expected in 1 notebook only.
2. Expected submission format: 1 ‘.ipynb’ notebook and 1 ‘.html’ notebook only.
3. 50% marks will be deducted if insights/steps are missing in the corresponding questions.
4. If output for any code cell is missing, 50% marks will be deducted.
5. Any kind of plagiarism will lead to 0 (zero) marks.

Submission Format:
1. ‘.ipynb’ (Jupyter Notebook) and
2. ‘.html’ (Jupyter Notebook > File > Download as > HTML)
5 marks will be deducted if submission in any of the formats is missing.

Part A - 30 Marks

• DOMAIN: Digital content and entertainment industry

• CONTEXT: The objective of this project is to build a text classification model that analyses the customer's
sentiments based on their reviews in the IMDB database. The model uses a complex deep learning model to build
an embedding layer followed by a classification algorithm to analyse the sentiment of the customers.
• DATA DESCRIPTION: The dataset contains 50,000 movie reviews from IMDB, labelled by sentiment (positive/negative).
Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers). For
convenience, the words are indexed by their frequency in the dataset, meaning the word that has index 1 is the most
frequent word. Use the first 20 words from each review to speed up training, using a max vocabulary size of
10,000. As a convention, "0" does not stand for a specific word, but instead is used to encode any unknown word.
• PROJECT OBJECTIVE: To build a sequential NLP classifier which can use input text parameters to determine the
customer sentiments.
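The encoding convention in the data description can be illustrated with a small sketch. It is not the dataset's actual preprocessing: `encode_review` and `toy_rank` are hypothetical names, the ranking is made up, and treating every rank at or above the vocabulary size as unknown is an assumption.

```python
from typing import Dict, List

UNKNOWN = 0  # per the data description, 0 encodes any unknown word


def encode_review(tokens: List[str], word_rank: Dict[str, int],
                  vocab_size: int = 10_000, max_words: int = 20) -> List[int]:
    """Map the first `max_words` tokens to frequency ranks, 0 for unknown words."""
    encoded = []
    for token in tokens[:max_words]:
        rank = word_rank.get(token.lower(), UNKNOWN)
        # Assumption: ranks outside the capped vocabulary count as unknown.
        encoded.append(rank if 0 < rank < vocab_size else UNKNOWN)
    return encoded


# Hypothetical toy ranking: index 1 is the most frequent word.
toy_rank = {"the": 1, "movie": 2, "was": 3, "great": 4, "cinematography": 12_500}

print(encode_review("The movie was great".split(), toy_rank))
print(encode_review("The cinematography was sublime".split(), toy_rank))
```

A rare word ("cinematography", rank 12,500) and a word missing from the ranking ("sublime") both end up as 0, so the model only ever sees indices below the vocabulary cap.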
Steps and tasks: [Total Score: 30 Marks]
1. Import and analyse the data set. [5 Marks]
Hint: - Use `imdb.load_data()` method
- Get train and test set
- Take 10000 most frequent words
2. Perform relevant sequence padding on the data. [5 Marks]
3. Perform following data analysis: [5 Marks]
• Print shape of features and labels
• Print value of any one feature and its label
4. Decode the feature value to get original sentence [5 Marks]
5. Design, train, tune and test a sequential model. [5 Marks]
Hint: The aim here is to import the text and process it in such a way that it can be taken as an input to the ML/NN classifiers. Be
analytical and experimental here in trying new approaches to design the best model.
6. Use the designed model to print the prediction on any one sample. [5 Marks]
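Steps 2 and 4 above (padding and decoding) can be sketched in plain Python. This is only an illustration of the idea, not the expected solution: `pad_sequence`, `decode`, and `toy_rank` are hypothetical names. In Keras the corresponding helper is `pad_sequences`, which pads and truncates at the front by default, so keeping the first words of a review needs `truncating='post'`. With the real data, note also that `imdb.load_data()` shifts word indices by `index_from=3` by default, which a decoder built on `imdb.get_word_index()` has to undo.

```python
from typing import Dict, List


def pad_sequence(seq: List[int], max_len: int, pad_value: int = 0,
                 padding: str = "pre", truncating: str = "post") -> List[int]:
    """Return a copy of `seq` cut or padded to exactly `max_len` items."""
    if len(seq) > max_len:
        # "post" drops items from the end, keeping the first `max_len` words.
        seq = seq[:max_len] if truncating == "post" else seq[-max_len:]
    fill = [pad_value] * (max_len - len(seq))
    return fill + list(seq) if padding == "pre" else list(seq) + fill


def decode(seq: List[int], word_rank: Dict[str, int], unknown: str = "?") -> str:
    """Turn a sequence of indices back into words using the reversed index."""
    index_to_word = {index: word for word, index in word_rank.items()}
    # Index 0 is skipped: it marks padding here and unknown words in the
    # data description, so neither can be recovered.
    return " ".join(index_to_word.get(i, unknown) for i in seq if i != 0)


# Hypothetical toy ranking: index 1 is the most frequent word.
toy_rank = {"the": 1, "movie": 2, "was": 3, "great": 4}

padded = pad_sequence([1, 2, 3, 4], max_len=6)
print(padded)
print(decode(padded, toy_rank))
```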

Please Note:
Intentionally limited questions/instructions are provided so that learners can explore more and perform more research
since learners are comfortable with all the concepts and implementation.


Part B - 30 Marks

• DOMAIN: Social media analytics

• CONTEXT: Past studies in Sarcasm Detection mostly make use of Twitter datasets collected using hashtag-based
supervision, but such datasets are noisy in terms of labels and language. Furthermore, many tweets are replies to
other tweets, and detecting sarcasm in these requires the availability of contextual tweets. In this hands-on project,
the goal is to build a model to detect whether a sentence is sarcastic or not, using Bidirectional LSTMs.
• DATA DESCRIPTION:
The dataset is collected from two news websites, theonion.com and huffingtonpost.com.
This new dataset has the following advantages over the existing Twitter datasets:
- Since news headlines are written by professionals in a formal manner, there are no spelling mistakes and informal usage. This
reduces the sparsity and also increases the chance of finding pre-trained embeddings.
- Furthermore, since the sole purpose of TheOnion is to publish sarcastic news, we get high-quality labels with much less noise as
compared to Twitter datasets.
- Unlike tweets that reply to other tweets, the news headlines obtained are self-contained. This would help us in teasing apart the
real sarcastic elements.
Content: Each record consists of three attributes:
- is_sarcastic: 1 if the record is sarcastic, otherwise 0
- headline: the headline of the news article
- article_link: link to the original news article. Useful in collecting supplementary data
Reference: https://github.com/rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection

• PROJECT OBJECTIVE: Build a sequential NLP classifier which can use input text parameters to determine
whether a headline is sarcastic.
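The record layout given in the data description can be explored with a short sketch. It assumes the file stores one JSON object per line; the two sample records are made up for illustration and are not taken from the dataset, and `load_records` is a hypothetical helper.

```python
import json
from typing import Dict, List

# Made-up sample records in the described layout (not from the dataset).
SAMPLE = """\
{"is_sarcastic": 1, "headline": "area man wins argument with himself", "article_link": "https://example.com/a"}
{"is_sarcastic": 0, "headline": "city council approves new library budget", "article_link": "https://example.com/b"}
"""


def load_records(text: str) -> List[Dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = load_records(SAMPLE)
# Retain only the columns needed for classification.
rows = [(r["headline"], r["is_sarcastic"]) for r in records]
# Length of each sentence, measured in whitespace-separated words.
lengths = [len(headline.split()) for headline, _ in rows]
print(rows[0], lengths)
```

The distribution of `lengths` is one reasonable basis for choosing the maximum sequence length used later for padding.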
Steps and tasks: [Total Score: 30 Marks]
1. Read and explore the data [3 Marks]
2. Retain relevant columns [3 Marks]
3. Get length of each sentence [3 Marks]
4. Define parameters [3 Marks]
5. Get indices for words [3 Marks]
6. Create features and labels [3 Marks]
7. Get vocabulary size [3 Marks]
8. Create a weight matrix using GloVe embeddings [3 Marks]
9. Define and compile a Bidirectional LSTM model. [3 Marks]
Hint: Be analytical and experimental here in trying new approaches to design the best model.
10. Fit the model and check the validation accuracy [3 Marks]
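Steps 4 to 8 above (parameters, word indices, features, vocabulary size, weight matrix) can be sketched without any deep learning library. This is a toy illustration under stated assumptions: all function names, the headlines, and the vectors are made up, and `MAX_LEN` and `EMBEDDING_DIM` are hypothetical values. The only format assumption is that each line of a GloVe text file is a word followed by space-separated floats.

```python
from collections import Counter
from typing import Dict, Iterable, List

MAX_LEN = 5        # hypothetical maximum sequence length
EMBEDDING_DIM = 3  # hypothetical; real GloVe vectors are much larger


def build_word_index(sentences: List[str]) -> Dict[str, int]:
    """Index words by descending frequency, starting at 1 (0 is padding)."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}


def to_features(sentences: List[str], word_index: Dict[str, int],
                max_len: int = MAX_LEN) -> List[List[int]]:
    """Convert sentences to index sequences, truncated and post-padded with 0."""
    features = []
    for s in sentences:
        seq = [word_index.get(w, 0) for w in s.lower().split()][:max_len]
        features.append(seq + [0] * (max_len - len(seq)))
    return features


def parse_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines: a word followed by its vector values."""
    vectors = {}
    for line in lines:
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    return vectors


def build_weight_matrix(word_index: Dict[str, int],
                        vectors: Dict[str, List[float]],
                        dim: int = EMBEDDING_DIM) -> List[List[float]]:
    """Row i holds the vector of the word with index i; missing words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix


headlines = ["man wins argument", "council approves budget", "man approves budget"]
labels = [1, 0, 1]
glove_lines = ["man 0.1 0.2 0.3", "budget 0.4 0.5 0.6"]  # made-up vectors

word_index = build_word_index(headlines)
features = to_features(headlines, word_index)
vocab_size = len(word_index) + 1  # +1 for the padding index 0
weights = build_weight_matrix(word_index, parse_glove(glove_lines))
print(vocab_size, len(weights), features[0])
```

The resulting matrix has one row per vocabulary index, which is the shape an embedding layer expects when it is initialised from pre-trained vectors; words absent from the GloVe lines keep an all-zero row.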
