Parts of Speech Tagger
First project of the ML internship (Xcelerator)
Submitted by- Akshay Bhoju Kothari
Dhanush Shetty
H.K. Nakul
Machine Learning
Definition-
Machine learning is an application of artificial intelligence (AI) that provides systems
the ability to automatically learn and improve from experience without being
explicitly programmed.
Applications
1. Virtual Personal Assistants. Siri, Alexa, Google Now are some of the popular examples of virtual personal assistants.
2. Social Media Services (e.g. Facebook).
3. Email Spam and Malware Filtering.
4. Online Customer Support.
5. Search Engine Result Refining.
6. Product Recommendations.
Challenges faced-
1. Most of the challenges we faced were in extracting the right features.
2. During the training phase we initially got low accuracy.
3. As we are beginners in Python coding, some parts felt difficult.
4. Understanding the parts of speech themselves.
Feature Extraction
def Feature_Extraction(sentence, i):  # feature extraction for the word at position i
    features = {
        'Token': sentence[i],                                     # the word itself
        'first_word': i == 0,                                     # is it the first word?
        'capitalized': sentence[i][0].upper() == sentence[i][0],  # starts with a capital letter
        'All_capitalized': sentence[i].upper() == sentence[i],    # entirely in capitals
        'numeric': sentence[i].isdigit(),                         # is it a number?
        'prev-word': '' if i == 0 else sentence[i - 1],           # previous word, if any
        'suffix(1)': sentence[i][-1],                             # last character
        'suffix(2)': '' if len(sentence[i]) < 2 else sentence[i][-2:],  # last two characters
        'suffix(3)': '' if len(sentence[i]) < 3 else sentence[i][-3:],  # last three characters
        'prefix(1)': sentence[i][0]}                              # first character
    return features
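For example, here is a minimal sketch of what this function returns (the three-word sentence is only an illustrative assumption, not taken from the corpus):

sentence = ['The', 'Dog', 'barked']  # illustrative example sentence
print(Feature_Extraction(sentence, 1))
# -> {'Token': 'Dog', 'first_word': False, 'capitalized': True,
#     'All_capitalized': False, 'numeric': False, 'prev-word': 'The',
#     'suffix(1)': 'g', 'suffix(2)': 'og', 'suffix(3)': 'Dog', 'prefix(1)': 'D'}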
How have we solved our problem-
1. We did a lot of research on identifying the proper features.
2. We read materials and referred to resources on the Xcelerator portal about machine learning and Python coding.
3. We referred to many websites and online learning platforms like Coursera and NPTEL.
4. We chose a proper algorithm to improve efficiency.
5. We discussed within our group to enhance our knowledge.
Importing and downloading necessary libraries and dataset.
import nltk  # importing and downloading necessary libraries and dataset
nltk.download('brown')
nltk.download('tagsets')
nltk.download('universal_tagset')
from nltk.corpus import brown

lines = brown.sents(categories='news')
feature = []
for sentence in lines:
    for i, word in enumerate(sentence):
        feature.append(Feature_Extraction(sentence, i))
# untag the tagged sentences, then pair every word's features with its tag;
# featureset is the labelled data we will use for training and testing
tagged_sents = brown.tagged_sents(categories='news', tagset='universal')
featureset = []
for tagged_sent in tagged_sents:
    untagged_sent = nltk.tag.untag(tagged_sent)
    for i, (word, tag) in enumerate(tagged_sent):
        featureset.append((Feature_Extraction(untagged_sent, i), tag))

size = int(len(featureset) * 0.1)  # hold out 10% of the labelled examples
train_set, test_set = featureset[size:], featureset[:size]  # 90% to train, 10% to test
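As a quick sanity check (our own addition, not part of the original steps), printing the sizes confirms that roughly 90% of the labelled words go to training and 10% to testing:

print(len(featureset), len(train_set), len(test_set))
# exact counts depend on the corpus version, but train:test is always about 9:1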
Classifier-
classifier = nltk.NaiveBayesClassifier.train(train_set)
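To see which features the trained classifier relies on most, NLTK's Naive Bayes classifier can list its most discriminative features; this inspection step is an extra suggestion of ours, not part of the original pipeline:

classifier.show_most_informative_features(10)  # top 10 features by likelihood ratio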
Evaluation using accuracy-
print(classifier.classify(Feature_Extraction(brown.sents()[0], 9)))  # predicted tag for the word 'of'
print(Feature_Extraction(brown.sents()[0], 9))
accuracy = nltk.classify.accuracy(classifier, test_set)
print(accuracy)  # we get roughly 85% accuracy
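To tag a whole unseen sentence, the classifier can be wrapped in a small helper; tag_sentence and the example sentence below are our own illustrative assumptions, not part of the original code:

def tag_sentence(sentence):
    # classify each word with the same features used during training
    return [(word, classifier.classify(Feature_Extraction(sentence, i)))
            for i, word in enumerate(sentence)]

print(tag_sentence(['The', 'jury', 'praised', 'the', 'city', 'officials']))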
Naive Bayes Classifier-
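For each word, a Naive Bayes classifier picks the tag t that maximises P(t) * P(f1|t) * P(f2|t) * ... * P(fn|t), where f1, ..., fn are the extracted features. The "naive" part is the assumption that the features are conditionally independent given the tag; the probabilities are estimated simply by counting over the training set, which is why training is fast even on tens of thousands of examples.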
Future Enhancements-
1. We will be able to correct grammatical errors in a sentence.
2. We will be able to do chunking and parsing of text (see the sketch after this list).
3. This can also be used in chatbots as part of a larger model.
4. By adding some extra features we can turn this model into a sentiment analyser.
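As a sketch of enhancement 2, NLTK's RegexpParser can chunk noun phrases out of the tagger's output; the grammar and the hand-tagged example sentence here are illustrative assumptions, not part of our model:

tagged = [('The', 'DET'), ('quick', 'ADJ'), ('fox', 'NOUN'),
          ('jumped', 'VERB'), ('over', 'ADP'), ('the', 'DET'),
          ('fence', 'NOUN')]
grammar = 'NP: {<DET>?<ADJ>*<NOUN>}'  # noun phrase = optional determiner, adjectives, noun
chunker = nltk.RegexpParser(grammar)
print(chunker.parse(tagged))  # prints a tree with the two NP chunks bracketed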
References-
1. http://www.nltk.org/book/ch06.html#ref-document-classify-all-words (NLTK Book, Chapter 6)
2. Resources available on the Xcelerator portal.
3. https://docs.python.org/3/library/stdtypes.html (Python documentation)
THANK YOU