
Unit 2 Pos Tagger

Parts of Speech (PoS) tagging is a fundamental task in Natural Language Processing (NLP) that assigns grammatical categories to words, enhancing machine understanding of human language. It is crucial for various applications like machine translation and sentiment analysis, involving processes such as tokenization, language model loading, and linguistic analysis. Different methods of PoS tagging exist, including rule-based, transformation-based, and statistical approaches, each with its own advantages and disadvantages.

Uploaded by

Stella Thanis

POS (Parts-of-Speech) Tagging in NLP
Parts of Speech (PoS) tagging is a core task in NLP. It assigns each word a grammatical category such as noun, verb, adjective or adverb. By exposing phrase structure and supporting semantic interpretation, this technique lets machines analyze human language more accurately.
PoS tagging is essential in many NLP applications, such as machine translation, sentiment analysis and information retrieval. It serves as a link between language and machine understanding, enabling the creation of complex language processing systems.

POS (Parts-of-Speech) Tagging
Parts of Speech tagging is a linguistic task in Natural Language Processing (NLP) in which each word in a document is assigned a particular part of speech (adverb, adjective, verb, etc.) or grammatical category. By adding a layer of syntactic and semantic information to the words, this procedure makes it easier to understand a sentence's structure and meaning.
In NLP applications, POS tagging is useful for machine translation, named entity recognition and information extraction, among other things. It also helps resolve ambiguity in words that have several meanings and reveals a sentence's grammatical structure.
Example of POS Tagging
Consider the sentence: "The quick brown fox jumps
over the lazy dog."
After performing POS Tagging:
• "The" is tagged as determiner (DT)
• "quick" is tagged as adjective (JJ)
• "brown" is tagged as adjective (JJ)
• "fox" is tagged as noun (NN)
• "jumps" is tagged as verb (VBZ)
• "over" is tagged as preposition (IN)
• "the" is tagged as determiner (DT)
• "lazy" is tagged as adjective (JJ)
• "dog" is tagged as noun (NN)

By offering insights into the grammatical structure, this tagging helps machines understand not just individual words but also the connections between them within a sentence. This kind of information is essential for many NLP applications, such as text summarization and sentiment analysis.
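A tagged sentence is commonly represented as a list of (word, tag) pairs. The sketch below stores the example sentence in that form and groups its words by tag; the helper name group_by_tag is made up for this illustration.

```python
from collections import defaultdict

# The example sentence as (word, tag) pairs, using the tags listed above.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"),
]

def group_by_tag(pairs):
    """Collect the words that share each tag, keeping sentence order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)

groups = group_by_tag(tagged)
print(groups["JJ"])  # ['quick', 'brown', 'lazy']
print(groups["NN"])  # ['fox', 'dog']
```

Grouping by tag like this is one simple way downstream steps can pick out, for example, all nouns or all adjectives in a sentence.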
Workflow of POS Tagging in NLP
• Tokenization: The input text is divided into individual tokens representing words or subwords. Tokenization is the foundational step in most NLP tasks and enables further analysis at the word level.
• Loading a Language Model: Tools like NLTK or spaCy require a pre-trained language model to perform POS tagging. These models are trained on large annotated datasets and encode the grammatical patterns of the language.
• Text Preprocessing: The text is cleaned to improve accuracy. Common preprocessing steps include converting text to lowercase, removing special characters and eliminating irrelevant content. These steps should be applied with care, since capitalization and punctuation can themselves be cues for the tagger.
• Linguistic Analysis: This stage involves parsing the sentence to understand the grammatical role of each token. It lays the groundwork for assigning the appropriate part of speech by interpreting the sentence’s syntactic structure.
• POS Tagging: Each token is assigned a specific part-of-speech label, based on its role in the sentence and the contextual clues provided by surrounding words.
• Result Evaluation: Finally, the POS-tagged output is reviewed for accuracy. Any misclassifications or anomalies are identified and corrected as needed.
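The workflow above can be sketched end to end in plain Python. This is a minimal illustration, not how NLTK or spaCy work internally: a small hand-written dictionary (TOY_LEXICON, hypothetical) stands in for the pre-trained model, and the function names are assumptions made for the example.

```python
import re

# Hypothetical toy lexicon standing in for a pre-trained model.
TOY_LEXICON = {
    "the": "DT", "quick": "JJ", "brown": "JJ", "lazy": "JJ",
    "fox": "NN", "dog": "NN", "jumps": "VBZ", "over": "IN",
}

def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag(tokens, lexicon, default="NN"):
    """Look up each token case-insensitively; unknown words get `default`."""
    tagged = []
    for tok in tokens:
        if not tok[0].isalnum():
            tagged.append((tok, tok))  # punctuation is tagged as itself
        else:
            tagged.append((tok, lexicon.get(tok.lower(), default)))
    return tagged

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the reference tag."""
    correct = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    return correct / len(gold)

tokens = tokenize("The quick brown fox jumps over the lazy dog.")
predicted = tag(tokens, TOY_LEXICON)
gold = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]
print(predicted)
print(f"Accuracy: {accuracy(predicted, gold):.2f}")
```

A pure lookup like this ignores context entirely, which is why real taggers add the linguistic-analysis step; the evaluation function, however, works the same way for any tagger.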
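Implementation of Parts-of-Speech tagging using NLTK
1. Installing packages

!pip install nltk

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download the tokenizer and tagger resources (needed once per environment).
# Newer NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng'
# instead; the LookupError message names the resource to download.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')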
2. Implementation
• The sentence is stored in the variable text.
• The text is tokenized into words using word_tokenize(text) before applying POS tagging.
• pos_tag(words) assigns a grammatical tag (e.g., noun, verb) to each word.
• The original sentence is printed for reference.
• A loop prints each word alongside its predicted part-of-speech tag.

# Sample text
text = "NLTK is a powerful library for natural language processing."

# Tokenize the text into words
words = word_tokenize(text)

# Perform PoS tagging; the result is a list of (word, tag) pairs
pos_tags = pos_tag(words)

print("Original Text:")
print(text)

print("\nPoS Tagging Result:")
# The loop variable is named `tag` so it does not shadow the pos_tag function
for word, tag in pos_tags:
    print(f"{word}: {tag}")
Output:
[Figure: POS using NLTK]
Implementation of Parts-of-Speech tagging using spaCy
Installing Packages

!pip install spacy
!python -m spacy download en_core_web_sm
Implementation
• Imports the spaCy library.
• Loads the pre-trained English language model en_core_web_sm.
• Defines a sample sentence in the variable text.
• Processes the text using nlp(text), which returns a Doc object containing linguistic annotations.
• Prints the original sentence for reference.
• Iterates through each token in the doc and prints the word along with its part-of-speech (POS) tag using token.text and token.pos_.

# Import the library
import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "SpaCy is a popular natural language processing library."

# Process the text with spaCy; the result is a Doc of annotated tokens
doc = nlp(text)

print("Original Text: ", text)

print("PoS Tagging Result:")
for token in doc:
    # token.pos_ holds the coarse-grained part-of-speech label
    print(f"{token.text}: {token.pos_}")
Output:
[Figure: POS using spaCy]
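The two implementations report tags from different tagsets. NLTK's pos_tag returns Penn Treebank tags such as VBZ, while spaCy's token.pos_ holds coarse Universal POS labels such as VERB (spaCy keeps its fine-grained tag in token.tag_). The sketch below relates the two with a small hand-written mapping. The table PENN_TO_UPOS and the helper to_coarse are assumptions made for illustration, not an official conversion, and a real pipeline makes context-dependent choices that a fixed table cannot (for instance, labelling auxiliaries as AUX).

```python
# Partial, hand-written mapping from Penn Treebank tags to Universal POS tags.
# Illustrative approximation only, not an official conversion table.
PENN_TO_UPOS = {
    "DT": "DET", "JJ": "ADJ", "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "IN": "ADP", "RB": "ADV",
    ".": "PUNCT",
}

def to_coarse(tagged, mapping=PENN_TO_UPOS, unknown="X"):
    """Map fine-grained tags to coarse labels; unmapped tags become `unknown`."""
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged]

penn_tagged = [
    ("The", "DT"), ("fox", "NN"), ("jumps", "VBZ"),
    ("over", "IN"), ("the", "DT"), ("dog", "NN"), (".", "."),
]
coarse = to_coarse(penn_tagged)
for word, label in coarse:
    print(f"{word}: {label}")
```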
Types of POS Tagging in NLP
Part-of-Speech (PoS) tagging, the assignment of grammatical categories to the words in a text, is an essential part of Natural Language Processing (NLP). Several PoS tagging approaches exist, each with its own methodology. The most common are:
1. Rule-Based Tagging
Rule-based POS tagging assigns grammatical tags to words using a predefined set of rules, as opposed to machine learning-based methods that require training on annotated corpora. These rules are crafted from morphological features (such as word endings) and syntactic context, which makes the approach highly interpretable and transparent.
Example
A rule might specify that words ending in “-tion” or “-ment” should be tagged as nouns, based on common suffix patterns found in English.
• Rule: Assign the POS tag "Noun" to words ending in -tion or -ment.
• Text: "The presentation highlighted the key achievements of the project's development."
Tagged Output:
• "The" : Determiner (DET)
• "presentation" : Noun (N)
• "highlighted" : Verb (V)
• "the" : Determiner (DET)
• "key" : Adjective (ADJ)
• "achievements" : Noun (N)
• "of" : Preposition (PREP)
• "the" : Determiner (DET)
• "project's" : Noun (N)
• "development" : Noun (N)
In this case, the suffix rule accounts for "presentation," "achievements" (once the plural -s is set aside) and "development". The remaining tags would come from the other rules or the lexicon that a full rule-based tagger also contains. While simple, this example illustrates how rule-based systems cover linguistic patterns using structured, interpretable logic.
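The suffix rule above can be sketched with a regular expression. This is a minimal illustration with a single rule; the names SUFFIX_RULES and rule_based_tag are made up for the example, the pattern also allows a plural -s so that "achievements" matches, and words that no rule covers receive the placeholder tag UNK.

```python
import re

# Ordered (pattern, tag) rules; the first matching rule wins.
# Only the suffix rule from the example is included here.
SUFFIX_RULES = [
    (re.compile(r"(tion|ment)s?$", re.IGNORECASE), "N"),
]

def rule_based_tag(tokens, rules, default="UNK"):
    """Tag each token with the first rule that matches, else `default`."""
    tagged = []
    for tok in tokens:
        for pattern, tag in rules:
            if pattern.search(tok):
                tagged.append((tok, tag))
                break
        else:
            tagged.append((tok, default))
    return tagged

sentence = "The presentation highlighted the key achievements of the project's development"
result = rule_based_tag(sentence.split(), SUFFIX_RULES)
for word, tag in result:
    print(f"{word}: {tag}")
```

A practical tagger would add many more rules and a lexicon for words such as "the" and "of". NLTK's RegexpTagger accepts a similar list of (pattern, tag) pairs.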
2. Transformation-Based Tagging
Transformation-Based Tagging (TBT), also known as Brill tagging, is a method for refining POS tags through a series of context-based transformations. Unlike statistical taggers that rely on probabilities or rule-based taggers that apply static rules, TBT starts with initial tags and improves them iteratively by applying transformation rules.
Example
A rule might state: “Change a word’s tag from Verb to Noun if it follows a determiner like ‘the’.”
• Text: "The walk was pleasant."
• Initial Tags: "The" – DET, "walk" – V, "was" – V, "pleasant" – ADJ
• Transformation Rule Applied: Change “walk” from Verb to Noun because it follows the determiner “The”.
• Updated Tags: "walk" becomes Noun; the other tags are unchanged.
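A single transformation of this kind can be sketched as a function that rewrites a tag when the preceding token carries a given tag. The sketch assumes the initial tags are already available and uses a hypothetical sentence, "The walk was pleasant", in which "walk" starts out tagged as a verb; the function name apply_rule is an assumption for the example. A full transformation-based tagger would also learn which rules to apply, and in what order, from an annotated corpus.

```python
def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Retag from_tag as to_tag wherever the preceding token is tagged prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out

# Initial tags, e.g. from a most-frequent-tag lookup (assumed here).
initial = [("The", "DET"), ("walk", "V"), ("was", "V"), ("pleasant", "ADJ")]

# Rule: change Verb to Noun when the word follows a determiner.
updated = apply_rule(initial, from_tag="V", to_tag="N", prev_tag="DET")
print(updated)
# [('The', 'DET'), ('walk', 'N'), ('was', 'V'), ('pleasant', 'ADJ')]

# The rule only fires when its context matches: here no verb follows a determiner.
cat_sentence = [("The", "DET"), ("cat", "N"), ("chased", "V"),
                ("the", "DET"), ("mouse", "N")]
print(apply_rule(cat_sentence, "V", "N", "DET") == cat_sentence)  # True
```

In "The cat chased the mouse", the verb "chased" follows the noun "cat" rather than a determiner, so the same rule leaves every tag as it is.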
3. Statistical POS Tagging
Statistical POS tagging is a computational linguistics approach that uses probabilistic models to assign grammatical categories (e.g., noun, verb, adjective) to words in a text. Unlike rule-based methods, which rely on handcrafted rules, statistical tagging learns patterns from large annotated corpora using machine learning techniques.
These models estimate how likely each tag is for a word in its context, which lets them resolve linguistic ambiguities and adapt to complex grammatical structures. Popular models include:
• Hidden Markov Models (HMMs)
• Conditional Random Fields (CRFs)
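To show how a probabilistic model uses context, the sketch below decodes a tiny Hidden Markov Model with the Viterbi algorithm. All probabilities are hand-picked for illustration rather than learned from a corpus, the three-tag set is deliberately small, and the variable names are assumptions for the example.

```python
STATES = ["DET", "N", "V"]

# Hypothetical, hand-picked probabilities (a real tagger estimates these from data).
START_P = {"DET": 0.6, "N": 0.3, "V": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "N": 0.90, "V": 0.05},
    "N":   {"DET": 0.10, "N": 0.20, "V": 0.70},
    "V":   {"DET": 0.60, "N": 0.30, "V": 0.10},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "N":   {"dog": 0.4, "walk": 0.3, "walks": 0.1},
    "V":   {"walk": 0.4, "walks": 0.4, "dog": 0.05},
}

def viterbi(words, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable tag sequence for `words` under the HMM."""
    # best[i][s]: probability of the best tag path ending in state s at word i
    best = [{s: start_p[s] * emit_p[s].get(words[0], unseen) for s in states}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[i][s] = score * emit_p[s].get(words[i], unseen)
            back[i][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(viterbi(["the", "dog", "walks"], STATES, START_P, TRANS_P, EMIT_P))
# ['DET', 'N', 'V']
print(viterbi(["the", "walk"], STATES, START_P, TRANS_P, EMIT_P))
# ['DET', 'N']
```

In the second call, "walk" on its own is more likely a verb under these numbers, but the high determiner-to-noun transition probability makes the noun reading win in context.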

Advantages of POS Tagging

| Advantage | Description |
| --- | --- |
| Text Simplification | Helps deconstruct complex sentences for easier understanding. |
| Improved Information Retrieval | Enables more accurate indexing and searching based on grammatical categories. |
| Named Entity Recognition (NER) | Serves as a precursor for identifying names, places and organizations. |
| Syntactic Parsing | Assists in analyzing sentence structure and word relationships. |
Disadvantages of POS Tagging

| Disadvantage | Description |
| --- | --- |
| Ambiguity | Words may have multiple meanings depending on context. |
| Idiomatic Expressions | Informal or non-standard phrases are hard to tag correctly. |
| Out-of-Vocabulary Words | Unseen words can lead to incorrect tagging. |
| Domain Dependence | Models may not generalize well outside their training domain. |
