Solution NLP UT1

The document outlines the structure and content of a test examination for a Natural Language Processing (NLP) course at Shivajirao S. Jondhale College of Engineering. It includes questions on various NLP topics such as the stages of the NLP process, language models, morphology, and part-of-speech tagging. Additionally, it provides sample answers and explanations for key concepts in NLP, including tokenization, morphological analysis, and bigram probabilities.


Shivajirao S. Jondhale College of Engineering
Department of Artificial Intelligence and Machine Learning

TEST EXAMINATION - I
(Internal Assessment I) (August 2024)
Academic Year 2024 - 2025

Class: TE    Sem: VII    Date: 22/08/2024
Time: 1 Hr    Subject: NLP (CSD07011)    Max. Marks: 20

===========================================================
Q. No.  ATTEMPT ANY ONE  (Marks / CO / BL)

Q.1 A) What is NLP? Discuss the various stages involved in the NLP process with a suitable example. (5 / CO1 / BL2)
OR
B) Write a short note on Derivational and Inflectional Morphology. (5 / CO1 / BL1)

Q.2 A) What is a Language Model? What is an N-gram language model? Discuss the types of n-grams. (5 / CO2 / BL2)
OR
B) What is Morphology? Why do we need morphological analysis? (5 / CO2 / BL2)

Q.3 A) Explain the types of POS tagging and the different issues in tagging. (10 / CO3 / BL3)
OR
B) Corpus Data:
<s> I am a human </s>
<s> I am not a stone </s>
<s> I Live in Mumbai </s>
Find the probability of "<s> I I am not </s>" using the bigram model. (10 / CO3 / BL2)

NLP SEM VII


UT 1 SOLUTION

Q.1 A) What is NLP? Discuss the various stages involved in the NLP process with a suitable example.
Natural Language Processing (NLP) is a field within artificial intelligence that allows computers to comprehend, analyze, and interact with human language effectively. The
process of NLP can be divided into five distinct phases: Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Integration, and Pragmatic Analysis. Each phase
plays a crucial role in the overall understanding and processing of natural language.
First Phase of NLP: Lexical and Morphological Analysis
Tokenization
The lexical phase in Natural Language Processing (NLP) involves scanning text and breaking it down into smaller units such as paragraphs, sentences, and words. This
process, known as tokenization, converts raw text into manageable units called tokens or lexemes. Tokenization is essential for understanding and processing text at the word
level.

In addition to tokenization, various data cleaning and feature extraction techniques are applied, including:

Lemmatization: Reducing words to their base or root form.
Stopwords Removal: Eliminating common words that carry little meaning on their own, such as "and," "the," and "is."
Correcting Misspelled Words: Ensuring the text is free of spelling errors to maintain accuracy.

These steps enhance the comprehensibility of the text, making it easier to analyze and process.
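As an illustration, the cleaning steps above can be sketched in a few lines of pure Python. The stopword list and the suffix-stripping rules here are simplified stand-ins for what a library such as NLTK or spaCy would provide; real lemmatization is dictionary-based, not suffix-based.

```python
import re

# Illustrative stopword list; real pipelines use much larger inventories.
STOPWORDS = {"and", "the", "is", "a", "an", "are"}

def tokenize(text):
    """Split raw text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(token):
    """Naive suffix stripping as a stand-in for true lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, and reduce tokens to a rough base form."""
    return [lemmatize(t) for t in tokenize(text) if t not in STOPWORDS]

print(preprocess("The cats walked"))
```

Running `preprocess` on "The cats walked" drops the stopword "The" and strips the inflectional suffixes, leaving the rough base forms of the remaining tokens.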

Morphological Analysis
Morphological analysis is another critical phase in NLP, focusing on identifying morphemes, the smallest units of a word that carry meaning and cannot be further divided.
Understanding morphemes is vital for grasping the structure of words and their relationships.

Types of Morphemes
Free Morphemes: Text elements that carry meaning independently and make sense on their own. For example, "bat" is a free morpheme.
Bound Morphemes: Elements that must be attached to free morphemes to convey meaning, as they cannot stand alone. For instance, the suffix "-ing" is a bound morpheme,
needing to be attached to a free morpheme like "run" to form "running."
Importance of Morphological Analysis
Morphological analysis is crucial in NLP for several reasons:

Understanding Word Structure: It helps in deciphering the composition of complex words.


Predicting Word Forms: It aids in anticipating different forms of a word based on its morphemes.
Improving Accuracy: It enhances the accuracy of tasks such as part-of-speech tagging, syntactic parsing, and machine translation.
By identifying and analyzing morphemes, the system can interpret text correctly at the most fundamental level, laying the groundwork for more advanced NLP applications.

Second Phase of NLP: Syntactic Analysis (Parsing)


Syntactic analysis, also known as parsing, is the second phase of Natural Language Processing (NLP). This phase is essential for understanding the structure of a sentence
and assessing its grammatical correctness. It involves analyzing the relationships between words and ensuring their logical consistency by comparing their arrangement
against standard grammatical rules.

Q.1 B) Write a Short Note on Derivational and Inflectional Morphology

Derivational morphology refers to the process of creating new words by adding affixes (prefixes or suffixes)
to a base word, often changing the word's part of speech or meaning significantly, while inflectional
morphology involves adding affixes to indicate grammatical features like tense, number, or case, without
changing the word's core meaning or part of speech.

Key Differences:
Function:
Derivational morphology creates new words with potentially different meanings and word classes, while
inflectional morphology modifies existing words to fit grammatical context.

Meaning Change:
Derivational affixes often add substantial meaning to the base word, while inflectional affixes primarily
add grammatical information.
Examples:

Derivational:
"act" becomes "actor" (adding "-or" changes the word from a verb to a noun), "happy" becomes
"unhappy" (adding "un-" changes the meaning to negative).

Inflectional:

"cat" becomes "cats" (adding "-s" indicates plural), "walk" becomes "walked" (adding "-ed" marks past
tense).

Important Points:
Order of Application:

When a word has both derivational and inflectional affixes, the derivational affix is always added closer
to the base word, followed by the inflectional affix.

Productivity:

While some derivational affixes may be more productive than others, inflectional affixes are usually
highly productive and can be applied to most words within their category.
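The ordering rule above (derivational affix closest to the base, inflectional affix at the edge) can be sketched as a small Python function. The affix tables are hypothetical, purely for illustration:

```python
def build_word(base, derivational=None, inflectional=None):
    """Attach affixes in morphological order: the derivational affix
    attaches first, closest to the base; the inflectional affix attaches
    last, at the word's outer edge. Affixes are written "un-" for
    prefixes and "-s" for suffixes."""
    word = base
    if derivational:
        if derivational.startswith("-"):          # derivational suffix
            word = word + derivational[1:]
        else:                                     # derivational prefix
            word = derivational[:-1] + word
    if inflectional:                              # inflectional suffix
        word = word + inflectional[1:]
    return word

print(build_word("act", derivational="-or", inflectional="-s"))  # actors
print(build_word("happy", derivational="un-"))                   # unhappy
```

Note that "act" + "-or" + "-s" yields "actors", never "actsor": the plural marker cannot intervene between the base and the derivational suffix.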
Q.2 A) What is a Language Model? What is an N-gram language model? Discuss the types of n-grams.

A Language Model (LM) is a probabilistic model that predicts the likelihood of a sequence of words or phrases, while an N-gram language model is a specific type of LM that estimates this probability based on the preceding n-1 words.

Here's a more detailed explanation:

1. Language Model (LM):


Definition:
A LM is a machine learning model designed to understand and generate human language.

Function:
It learns the statistical properties of language, including word order, grammar, and semantics, from a
large corpus of text.
Goal:
To predict the probability of a sequence of words or phrases, often represented as P(w1, w2, ..., wn),
where w represents words.

Applications:

Speech recognition, text generation, machine translation, and more.


2. N-gram Language Model:
Definition:
An N-gram model is a type of LM that makes the assumption that the probability of a word depends
only on the preceding n-1 words.
How it works:
It models the probability of a word given its context (the previous n-1 words).
Example:
In a bigram model (n=2), the probability of the word "cat" given the word "The" is estimated based
on the frequency of the bigram "The cat" in the training data.
Advantages:

Relatively simple to implement and train, and can be effective for certain NLP tasks.

Disadvantages:

Limited ability to capture long-range dependencies in language, and can struggle with unseen or rare
n-grams.

3. Types of N-grams:
Unigram (n=1): Considers each word independently, ignoring context.
Bigram (n=2): Considers pairs of consecutive words.
Trigram (n=3): Considers sequences of three consecutive words.
N-gram (n>3): Considers sequences of n consecutive words.
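Extracting the different n-gram types from a token sequence is a simple sliding window. A minimal sketch, using a sample sentence of my own choosing:

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) from a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))  # unigrams: single words
print(ngrams(tokens, 2))  # bigrams: consecutive pairs
print(ngrams(tokens, 3))  # trigrams: consecutive triples
```

A sequence of k tokens yields k - n + 1 n-grams, so higher n means fewer, sparser n-grams, which is exactly why large-n models struggle with unseen data.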
OR

B) What is Morphology? Why do we need morphological analysis?

Morphology, a branch of linguistics, studies the internal structure of words and how they are formed,
focusing on morphemes, the smallest meaningful units of language. Morphological analysis
involves identifying and analyzing these morphemes to understand word formation and meaning.

What is Morphology?

Morphology is the study of words and how they are formed.

It examines the internal structure of words, including how morphemes combine to create new words or different forms of existing words.

A morpheme is the smallest unit of language that carries meaning, such as prefixes, suffixes, or root words.

Why do we need morphological analysis?


Identify morphemes: Break down words into their constituent morphemes (e.g., prefixes, suffixes, root words).
Analyze word formation: Understand how morphemes combine to form new words or different forms of existing words.
Determine meaning: Analyze how morphemes contribute to the overall meaning of a word.
Apply to various fields: Morphological analysis has applications in areas like natural language processing, machine translation, and information retrieval.

Examples:
In the word "unbreakable", the morphemes are "un-" (prefix), "break" (root), and "-able" (suffix).
In the word "dogs", the morphemes are "dog" (root) and "-s" (plural marker).
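The two examples above can be reproduced with a toy segmenter. The prefix and suffix inventories here are illustrative; real morphological analyzers use finite-state transducers with full lexicons.

```python
# Hypothetical affix inventories, just large enough for the examples.
PREFIXES = ["un-"]
SUFFIXES = ["-able", "-s"]

def segment(word):
    """Peel off known prefixes and suffixes, leaving the root morpheme."""
    morphemes = []
    for p in PREFIXES:
        stem = p[:-1]                      # "un-" -> "un"
        if word.startswith(stem):
            morphemes.append(p)
            word = word[len(stem):]
    tail = []
    for s in SUFFIXES:
        suf = s[1:]                        # "-able" -> "able"
        if word.endswith(suf):
            tail.insert(0, s)
            word = word[: -len(suf)]
    return morphemes + [word] + tail       # prefix(es), root, suffix(es)

print(segment("unbreakable"))  # ['un-', 'break', '-able']
print(segment("dogs"))         # ['dog', '-s']
```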

Why is Morphology important?


It helps us understand how languages work and how words are formed.
It is crucial for natural language processing tasks, such as machine translation and text analysis.
It can help learners of a new language to understand the structure of words and how they are formed.

Q.3 A) Explain the types of POS tagging and the different issues in tagging.
POS (Part-of-Speech) tagging involves assigning grammatical tags (like noun, verb, adjective) to words
in a sentence, and there are different approaches to this, including rule-based, stochastic, and
transformation-based tagging, each with its own advantages and disadvantages.

Types of POS Tagging:


Rule-Based Tagging:
This method uses predefined rules based on linguistic knowledge to assign tags to words.
Pros: Simple to implement and understand.
Cons: Can struggle with ambiguity and complex grammatical structures.
Stochastic Tagging:
This approach uses probabilistic models to determine the most likely tag for a word based on its
context.
Pros: Can handle ambiguity and complex structures better than rule-based methods.
Cons: Requires large amounts of labeled training data.

Transformation-Based Tagging (Brill Tagging):

This method uses a set of transformation rules to iteratively improve the accuracy of the initial tags.

Pros: Can be more accurate than rule-based tagging.


Cons: Can be computationally expensive.
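A rule-based tagger of the kind described above can be sketched as a small lexicon plus suffix rules with a noun default. The lexicon and rules here are toy examples; real rule-based taggers use far richer hand-written rule sets.

```python
# Toy lexicon for illustration only.
LEXICON = {"the": "DET", "a": "DET", "is": "VERB", "cat": "NOUN"}

def rule_based_tag(tokens):
    """Tag each token: lexicon lookup first, then suffix rules,
    then a default tag. Ambiguity (e.g. 'book' as noun vs. verb)
    is exactly what this simple scheme cannot resolve."""
    tags = []
    for tok in tokens:
        word = tok.lower()
        if word in LEXICON:
            tags.append((tok, LEXICON[word]))
        elif word.endswith("ing") or word.endswith("ed"):
            tags.append((tok, "VERB"))   # suffix rule
        elif word.endswith("ly"):
            tags.append((tok, "ADV"))    # suffix rule
        else:
            tags.append((tok, "NOUN"))   # default tag
    return tags

print(rule_based_tag("the cat quickly jumped".split()))
```

Stochastic taggers replace these hand-written rules with probabilities estimated from a tagged corpus, which is how they handle the ambiguity this sketch cannot.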
OR
B) Corpus Data:
<s> I am a human </s>
<s> I am not a stone </s>
<s> I Live in Mumbai </s>
Find the probability of "<s> I I am not </s>" using the bigram model.

1. Prepare the Corpus for Bigram Counts

First, we need to extract the bigrams from the given corpus:

- "<s> I am a human </s>" -> (<s>, I), (I, am), (am, a), (a, human), (human, </s>)
- "<s> I am not a stone </s>" -> (<s>, I), (I, am), (am, not), (not, a), (a, stone), (stone, </s>)
- "<s> I Live in Mumbai </s>" -> (<s>, I), (I, Live), (Live, in), (in, Mumbai), (Mumbai, </s>)

2. Count the Bigrams

Now, let's count the occurrences of each bigram:

- (<s>, I): 3
- (I, am): 2
- (am, a): 1
- (a, human): 1
- (human, </s>): 1
- (am, not): 1
- (not, a): 1
- (a, stone): 1
- (stone, </s>): 1
- (I, Live): 1
- (Live, in): 1
- (in, Mumbai): 1
- (Mumbai, </s>): 1

3. Count the Unigrams (First Words of Bigrams)

We also need the counts of the first words in each bigram:

- <s>: 3
- I: 3
- am: 2
- a: 2
- human: 1
- not: 1
- stone: 1
- Live: 1
- in: 1
- Mumbai: 1

4. Calculate Bigram Probabilities

The probability of a bigram (w2 | w1) is calculated as:

P(w2 | w1) = count(w1, w2) / count(w1)

Let's calculate the probabilities for the bigrams in the sentence "<s> I I am not </s>":

- P(I | <s>) = count(<s>, I) / count(<s>) = 3 / 3 = 1
- P(I | I) = count(I, I) / count(I) = 0 / 3 = 0 (the bigram "I I" does not occur in the training corpus)
- P(am | I) = count(I, am) / count(I) = 2 / 3
- P(not | am) = count(am, not) / count(am) = 1 / 2
- P(</s> | not) = count(not, </s>) / count(not) = 0 / 1 = 0 (the bigram "not </s>" does not occur in the training corpus)

5. Calculate the Sentence Probability

The probability of the sentence is the product of the bigram probabilities:

P(<s> I I am not </s>) = P(I | <s>) × P(I | I) × P(am | I) × P(not | am) × P(</s> | not)

P(<s> I I am not </s>) = 1 × 0 × (2/3) × (1/2) × 0 = 0

Conclusion

The probability of the sentence "<s> I I am not </s>" under the bigram model is 0. This is because the bigrams "I I" and "not </s>" were never observed in the training data, giving them zero probability and therefore making the entire sentence probability zero.
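The whole calculation can be checked with a short script that rebuilds the bigram and first-word counts from the corpus and multiplies the unsmoothed bigram probabilities:

```python
from collections import Counter

corpus = [
    "<s> I am a human </s>",
    "<s> I am not a stone </s>",
    "<s> I Live in Mumbai </s>",
]

bigram_counts = Counter()
unigram_counts = Counter()
for sent in corpus:
    tokens = sent.split()
    unigram_counts.update(tokens[:-1])          # first words of each bigram
    bigram_counts.update(zip(tokens, tokens[1:]))

def p(w2, w1):
    """Unsmoothed bigram probability P(w2 | w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

sentence = "<s> I I am not </s>".split()
prob = 1.0
for w1, w2 in zip(sentence, sentence[1:]):
    prob *= p(w2, w1)

print(prob)  # 0.0 — the unseen bigrams "I I" and "not </s>" zero it out
```

This zero-probability behaviour is why practical n-gram models apply smoothing (e.g. add-one/Laplace), which would assign a small nonzero probability to the unseen bigrams.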
