Solution NLP UT1

The document outlines the structure and content of a test examination for a Natural Language Processing (NLP) course at Shivajirao S. Jondhale College of Engineering. It includes questions on various NLP topics such as the stages of the NLP process, language models, morphology, and part-of-speech tagging. Additionally, it provides sample answers and explanations for key concepts in NLP, including tokenization, morphological analysis, and bigram probabilities.


Shivajirao S. Jondhale College of Engineering
Department of Artificial Intelligence and Machine Learning

TEST EXAMINATION - I
(Internal Assessment I) (August 2024)
Academic Year 2024 - 2025

Class: TE    Sem: VII    Date: 22/08/2024
Time: 1 Hr    Subject: NLP (CSD07011)    Max. Marks: 20

===========================================================
Q. No.  ATTEMPT ANY ONE  (Marks / CO / BL)

Q.1 A) What is NLP? Discuss the various stages involved in the NLP process with a suitable example. (5 / CO1 / BL2)
OR
B) Write a short note on Derivational and Inflectional Morphology. (5 / CO1 / BL1)

Q.2 A) What is a Language Model? What is an N-gram language model? Discuss the types of n-grams. (5 / CO2 / BL2)
OR
B) What is Morphology? Why do we need morphological analysis? (5 / CO2 / BL2)

Q.3 A) Explain the types of POS tagging and the different issues in tagging. (10 / CO3 / BL3)
OR
B) Corpus Data:
<s> I am a human </s>
<s> I am not a stone </s>
<s> I Live in Mumbai </s>
Find the probability of "<s> I I am not </s>" using the bigram model. (10 / CO3 / BL2)

NLP SEM VII


UT 1 SOLUTION

Q.1 A) What is NLP? Discuss the various stages involved in the NLP process with a suitable example.
Natural Language Processing (NLP) is a field within artificial intelligence that allows computers to comprehend, analyze, and interact with human language effectively. The
process of NLP can be divided into five distinct phases: Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Integration, and Pragmatic Analysis. Each phase
plays a crucial role in the overall understanding and processing of natural language.
First Phase of NLP: Lexical and Morphological Analysis
Tokenization
The lexical phase in Natural Language Processing (NLP) involves scanning text and breaking it down into smaller units such as paragraphs, sentences, and words. This
process, known as tokenization, converts raw text into manageable units called tokens or lexemes. Tokenization is essential for understanding and processing text at the word
level.

In addition to tokenization, various data cleaning and feature extraction techniques are applied, including:

Lemmatization: Reducing words to their base or root form.
Stopwords Removal: Eliminating common words that carry little meaning on their own, such as "and," "the," and "is."
Correcting Misspelled Words: Ensuring the text is free of spelling errors to maintain accuracy.

These steps enhance the comprehensibility of the text, making it easier to analyze and process.
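As an illustration, the cleaning steps above can be sketched in a few lines of pure Python. The stopword list and the suffix-stripping rules here are simplified stand-ins for what a library such as NLTK or spaCy would provide; real lemmatization is dictionary-based, not suffix-based.

```python
import re

# Illustrative stopword list; real pipelines use much larger inventories.
STOPWORDS = {"and", "the", "is", "a", "an", "are"}

def tokenize(text):
    """Split raw text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(token):
    """Naive suffix stripping as a stand-in for true lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, and reduce tokens to a rough base form."""
    return [lemmatize(t) for t in tokenize(text) if t not in STOPWORDS]

print(preprocess("The cats walked"))
```

Running `preprocess` on "The cats walked" drops the stopword "The" and strips the inflectional suffixes, leaving the rough base forms of the remaining tokens.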

Morphological Analysis
Morphological analysis is another critical phase in NLP, focusing on identifying morphemes, the smallest units of a word that carry meaning and cannot be further divided.
Understanding morphemes is vital for grasping the structure of words and their relationships.

Types of Morphemes
Free Morphemes: Text elements that carry meaning independently and make sense on their own. For example, "bat" is a free morpheme.
Bound Morphemes: Elements that must be attached to free morphemes to convey meaning, as they cannot stand alone. For instance, the suffix "-ing" is a bound morpheme,
needing to be attached to a free morpheme like "run" to form "running."
Importance of Morphological Analysis
Morphological analysis is crucial in NLP for several reasons:

Understanding Word Structure: It helps in deciphering the composition of complex words.


Predicting Word Forms: It aids in anticipating different forms of a word based on its morphemes.
Improving Accuracy: It enhances the accuracy of tasks such as part-of-speech tagging, syntactic parsing, and machine translation.
By identifying and analyzing morphemes, the system can interpret text correctly at the most fundamental level, laying the groundwork for more advanced NLP applications.

Second Phase of NLP: Syntactic Analysis (Parsing)


Syntactic analysis, also known as parsing, is the second phase of Natural Language Processing (NLP). This phase is essential for understanding the structure of a sentence
and assessing its grammatical correctness. It involves analyzing the relationships between words and ensuring their logical consistency by comparing their arrangement
against standard grammatical rules.

Q.1 B) Write a Short Note on Derivational and Inflectional Morphology

Derivational morphology refers to the process of creating new words by adding affixes (prefixes or suffixes)
to a base word, often changing the word's part of speech or meaning significantly, while inflectional
morphology involves adding affixes to indicate grammatical features like tense, number, or case, without
changing the word's core meaning or part of speech.

Key Differences:
Function:
Derivational morphology creates new words with potentially different meanings and word classes, while
inflectional morphology modifies existing words to fit grammatical context.

Meaning Change:
Derivational affixes often add substantial meaning to the base word, while inflectional affixes primarily
add grammatical information.
Examples:

Derivational:
"act" becomes "actor" (adding "-or" changes the word from a verb to a noun), "happy" becomes
"unhappy" (adding "un-" changes the meaning to negative).

Inflectional:

"cat" becomes "cats" (adding "-s" indicates plural), "walk" becomes "walked" (adding "-ed" marks past
tense).

Important Points:
Order of Application:

When a word has both derivational and inflectional affixes, the derivational affix is always added closer
to the base word, followed by the inflectional affix.

Productivity:

While some derivational affixes may be more productive than others, inflectional affixes are usually
highly productive and can be applied to most words within their category.
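The ordering rule above (derivational affix closest to the base, inflectional affix at the edge) can be sketched as a small Python function. The affix tables are hypothetical, purely for illustration:

```python
def build_word(base, derivational=None, inflectional=None):
    """Attach affixes in morphological order: the derivational affix
    attaches first, closest to the base; the inflectional affix attaches
    last, at the word's outer edge. Affixes are written "un-" for
    prefixes and "-s" for suffixes."""
    word = base
    if derivational:
        if derivational.startswith("-"):          # derivational suffix
            word = word + derivational[1:]
        else:                                     # derivational prefix
            word = derivational[:-1] + word
    if inflectional:                              # inflectional suffix
        word = word + inflectional[1:]
    return word

print(build_word("act", derivational="-or", inflectional="-s"))  # actors
print(build_word("happy", derivational="un-"))                   # unhappy
```

Note that "act" + "-or" + "-s" yields "actors", never "actsor": the plural marker cannot intervene between the base and the derivational suffix.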
Q.2 A) What is a Language Model? What is an N-gram language model? Discuss the types of n-grams.

A Language Model (LM) is a probabilistic model that predicts the likelihood of a sequence of words or phrases, while an N-gram language model is a specific type of LM that estimates this probability based on the preceding n-1 words.

Here's a more detailed explanation:

1. Language Model (LM):


Definition:
A LM is a machine learning model designed to understand and generate human language.

Function:
It learns the statistical properties of language, including word order, grammar, and semantics, from a
large corpus of text.
Goal:
To predict the probability of a sequence of words or phrases, often represented as P(w1, w2, ..., wn),
where w represents words.

Applications:

Speech recognition, text generation, machine translation, and more.


2. N-gram Language Model:
Definition:
An N-gram model is a type of LM that makes the assumption that the probability of a word depends
only on the preceding n-1 words.
How it works:
It models the probability of a word given its context (the previous n-1 words).
Example:
In a bigram model (n=2), the probability of the word "cat" given the word "The" is estimated based
on the frequency of the bigram "The cat" in the training data.
Advantages:

Relatively simple to implement and train, and can be effective for certain NLP tasks.

Disadvantages:

Limited ability to capture long-range dependencies in language, and can struggle with unseen or rare
n-grams.

3. Types of N-grams:
Unigram (n=1): Considers each word independently, ignoring context.
Bigram (n=2): Considers pairs of consecutive words.
Trigram (n=3): Considers sequences of three consecutive words.
N-gram (n>3): Considers sequences of n consecutive words.
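Extracting the different n-gram types from a token sequence is a simple sliding window. A minimal sketch, using a sample sentence of my own choosing:

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) from a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))  # unigrams: single words
print(ngrams(tokens, 2))  # bigrams: consecutive pairs
print(ngrams(tokens, 3))  # trigrams: consecutive triples
```

A sequence of k tokens yields k - n + 1 n-grams, so higher n means fewer, sparser n-grams, which is exactly why large-n models struggle with unseen data.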
OR

B) What is Morphology? Why do we need morphological analysis?

Morphology, a branch of linguistics, studies the internal structure of words and how they are formed,
focusing on morphemes, the smallest meaningful units of language. Morphological analysis
involves identifying and analyzing these morphemes to understand word formation and meaning.

What is Morphology?

Morphology is the study of words and how they are formed.

It examines the internal structure of words, including how morphemes combine to create new words or different forms of existing words.

A morpheme is the smallest unit of language that carries meaning, such as prefixes, suffixes, or root words.

Why do we need morphological analysis?


Identify morphemes: Break down words into their constituent morphemes (e.g., prefixes, suffixes, root words).
Analyze word formation: Understand how morphemes combine to form new words or different forms of existing words.
Determine meaning: Analyze how morphemes contribute to the overall meaning of a word.
Apply to various fields: Morphological analysis has applications in areas like natural language processing, machine translation, and information retrieval.

Examples:
In the word "unbreakable", the morphemes are "un-" (prefix), "break" (root), and "-able" (suffix).
In the word "dogs", the morphemes are "dog" (root) and "-s" (plural marker).
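The two examples above can be reproduced with a toy segmenter. The prefix and suffix inventories here are illustrative; real morphological analyzers use finite-state transducers with full lexicons.

```python
# Hypothetical affix inventories, just large enough for the examples.
PREFIXES = ["un-"]
SUFFIXES = ["-able", "-s"]

def segment(word):
    """Peel off known prefixes and suffixes, leaving the root morpheme."""
    morphemes = []
    for p in PREFIXES:
        stem = p[:-1]                      # "un-" -> "un"
        if word.startswith(stem):
            morphemes.append(p)
            word = word[len(stem):]
    tail = []
    for s in SUFFIXES:
        suf = s[1:]                        # "-able" -> "able"
        if word.endswith(suf):
            tail.insert(0, s)
            word = word[: -len(suf)]
    return morphemes + [word] + tail       # prefix(es), root, suffix(es)

print(segment("unbreakable"))  # ['un-', 'break', '-able']
print(segment("dogs"))         # ['dog', '-s']
```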

Why is Morphology important?


It helps us understand how languages work and how words are formed.
It is crucial for natural language processing tasks, such as machine translation and text analysis.
It can help learners of a new language to understand the structure of words and how they are formed.

Q.3 A) Explain the types of POS tagging and the different issues in tagging.
POS (Part-of-Speech) tagging involves assigning grammatical tags (like noun, verb, adjective) to words
in a sentence, and there are different approaches to this, including rule-based, stochastic, and
transformation-based tagging, each with its own advantages and disadvantages.

Types of POS Tagging:


Rule-Based Tagging:
This method uses predefined rules based on linguistic knowledge to assign tags to words.
Pros: Simple to implement and understand.
Cons: Can struggle with ambiguity and complex grammatical structures.
Stochastic Tagging:
This approach uses probabilistic models to determine the most likely tag for a word based on its
context.
Pros: Can handle ambiguity and complex structures better than rule-based methods.
Cons: Requires large amounts of labeled training data.

Transformation-Based Tagging (Brill Tagging):

This method uses a set of transformation rules to iteratively improve the accuracy of the initial tags.

Pros: Can be more accurate than rule-based tagging.


Cons: Can be computationally expensive.
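A rule-based tagger of the kind described above can be sketched as a small lexicon plus suffix rules with a noun default. The lexicon and rules here are toy examples; real rule-based taggers use far richer hand-written rule sets.

```python
# Toy lexicon for illustration only.
LEXICON = {"the": "DET", "a": "DET", "is": "VERB", "cat": "NOUN"}

def rule_based_tag(tokens):
    """Tag each token: lexicon lookup first, then suffix rules,
    then a default tag. Ambiguity (e.g. 'book' as noun vs. verb)
    is exactly what this simple scheme cannot resolve."""
    tags = []
    for tok in tokens:
        word = tok.lower()
        if word in LEXICON:
            tags.append((tok, LEXICON[word]))
        elif word.endswith("ing") or word.endswith("ed"):
            tags.append((tok, "VERB"))   # suffix rule
        elif word.endswith("ly"):
            tags.append((tok, "ADV"))    # suffix rule
        else:
            tags.append((tok, "NOUN"))   # default tag
    return tags

print(rule_based_tag("the cat quickly jumped".split()))
```

Stochastic taggers replace these hand-written rules with probabilities estimated from a tagged corpus, which is how they handle the ambiguity this sketch cannot.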
OR
B) Corpus Data:
<s> I am a human </s>
<s> I am not a stone </s>
<s> I Live in Mumbai </s>
Find the probability of "<s> I I am not </s>" using the bigram model.

1. Prepare the Corpus for Bigram Counts

First, we need to extract the bigrams from the given corpus:

- "<s> I am a human </s>" -> (<s>, I), (I, am), (am, a), (a, human), (human, </s>)
- "<s> I am not a stone </s>" -> (<s>, I), (I, am), (am, not), (not, a), (a, stone), (stone, </s>)
- "<s> I Live in Mumbai </s>" -> (<s>, I), (I, Live), (Live, in), (in, Mumbai), (Mumbai, </s>)

2. Count the Bigrams

Now, let's count the occurrences of each bigram:

- (<s>, I): 3
- (I, am): 2
- (am, a): 1
- (a, human): 1
- (human, </s>): 1
- (am, not): 1
- (not, a): 1
- (a, stone): 1
- (stone, </s>): 1
- (I, Live): 1
- (Live, in): 1
- (in, Mumbai): 1
- (Mumbai, </s>): 1

3. Count the Unigrams (First Words of Bigrams)

We also need the counts of the first words in each bigram:

- <s>: 3
- I: 3
- am: 2
- a: 2
- human: 1
- not: 1
- stone: 1
- Live: 1
- in: 1
- Mumbai: 1

4. Calculate Bigram Probabilities

The probability of a bigram (w2 | w1) is calculated as:

P(w2 | w1) = count(w1, w2) / count(w1)

Let's calculate the probabilities for the bigrams in the sentence "<s> I I am not </s>":

- P(I | <s>) = count(<s>, I) / count(<s>) = 3 / 3 = 1
- P(I | I) = count(I, I) / count(I) = 0 / 3 = 0 (the bigram "I I" does not occur in the training corpus)
- P(am | I) = count(I, am) / count(I) = 2 / 3
- P(not | am) = count(am, not) / count(am) = 1 / 2
- P(</s> | not) = count(not, </s>) / count(not) = 0 / 1 = 0 (the bigram "not </s>" does not occur in the training corpus)

5. Calculate the Sentence Probability

The probability of the sentence is the product of the bigram probabilities:

P(<s> I I am not </s>) = P(I | <s>) × P(I | I) × P(am | I) × P(not | am) × P(</s> | not)

P(<s> I I am not </s>) = 1 × 0 × (2/3) × (1/2) × 0 = 0

Conclusion

The probability of the sentence "<s> I I am not </s>" under the bigram model is 0. This is because the bigrams "I I" and "not </s>" were never observed in the training data, giving them zero probability and therefore making the entire sentence probability zero.
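The whole calculation can be checked with a short script that rebuilds the bigram and first-word counts from the corpus and multiplies the unsmoothed bigram probabilities:

```python
from collections import Counter

corpus = [
    "<s> I am a human </s>",
    "<s> I am not a stone </s>",
    "<s> I Live in Mumbai </s>",
]

bigram_counts = Counter()
unigram_counts = Counter()
for sent in corpus:
    tokens = sent.split()
    unigram_counts.update(tokens[:-1])          # first words of each bigram
    bigram_counts.update(zip(tokens, tokens[1:]))

def p(w2, w1):
    """Unsmoothed bigram probability P(w2 | w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

sentence = "<s> I I am not </s>".split()
prob = 1.0
for w1, w2 in zip(sentence, sentence[1:]):
    prob *= p(w2, w1)

print(prob)  # 0.0 — the unseen bigrams "I I" and "not </s>" zero it out
```

This zero-probability behaviour is why practical n-gram models apply smoothing (e.g. add-one/Laplace), which would assign a small nonzero probability to the unseen bigrams.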
