NLP Study Notes for Computer Engineering
BE, Semester VII, Computer Engineering
1. What is Natural Language Processing?
Figure: Natural Language Processing Stages
Natural Language Processing (NLP) is a subfield of artificial intelligence and computational
linguistics that focuses on enabling computers to understand, interpret, and generate human
language in a valuable way. It involves the interaction between computers and human
language, particularly how to program computers to process and analyze large amounts of
natural language data.
Key aspects of NLP include:
Language Understanding: Comprehending the meaning behind text or speech
Language Generation: Producing human-like text or speech
Translation: Converting text from one language to another
Sentiment Analysis: Determining the emotional tone of text
NLP combines computational linguistics with machine learning and deep learning models to
process and understand language. These technologies enable computers to process human
language in the form of text or voice data and understand its full meaning, complete with the
speaker or writer's intent and sentiment.
2. Discuss various stages involved in NLP process with
suitable example.
Figure: NLP Process Stages
The NLP process involves several stages that transform raw text into structured information.
These stages work sequentially to extract meaning from natural language:
1. Lexical Analysis: Breaking down text into tokens (words, phrases) and identifying their
parts of speech.
Example: "The cat sits on the mat" → ["The", "cat", "sits", "on", "the", "mat"]
2. Syntactic Analysis: Analyzing the grammatical structure of sentences to understand how
words relate to each other.
Example: Creating a parse tree to show subject-verb-object relationships
3. Semantic Analysis: Extracting the meaning of text by understanding word relationships
and context.
Example: Recognizing that "bank" refers to a financial institution in "I deposited money in
the bank"
4. Discourse Integration: Understanding the relationship between sentences and the overall
context.
Example: Connecting pronouns to their antecedents across sentences
5. Pragmatic Analysis: Interpreting the intended meaning based on context and world
knowledge.
Example: Understanding sarcasm or implied meaning in "Great, another meeting!"
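The first stage, lexical analysis, can be sketched in a few lines of Python. The regex-based `simple_tokenize` below is a toy helper written for these notes, not a production tokenizer; real systems use trained tokenizers from libraries such as NLTK or spaCy:

```python
import re

def simple_tokenize(text):
    """Lexical analysis: split raw text into word and punctuation tokens.

    A toy regex tokenizer for illustration only; trained tokenizers
    also handle contractions, numbers, and multi-word expressions.
    """
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)

print(simple_tokenize("The cat sits on the mat."))
# ['The', 'cat', 'sits', 'on', 'the', 'mat', '.']
```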
3. Explain the difference between Natural Language and
Computer Language.
Figure: Natural Language vs Computer Language
Natural Language and Computer Language differ significantly in their structure, purpose, and
characteristics:
Feature         | Natural Language                | Computer Language
----------------|---------------------------------|----------------------------------
Structure       | Flexible, ambiguous, evolving   | Rigid, unambiguous, standardized
Purpose         | Human communication             | Instructing computers
Interpretation  | Context-dependent               | Literal, exact
Vocabulary      | Vast, continuously growing      | Limited, predefined keywords
Error tolerance | High (humans can infer meaning) | Low (syntax errors cause failure)
Natural languages like English or Spanish have evolved over centuries and contain ambiguities,
idioms, and cultural nuances. Computer languages like Python or Java are designed with
precise syntax and semantics to eliminate ambiguity, ensuring computers can execute
instructions exactly as specified.
4. What do you mean by ambiguity in Natural language?
Explain with suitable example. Discuss various ways to
resolve ambiguity in NL.
Figure: Ambiguity in Natural Language
Ambiguity in Natural Language refers to situations where a word, phrase, or sentence can be
interpreted in multiple ways, leading to uncertainty about the intended meaning. This is a
fundamental challenge in NLP as computers struggle with context-dependent interpretation.
Types of ambiguity with examples:
Lexical Ambiguity: Words with multiple meanings.
Example: "I saw her duck" - Did I see her lower her head or see her pet duck?
Syntactic Ambiguity: Sentences with multiple grammatical structures.
Example: "The chicken is ready to eat" - Is the chicken hungry or ready to be eaten?
Semantic Ambiguity: Phrases with multiple interpretations.
Example: "Visiting relatives can be annoying" - Are the relatives who visit annoying or is
visiting them annoying?
Methods to resolve ambiguity:
1. Context Analysis: Examining surrounding words and broader context
2. Statistical Methods: Using probability models to determine most likely meaning
3. Knowledge Bases: Leveraging domain-specific information
4. Machine Learning: Training models on large annotated datasets
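Context analysis can be illustrated with the classic Lesk algorithm, which picks the WordNet sense whose dictionary gloss overlaps most with the surrounding words. Below is a minimal sketch using NLTK's implementation (assuming nltk is installed and the WordNet corpus has been fetched via nltk.download('wordnet'); Lesk is a heuristic and may not always select the intended sense):

```python
from nltk.wsd import lesk

# Disambiguate "bank" in context: Lesk compares each WordNet sense's
# gloss against the context words and returns the best-overlapping sense.
context = "I deposited money in the bank".split()
sense = lesk(context, "bank")
print(sense, "->", sense.definition())
```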
5. Discuss various challenges in processing natural
language.
Figure: Challenges in Natural Language Processing
Natural Language Processing faces several significant challenges that make it a complex field:
Ambiguity Resolution: Words and sentences often have multiple interpretations, requiring
sophisticated context analysis.
Context Understanding: Language meaning heavily depends on context, which can be
cultural, situational, or conversational.
Language Variability: Languages evolve constantly, with new words, slang, and changing
usage patterns.
Resource Intensity: Training effective NLP models requires massive datasets and
computational power.
Cross-lingual Challenges: Different languages have unique structures, making translation
and multilingual processing difficult.
Domain Adaptation: Models trained on general text often perform poorly on specialized
domains like medical or legal texts.
Ethical Concerns: NLP systems can perpetuate biases present in training data, raising
fairness and privacy issues.
These challenges require continuous research and development of more sophisticated
algorithms, larger and more diverse datasets, and better evaluation metrics to create NLP
systems that can truly understand and generate human-like language.
6. What is Morphology? Explain Morphological
Analysis.
Figure: Morphology and Morphological Analysis
Morphology is the study of word formation and structure in linguistics. It examines how
morphemes (the smallest meaningful units of language) combine to create words. In NLP,
morphological analysis is the process of breaking down words into their constituent morphemes
to understand their structure and meaning.
Morphological analysis involves:
1. Segmentation: Breaking words into morphemes.
Example: "unhappiness" → "un" + "happy" + "ness"
2. Classification: Identifying morpheme types (prefixes, suffixes, roots).
Example: "un" (prefix), "happy" (root), "ness" (suffix)
3. Analysis: Determining the grammatical function of each morpheme.
Example: "un" (negation), "ness" (noun formation)
Morphological analysis is crucial for many NLP tasks including:
Stemming and lemmatization
Part-of-speech tagging
Machine translation
Information retrieval
By understanding word structure, NLP systems can better handle inflection, derivation, and
compounding across different languages.
7. Differentiate between Derivational and Inflectional
Morphemes.
Figure: Derivational vs Inflectional Morphemes
Derivational and Inflectional Morphemes are two types of bound morphemes that serve
different functions in word formation:
Feature      | Derivational Morphemes       | Inflectional Morphemes
-------------|------------------------------|-------------------------------
Function     | Create new words             | Modify grammatical properties
Effect       | Changes word meaning/class   | Preserves word meaning/class
Position     | Usually prefixes or suffixes | Usually suffixes
Productivity | Limited application          | Regularly applied
Examples of Derivational Morphemes:
"un-" in "unhappy" (changes meaning to negative)
"-ness" in "happiness" (changes adjective to noun)
"-ize" in "modernize" (changes noun to verb)
Examples of Inflectional Morphemes:
"-s" in "cats" (plural marker)
"-ed" in "walked" (past tense marker)
"-ing" in "walking" (present participle)
Understanding this distinction is crucial for morphological analysis in NLP systems.
8. Define Stemming and Lemmatization. How do they
work?
Figure: Stemming vs Lemmatization
Stemming and Lemmatization are techniques used in NLP to reduce words to their base
forms, but they differ in approach and accuracy:
Stemming:
A crude heuristic process that chops off word endings
Produces stems that may not be actual words
Fast but less accurate
Example: "studies", "studying", "studied" → "studi"
How Stemming Works:
1. Apply predefined rules to remove suffixes
2. Common algorithms: Porter Stemmer, Snowball Stemmer
3. Does not consider word context or part of speech
Lemmatization:
A sophisticated process using vocabulary and morphological analysis
Produces actual dictionary words (lemmas)
Slower but more accurate
Example: "studies", "studying", "studied" → "study"
How Lemmatization Works:
1. Determine the part of speech of a word
2. Apply morphological analysis to find the root form
3. Use dictionaries or ontologies like WordNet
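The two techniques can be compared directly with NLTK (a minimal sketch, assuming nltk is installed and nltk.download('wordnet') has been run for the lemmatizer):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "studying", "studied"]:
    print(word,
          "-> stem:", stemmer.stem(word),              # crude suffix chopping
          "| lemma:", lemmatizer.lemmatize(word, "v")) # dictionary lookup, verb POS
# studies  -> stem: studi | lemma: study
# studying -> stem: studi | lemma: study
# studied  -> stem: studi | lemma: study
```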
9. Explain the N-gram model.
Figure: N-gram Model
N-gram Model is a probabilistic language model used in NLP for predicting the next item in a
sequence of text. It's based on the Markov assumption that the probability of a word depends
only on the previous n-1 words.
Types of N-grams:
Unigram (1-gram): Single word
Example: "The", "quick", "brown"
Bigram (2-gram): Two consecutive words
Example: "The quick", "quick brown"
Trigram (3-gram): Three consecutive words
Example: "The quick brown"
How N-gram Models Work:
1. Training: Count frequencies of n-grams in a large corpus
2. Probability Calculation:
P(w_i | w_{i-n+1}, ..., w_{i-1}) = Count(w_{i-n+1}, ..., w_i) / Count(w_{i-n+1}, ..., w_{i-1})
3. Smoothing: Handle unseen n-grams using techniques like Laplace smoothing
4. Prediction: Select the most probable next word given previous context
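Steps 1 and 2 can be made concrete with a bigram model over a toy corpus (a minimal unsmoothed sketch; the one-sentence corpus is made up for illustration):

```python
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # Count(w_{i-1}, w_i)
unigrams = Counter(corpus)                   # Count(w_{i-1})

def bigram_prob(prev, word):
    """P(word | prev) = Count(prev, word) / Count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("the", "quick"))   # 0.5 - "the" is followed once
print(bigram_prob("the", "lazy"))    # 0.5   each by "quick" and "lazy"
```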
N-gram models are used in spelling correction, speech recognition, machine translation, and
text generation. While simple and effective, they have limitations in capturing long-range
dependencies in language.
10. What is POS tagging? What are open and closed
classes in POS tagging?
Figure: POS Tagging Example
POS (Part-of-Speech) Tagging is the process of marking words in a text as corresponding to a
particular part of speech (noun, verb, adjective, etc.) based on both their definition and context.
It's a fundamental task in NLP that helps in understanding sentence structure and meaning.
Open Classes:
Word categories that readily accept new members
Continuously evolving with language
Examples: Nouns, Verbs, Adjectives, Adverbs
Can be further subclassified (e.g., common nouns, proper nouns)
Closed Classes:
Word categories with relatively fixed membership
Resistant to adding new words
Examples: Pronouns, Prepositions, Conjunctions, Determiners, Auxiliary verbs
Function primarily to express grammatical relationships
POS tagging algorithms use:
Rule-based approaches (hand-crafted grammar rules)
Stochastic approaches (Hidden Markov Models, Maximum Entropy)
Transformation-based approaches (Brill Tagger)
Deep learning approaches (Neural Networks, BERT)
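A minimal tagging sketch with NLTK (assuming nltk is installed and the 'punkt' and 'averaged_perceptron_tagger' resources have been fetched via nltk.download):

```python
import nltk

# Tokenize, then tag each token with its part of speech.
tokens = nltk.word_tokenize("The cat sits on the mat")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('cat', 'NN'), ('sits', 'VBZ'),
#  ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```

Note how the output mixes open-class tags (NN noun, VBZ verb) with closed-class tags (DT determiner, IN preposition).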
Accurate POS tagging is essential for syntactic parsing, information extraction, and many other
NLP applications.
11. What are Hidden Markov Models (HMM)?
Figure: Hidden Markov Model
Hidden Markov Models (HMM) are statistical models used to represent systems that are
assumed to be Markov processes with unobservable (hidden) states. In NLP, HMMs are widely
used for sequence labeling tasks like Part-of-Speech tagging, named entity recognition, and
speech recognition.
Key Components of HMM:
States: The hidden variables we want to infer (e.g., POS tags)
Observations: The visible outputs (e.g., words in a sentence)
Transition Probabilities: P(state_j | state_i) - probability of moving from one state to
another
Emission Probabilities: P(observation | state) - probability of an observation given a state
Initial State Probabilities: P(state_i) - probability of starting in each state
Three Fundamental Problems in HMM:
1. Evaluation: Given an HMM and observation sequence, calculate the probability of the
observation sequence (solved using Forward algorithm)
2. Decoding: Given an HMM and observation sequence, find the most likely sequence of
hidden states (solved using Viterbi algorithm)
3. Learning: Given observation sequences, adjust model parameters to best fit the data
(solved using Baum-Welch algorithm)
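The decoding problem can be made concrete with a toy Viterbi implementation. The two-state tagger and every probability below are invented purely for illustration:

```python
states = ["Noun", "Verb"]
start_p = {"Noun": 0.6, "Verb": 0.4}              # initial state probabilities
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7},    # transition probabilities
           "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit_p = {"Noun": {"dogs": 0.6, "bark": 0.1},     # emission probabilities
          "Verb": {"dogs": 0.1, "bark": 0.7}}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    # V[t][s]: probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Choose the best previous state to transition from
            prob, prev = max((V[-2][p] * trans_p[p][s] * emit_p[s][obs], p)
                             for p in states)
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(V[-1], key=V[-1].get)
    return path[best]

print(viterbi(["dogs", "bark"]))   # ['Noun', 'Verb']
```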
HMMs are particularly useful in NLP because they can effectively model the sequential nature
of language and handle uncertainty in state assignments.