
Natural Language Processing (NLP) Notes For Unit 2 & Unit 3

1. Morphology
Definition: Morphology is the study of the internal structure of words and how they
are formed. It examines how words are created from smaller meaningful units called
morphemes.

Types of Morphology:

1) Inflectional Morphology:

• Marks grammatical distinctions such as tense, number, and person.
• Does not change the core meaning or the word class.

Examples:

"Walk" -> "Walks" (third person singular).

"Dog" -> "Dogs" (plural).

2) Derivational Morphology:

• Creates new words by adding prefixes or suffixes.
• Often changes the meaning and sometimes the word class.

Examples:

"Happy" -> "Happiness" (adjective to noun).

"Teach" -> "Teacher" (verb to noun).

2. Morphological Analysis and Generation using Finite State Transducers (FSTs)

Morphological Analysis:

The process of breaking down a word into its root form and identifying its affixes
(prefixes, suffixes, infixes, etc.).

Example:
Input: "unhappiness"

Analysis: Root: "happy", Prefix: "un", Suffix: "ness".

Morphological Generation:

The reverse process of creating a valid word by combining a root with affixes.

Example:

Root: "happy", Prefix: "un", Suffix: "ness" -> Output:


"unhappiness".

Finite State Transducers (FSTs):

A computational model using states and transitions to map input strings to output
strings.

Key Features:

Enables bidirectional processing for analysis and generation.

Efficient for handling regular morphological rules.

Example:

Transition: "walk + ed" -> "walked".

3. Part-of-Speech (POS) Tagging


Part of Speech (POS) Tagging is the process of assigning a part of speech to each
word in a given text based on its definition and context. POS tagging is essential for
natural language understanding and serves as a foundation for many NLP tasks, such
as parsing, information extraction, and sentiment analysis.

Parts of Speech

These are grammatical categories that words are assigned to, such as:

1. Noun (NN): Person, place, thing, or idea (e.g., "dog", "India").


2. Verb (VB): Action or state (e.g., "run", "is").
3. Adjective (JJ): Describes a noun (e.g., "beautiful", "tall").
4. Adverb (RB): Modifies a verb, adjective, or another adverb (e.g., "quickly", "very").
5. Pronoun (PRP): Replaces a noun (e.g., "he", "it").
6. Preposition (IN): Links nouns to other words (e.g., "on", "under").
7. Conjunction (CC): Connects words, phrases, or clauses (e.g., "and", "but").
8. Determiner (DT): Specifies a noun (e.g., "the", "a").
9. Interjection (UH): Expresses emotion (e.g., "wow", "oh").

Approaches to POS Tagging

Rule-Based Tagging:

1. Uses a set of predefined linguistic rules to assign tags.
2. Example rule: If a word ends in "-ly", tag it as an adverb (see the sketch below).
3. Advantages: Transparent and easy to understand.
4. Disadvantages: Requires extensive manual rule creation and is less robust to variation.
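
As a rough illustration of the rule-based approach, the sketch below encodes a handful of such rules in Python; the rule set, tag choices, and default are invented for illustration and are far cruder than a production rule-based tagger.

def rule_based_tag(word):
    if word.lower() in {"the", "a", "an"}:
        return "DT"              # closed-class determiners
    if word.endswith("ly"):
        return "RB"              # the "-ly -> adverb" rule from above
    if word.endswith("ing") or word.endswith("ed"):
        return "VB"              # crude verb heuristic
    if word[0].isupper():
        return "NNP"             # crude proper-noun heuristic
    return "NN"                  # default: common noun

sentence = ["The", "dog", "barked", "loudly"]
print([(w, rule_based_tag(w)) for w in sentence])
# [('The', 'DT'), ('dog', 'NN'), ('barked', 'VB'), ('loudly', 'RB')]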

Statistical Tagging:

1. Assigns tags based on probabilities derived from a tagged corpus.
2. Example: Hidden Markov Model (HMM)-based tagging.
   1. HMM assumes:
      1. Each word depends only on its own part-of-speech tag.
      2. Each POS tag depends only on the previous POS tags (Markov assumption).
   2. Calculates the tag sequence T that maximizes the probability P(T | W), where W is the sequence of words (a tiny scoring sketch follows below).
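
To make the HMM idea concrete, the sketch below scores one candidate tag sequence with made-up transition and emission probabilities; a real tagger estimates these from a tagged corpus and searches over all tag sequences with the Viterbi algorithm.

# Invented probabilities, for illustration only.
transition = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VBZ"): 0.4}
emission = {("DT", "the"): 0.5, ("NN", "dog"): 0.01, ("VBZ", "runs"): 0.02}

def score(words, tags):
    """P(tags, words) ~ product of P(tag | previous tag) * P(word | tag)."""
    prob, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        prob *= transition.get((prev, tag), 1e-6) * emission.get((tag, word), 1e-6)
        prev = tag
    return prob

print(score(["the", "dog", "runs"], ["DT", "NN", "VBZ"]))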

Machine Learning-Based Tagging:

1. Uses supervised machine learning to train models on labeled datasets.


2. Popular algorithms:
1. Maximum Entropy Models
2. Support Vector Machines (SVMs)
3. Conditional Random Fields (CRFs)
3. These models consider features like:
1. Current word.
2. Surrounding words (context).
3. Word suffixes/prefixes.
4. Capitalization or punctuation.

Neural Network-Based Tagging:

1. Employs deep learning models like Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Transformers.
2. Automatically learns features from raw text.
3. Example: BERT and other pre-trained language models.
4. Advantages: State-of-the-art accuracy and ability to capture long-range
dependencies.
5. Disadvantages: Requires significant computational resources and large labeled
datasets.

Steps in POS Tagging


Tokenization:

1. Split the text into words or tokens.


2. Example: "The cat sleeps." → ["The", "cat", "sleeps"]

Feature Extraction:

1. Extract features like the word itself, its suffix/prefix, capitalization, etc.

Tag Assignment:

1. Assign tags to tokens using one of the above methods.

Evaluation:

1. Compare the predicted tags to a gold-standard tagged dataset using metrics like
accuracy.

Applications of POS Tagging

1. Syntactic Parsing: Helps in building parse trees for sentences.


2. Information Retrieval: Improves keyword matching by understanding word categories.
3. Named Entity Recognition (NER): Tags words as entities like names, places, or dates.
4. Sentiment Analysis: Identifies adjectives or adverbs for understanding sentiments.
5. Speech Recognition: Assists in understanding sentence structure for transcription.

Example: POS Tagging Sentence

Input Sentence:
"The quick brown fox jumps over the lazy dog."

Tagged Output:

• The/DT
• quick/JJ
• brown/JJ
• fox/NN
• jumps/VBZ
• over/IN
• the/DT
• lazy/JJ
• dog/NN

Tools for POS Tagging

NLTK (Python): Easy-to-use library for POS tagging.


import nltk

nltk.download('averaged_perceptron_tagger')   # tagger model, needed once
print(nltk.pos_tag(["The", "quick", "brown", "fox"]))
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN')]

spaCy: Provides fast and accurate POS tagging (see the sketch after this list).

1. Stanford POS Tagger: Java-based, statistical tagger.


2. Hugging Face Transformers: Neural network-based tagging with pre-trained models.
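
For the spaCy entry above, a short usage sketch is shown below; it assumes spaCy is installed and that the small English model has been downloaded with "python -m spacy download en_core_web_sm".

import spacy

nlp = spacy.load("en_core_web_sm")        # small English pipeline
doc = nlp("The quick brown fox jumps over the lazy dog.")
for token in doc:
    print(token.text, token.pos_, token.tag_)   # coarse POS and Penn Treebank tag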

By understanding and implementing POS tagging, you can enhance text processing
capabilities for a wide variety of applications.

4. Maximum Entropy Model for POS Tagging


A Maximum Entropy (MaxEnt) Model is a probabilistic model that predicts
outcomes (e.g., part-of-speech tags) based on features of the input data. It is widely
used for POS tagging because it can handle overlapping and interdependent features
efficiently without making strong independence assumptions.

Key Concepts in Maximum Entropy

Maximum Entropy Principle:

1. Among all probability distributions consistent with known constraints, the MaxEnt
model chooses the one with the highest entropy (i.e., the least biased distribution).
2. This ensures no additional assumptions are made about the data beyond what is
supported by evidence.

Conditional Probability:

1. For POS tagging, the model predicts the conditional probability of a tag t given a word w and its context C:

   P(t | C) = (1 / Z(C)) * exp( Σ_i λ_i f_i(t, C) )

   where:
   1. f_i(t, C): binary feature functions that indicate relationships between t, w, and C.
   2. λ_i: weights (parameters) associated with the features.
   3. Z(C): normalization factor ensuring the probabilities sum to 1:

      Z(C) = Σ_{t'} exp( Σ_i λ_i f_i(t', C) )

Feature Functions:

1. These are manually defined or automatically extracted functions indicating specific conditions. Examples:
   1. Does the word end with "-ing"? f(t, C) = 1 if t = VB (verb).
   2. Is the previous word a determiner (e.g., "the")?
   3. Is the current word capitalized?

Steps in Maximum Entropy POS Tagging

Feature Extraction:

1. Define features based on the current word and its context.


2. Example features:
   1. f1: The word itself (e.g., "dog").
   2. f2: Prefixes or suffixes (e.g., "-ing").
   3. f3: Previous or next words in the sequence.
   4. f4: Word shape (e.g., capitalization, digits).

Training the Model:

1. Estimate the weights λ_i for each feature using a tagged training dataset.
2. The weights are learned by maximizing the likelihood of the training data using optimization algorithms like Iterative Scaling or Gradient Descent.

POS Tagging (Inference):

1. For a given word and its context, compute the conditional probability of each possible tag.
2. Assign the tag with the highest probability:

   t̂ = argmax_t P(t | C)

Evaluation:

1. Test the model on a separate dataset and evaluate performance using metrics like
accuracy or F1-score.

Advantages of Maximum Entropy Models

Feature Flexibility:

1. Can incorporate arbitrary, overlapping, and non-independent features.


2. Allows the use of rich linguistic information like prefixes, suffixes, and neighboring
words.

No Independence Assumptions:

1. Unlike models like Hidden Markov Models (HMMs), MaxEnt does not assume
conditional independence between features.

Probabilistic Outputs:

1. Provides probabilities for each tag, which can be useful for downstream tasks
requiring confidence scores.
Challenges

Computational Cost:

1. Training can be slow due to the need to compute the normalization factor Z(C) over all possible tags.

Feature Design:

1. The performance depends heavily on the quality of feature engineering.

Sparse Data:

1. Requires sufficient training data to estimate parameters accurately.

Example

Sentence: "The quick brown fox jumps."

Features for the word "quick":

 Current word = "quick".


 Previous word = "The".
 Next word = "brown".
 Word shape = lowercase.
 Suffix = "-ick".

Feature Function Examples:

• f1(t, C) = 1 if t = JJ and the current word is "quick".
• f2(t, C) = 1 if t = JJ and the previous word is "The".

Practical Tools

1. NLTK (Python):

o Can implement MaxEnt for POS tagging.

2. Stanford NLP Library:

o Includes MaxEnt-based POS taggers.

3. scikit-learn:

o Supports logistic regression (a form of MaxEnt) for classification tasks; a small tagging sketch follows below.
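
Building on the scikit-learn entry above, here is a minimal sketch that treats MaxEnt POS tagging as multinomial logistic regression over hand-written features; the tiny training set, feature templates, and tags are invented purely for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(words, i):
    """Context features for the i-th word, in the spirit of the example above."""
    return {
        "word": words[i].lower(),
        "suffix2": words[i][-2:],
        "prev": words[i - 1].lower() if i > 0 else "<s>",
        "capitalized": words[i][0].isupper(),
    }

train = [
    (["The", "quick", "fox", "jumps"], ["DT", "JJ", "NN", "VBZ"]),
    (["A", "lazy", "dog", "sleeps"], ["DT", "JJ", "NN", "VBZ"]),
]

X, y = [], []
for words, tags in train:
    for i, tag in enumerate(tags):
        X.append(features(words, i))
        y.append(tag)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = ["The", "brown", "cat", "runs"]
X_test = vec.transform([features(test, i) for i in range(len(test))])
print(list(zip(test, clf.predict(X_test))))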


Conclusion

The Maximum Entropy model for POS tagging is a powerful method due to its
flexibility in incorporating diverse features and providing robust probabilistic outputs.
While computationally intensive, it remains a popular choice in NLP pipelines.

5. Multi-Word Expressions (MWEs)


Multi-Word Expressions (MWEs) are phrases or word combinations that exhibit
unique properties not entirely predictable from the meanings of their individual
components. They are common in natural language and present challenges for
computational linguistics because their interpretation often requires understanding
them as a single semantic or syntactic unit.

Types of Multi-Word Expressions

Idiomatic Expressions:

1. The meaning of the expression is not literal or compositional.


2. Example: "Kick the bucket" (means "to die").

Collocations:

1. Frequently co-occurring words where the combination is more common than expected.
2. Example: "Strong tea" (not "powerful tea").

Phrasal Verbs:

1. Verbs combined with particles or prepositions, often with idiomatic meanings.


2. Example: "Give up" (means "to surrender" or "stop trying").

Compound Nouns:

1. Two or more words functioning as a single noun.


2. Example: "Toothbrush", "data science".

Light Verb Constructions:

1. A verb paired with a noun to create a meaning different from the verb alone.
2. Example: "Take a walk" (means "to walk").
Named Entities:

1. Names of people, places, organizations, etc., that act as a single unit.


2. Example: "New York City", "United Nations".

Institutionalized Expressions:

1. Fixed phrases or clichés used in formal or informal contexts.


2. Example: "On the other hand", "at the end of the day".

Challenges with MWEs

Non-Compositionality:

1. The meaning of MWEs cannot always be deduced from their parts (e.g., "spill the
beans").

Ambiguity:

1. MWEs can sometimes be literal or idiomatic depending on context (e.g., "break the
ice" in a conversation vs. literally breaking ice).

Flexibility:

1. Some MWEs allow word reordering or substitution, while others are rigid.
2. Example: "Make a decision" vs. "Decision was made".

Sparsity:

1. MWEs are often rare in corpora, making it difficult for models to learn their
properties.

Language-Specificity:

1. MWEs vary greatly between languages, posing challenges for machine translation.

Approaches to Handling MWEs

Lexicon-Based Methods:

1. Maintain a predefined list of MWEs for identification and tagging.


2. Example: Dictionaries of idioms or named entities.

Statistical Methods:

1. Identify MWEs based on co-occurrence patterns in a corpus.


2. Example metrics:
   1. Pointwise Mutual Information (PMI):

      PMI(w1, w2) = log [ P(w1, w2) / ( P(w1) · P(w2) ) ]

   2. T-score: Measures the strength of co-occurrence.
3. These methods are useful for collocations like "strong tea" (a small PMI sketch follows below).
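
A small PMI sketch over a toy corpus is given below; the corpus and counts are invented for illustration, and real collocation extraction would use a much larger corpus together with frequency cutoffs.

import math
from collections import Counter

tokens = "strong tea is strong and strong tea is popular".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
N = len(tokens)

def pmi(w1, w2):
    """log2 of how much more often w1 w2 co-occur than chance would predict."""
    p_w1, p_w2 = unigrams[w1] / N, unigrams[w2] / N
    p_pair = bigrams[(w1, w2)] / (N - 1)
    return math.log2(p_pair / (p_w1 * p_w2))

print(round(pmi("strong", "tea"), 2))   # clearly positive: a likely collocation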

Parsing-Based Methods:

1. Use syntactic parsers to identify MWEs as single units in parse trees.

Machine Learning Models:

1. Train models to recognize MWEs using annotated data.


2. Features include:
1. Word context.
2. POS tags of components.
3. Dependency relations.

Neural Network Approaches:

1. Use embeddings and sequence models (LSTMs, Transformers) to detect MWEs.


2. Pre-trained models like BERT often learn representations for MWEs during training.

Hybrid Approaches:

1. Combine lexicons, statistical measures, and machine learning for better performance.

Applications of MWE Recognition

Machine Translation:

1. Ensure idiomatic expressions are translated correctly (e.g., "spill the beans" →
"reveal the secret").

Information Retrieval:

1. Improve search accuracy by treating MWEs as single units (e.g., "data science").

Text Summarization:

1. Capture meaningful phrases for concise summaries.

Sentiment Analysis:

1. Handle expressions like "not bad" (positive sentiment despite the negative word
"not").
Speech Recognition:

1. Recognize MWEs as a single concept to improve transcription quality.

Example: Handling MWEs

Input Sentence: "John kicked the bucket last night."

MWE Recognition:

 Identify "kicked the bucket" as a single idiomatic expression.


 Assign the meaning "died" instead of interpreting "kicked" and "bucket" literally.

Conclusion

Multi-Word Expressions are an essential aspect of natural language that adds richness
and complexity. Effectively recognizing and processing MWEs is critical for
applications like machine translation, sentiment analysis, and conversational AI.
Combining linguistic, statistical, and neural approaches can help address the
challenges they present.

6. Role of Language Models


Language Models (LMs) are essential in the field of Natural Language Processing
(NLP) because they help machines understand, process, and generate human language
in a way that mimics how humans use language. Their core function is to predict the
probability of a sequence of words or tokens, enabling a variety of applications that
require comprehension, generation, or transformation of text.

Here are the key roles of language models:

1. Word Prediction and Text Generation

Language models are primarily used for predicting the next word in a sequence,
which is a fundamental task in many NLP applications, such as text completion and
auto-correction. They are also used in text generation, where the model generates
coherent and contextually relevant sentences based on a given prompt.

Example:

o Given the input "The sun is", a language model might predict the next word as
"shining", "bright", or "setting" based on the context.

Applications:
o Autocompletion (in search engines, email, or messaging apps).
o Creative Text Generation (writing stories, generating poetry, etc.).

2. Language Understanding

Language models also facilitate language understanding by enabling machines to interpret text or speech accurately. They can recognize the syntax, semantics, and context in sentences to solve tasks such as:

Sentiment Analysis: Determining whether a sentence is positive, negative, or neutral.

Named Entity Recognition (NER): Identifying people, organizations, locations, and other entities in a text.

Part-of-Speech Tagging: Assigning grammatical labels (e.g., noun, verb, adjective) to each word in a sentence.

Example:

o In the sentence "Apple is looking to buy a startup", an LM helps identify that


"Apple" is a company, not the fruit.

Applications:

o Sentiment analysis in social media, customer reviews.


o Text categorization (e.g., news articles or product classifications).

3. Text Classification

Language models are used in text classification tasks, where the goal is to categorize
text into predefined categories. This includes applications like spam detection, topic
categorization, and intent recognition.

Example:

o A language model can classify a tweet into categories like "politics", "technology",
or "entertainment".

Applications:

o Email spam filters.


o Sentiment analysis in customer feedback.

4. Machine Translation
Language models are central to machine translation, where the goal is to translate
text from one language to another while maintaining meaning and fluency. These
models help ensure that translations are accurate, contextually appropriate, and
grammatically correct.

Example:

o Translate "How are you?" from English to French as "Comment ça va?" using a
language model trained on bilingual data.

Applications:

o Google Translate, DeepL, and other translation tools.

5. Question Answering

Language models also power question answering systems, where they can extract or
generate answers to user queries from a given context, such as a passage of text or a
database.

Example:

o Given the input "What is the capital of France?", a language model can provide the
output "Paris".

Applications:

o Virtual assistants (e.g., Siri, Alexa).


o Search engines (Google, Bing) providing direct answers to queries.

6. Conversational Agents (Chatbots)

Language models are used to build chatbots and virtual assistants that engage in
human-like conversation. These systems process user input, generate meaningful and
contextually appropriate responses, and can carry on multi-turn dialogues.

Example:

o A customer service chatbot powered by a language model can understand and respond to queries like "Where is my order?" or "How can I return an item?"

Applications:

o Customer support bots.


o Personal assistants (e.g., Google Assistant, Apple Siri).

7. Text Summarization
Language models also assist in text summarization, where the goal is to generate a
concise and coherent summary of a longer document while preserving the main ideas.
There are two types of summarization:

Extractive Summarization: The model selects key sentences directly from


the text.

Abstractive Summarization: The model generates new sentences that


paraphrase the original content.

Example:

o A model can summarize a lengthy article about climate change into a few key points,
such as "Climate change is causing sea levels to rise and extreme weather events to
increase."

Applications:

o News article summarization.


o Legal or academic document summarization.

8. Speech Recognition

Language models play a crucial role in speech recognition, which converts spoken
language into text. The LM helps disambiguate similar-sounding words and corrects
possible errors based on context.

Example:

o In speech-to-text systems, a model might correct "I scream" to "ice cream" based on
context.

Applications:

o Virtual assistants like Google Assistant or Apple Siri.


o Voice typing and transcription services.

9. Information Retrieval and Search

Language models help in information retrieval, where they can improve search
engines by ranking documents based on their relevance to a query. The model
understands the query context and retrieves the documents that best match the intent of the user.

Example:

o A search engine query like "Best Italian restaurants in New York" will return a list of
relevant restaurants based on the query’s semantics
Applications:

o Web search engines (Google, Bing).


o Document or database retrieval systems.

10. Text-to-Speech (TTS) Systems

Language models can also be used in text-to-speech systems, where they help
generate natural-sounding speech from text. This process involves understanding the
linguistic structures and converting them into phonetic transcriptions, followed by
speech synthesis.

Example:

o A language model helps convert the sentence "Hello, how are you?" into fluent and
natural speech.

Applications:

o Assistive technologies for the visually impaired.


o Voice-based assistants.

Challenges in Language Models

Bias and Fairness:

o Language models may reflect biases present in their training data, leading to biased
outputs. Addressing this is a major challenge in developing responsible AI systems.

Context Understanding:

o While language models can understand context, they still struggle with long-term
dependencies or highly ambiguous situations.

Data and Resource Intensity:

o Training state-of-the-art language models requires massive amounts of data and computational resources, making them expensive and inaccessible to smaller organizations.

Ethical Concerns:

o Language models may be misused for generating misleading information (e.g., fake
news) or malicious activities (e.g., phishing).

Conclusion
Language models serve as the backbone for many modern AI systems, enabling
machines to process, understand, and generate human language in a wide range of
applications, from chatbots and virtual assistants to machine translation and text
summarization. As NLP and AI technology evolve, the role of language models will
continue to expand, making interactions with machines more natural and intuitive.

7. Simple N-Gram Models


An N-gram model is a probabilistic language model used to predict the next word in a sequence based on the previous N−1 words. It is one of the simplest methods for modeling sequences of text or speech.

N-Gram Definition:

1. An N-gram is a contiguous sequence of N items (words, characters, etc.) in text or speech. For example:
   1. Unigram (N = 1): "I", "love", "coding"
   2. Bigram (N = 2): "I love", "love coding"
   3. Trigram (N = 3): "I love coding"

Conditional Probability:

1. The probability of a word w_i given the previous N−1 words is written as:

   P(w_i | w_{i−N+1}, …, w_{i−1})

Markov Assumption:

1. The N-gram model simplifies language modeling by assuming that the probability of a word depends only on the previous N−1 words:

   P(w_1, w_2, …, w_T) ≈ Π_{i=1}^{T} P(w_i | w_{i−N+1}, …, w_{i−1})

   For example, in a bigram model (N = 2):

   P(w_1, w_2, …, w_T) ≈ Π_{i=1}^{T} P(w_i | w_{i−1})

Training N-Gram Models:

1. The probabilities are estimated from a corpus by counting the occurrences of word sequences:

   P(w_i | w_{i−N+1}, …, w_{i−1}) = Count(w_{i−N+1}, …, w_i) / Count(w_{i−N+1}, …, w_{i−1})

Advantages

1. Simplicity:

Easy to implement and understand.

2. Efficiency:

Computation is straightforward, especially for small N.

3. Interpretability:
Counts and probabilities can be easily understood.

Challenges

Data Sparsity:

1. Many valid N-grams never appear in the training data, leading to zero probabilities for unseen sequences.

Scalability:

1. For large N, the number of possible N-grams grows exponentially, requiring large amounts of data.

Short Context:

1. N-grams rely on a fixed-length context, limiting the ability to capture long-term dependencies in text.

Smoothing in N-Gram Models

To address the issue of zero probabilities, smoothing techniques are applied:

1. Laplace Smoothing: Adds a small constant (e.g., 1) to all counts.


2. Good-Turing Smoothing: Adjusts counts based on the frequency of frequencies.
3. Kneser-Ney Smoothing: Redistributes probabilities more effectively by considering lower-
order models.

Applications

 Predictive Text: Suggesting the next word in typing.


 Speech Recognition: Transcribing spoken language.
 Machine Translation: Translating sequences of text.
 Spam Filtering: Analyzing email content.

Example: Bigram Model

Training Corpus:

I love coding. I love AI.
Bigram Probabilities:

 P("love"∣ "I")=Count("I love")Count("I")=22=1P("love"∣ "I")=Count("I")Count("I love")=22=1


 P("coding"∣ "love")=Count("love coding")Count("love")=12P("coding"∣ "love")=Count("love")
Count("love coding")=21

Sentence Probability:

For the sentence "I love coding":

P("I love coding")=P("I")⋅P("love"∣"I")⋅P("coding"∣"love")P("I lo


ve coding")=P("I")⋅P("love"∣"I")⋅P("coding"∣"love")

If P("I")P("I") is assumed uniform:

P("I")=13, P("love"∣"I")=1, P("coding"∣"love")=0.5P("I")=31


,P("love"∣"I")=1,P("coding"∣"love")=0.5P("I love coding")=13⋅1⋅0.5
=0.1667P("I love coding")=31⋅1⋅0.5=0.1667

This simple model can predict text probabilities, detect patterns, or suggest
completions.
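
A minimal sketch of this bigram calculation in Python is shown below. Note that P("I") is taken here as a relative frequency over all tokens (including punctuation), so the final number differs slightly from the uniform 1/3 assumption used in the worked example above.

from collections import Counter

tokens = "I love coding . I love AI .".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p_bigram(word, prev):
    """Maximum likelihood estimate: Count(prev, word) / Count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_bigram("love", "I"))       # 1.0
print(p_bigram("coding", "love"))  # 0.5

p_sentence = (unigrams["I"] / len(tokens)) * p_bigram("love", "I") * p_bigram("coding", "love")
print(round(p_sentence, 4))        # 0.125 with the relative-frequency P("I")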

8. Estimating Parameters and Smoothing


Estimating Parameters

In the context of probabilistic models (e.g., in Natural Language Processing or Machine Learning), parameter estimation refers to determining the values of parameters that best represent the data. These parameters define the model's behavior, such as the probabilities in probabilistic models.

Key Methods for Parameter Estimation

Maximum Likelihood Estimation (MLE):

1. The goal is to find the parameter values that maximize the likelihood of the observed data.
2. For a dataset D, the likelihood L(θ | D) of parameter θ is:

   L(θ | D) = Π_{i=1}^{N} P(x_i | θ)

   where the x_i are the data points.
3. The log-likelihood is often used for easier computation:

   ℓ(θ | D) = Σ_{i=1}^{N} log P(x_i | θ)

Bayesian Estimation (Maximum A Posteriori - MAP):

1. Incorporates prior beliefs about the parameters using Bayes' Theorem:

   P(θ | D) = P(D | θ) P(θ) / P(D)

2. The objective is to maximize P(θ | D), combining prior knowledge P(θ) with the observed data likelihood P(D | θ).
Method of Moments:

1. Matches moments (e.g., mean, variance) of the distribution to those of the data for
parameter estimation.

Smoothing

Smoothing addresses the issue of assigning non-zero probabilities to unseen events in probabilistic models, particularly language models. Without smoothing, unseen events have a probability of zero, which can disrupt calculations and lead to poor generalization.

Common Smoothing Techniques

Laplace Smoothing (Add-One Smoothing):

1. Adds 1 to each count to ensure no probability is zero (a small sketch follows after this list).
2. Adjusted probability:

   P(w_i | w_{i−1}) = ( C(w_{i−1}, w_i) + 1 ) / ( C(w_{i−1}) + V )

   where C(w_{i−1}, w_i) is the bigram count, C(w_{i−1}) is the unigram count, and V is the vocabulary size.

Additive Smoothing (Generalization of Laplace):

1. Adds a constant α > 0 (instead of 1) to each count:

   P(w_i | w_{i−1}) = ( C(w_{i−1}, w_i) + α ) / ( C(w_{i−1}) + αV )

Good-Turing Smoothing:

1. Adjusts probabilities based on the frequency of frequencies.
2. Probability mass from rare events is redistributed to unseen events.
3. Adjusted count:

   C* = (r + 1) · N_{r+1} / N_r

   where r is the observed frequency and N_r is the number of events with frequency r.

Kneser-Ney Smoothing:

1. A more advanced smoothing method for language models.


2. Combines a back-off mechanism with adjusted probabilities, ensuring better
handling of rare and unseen n-grams.

Jelinek-Mercer Smoothing:

1. A linear interpolation approach:

   P(w_i | w_{i−1}) = λ · P_MLE(w_i | w_{i−1}) + (1 − λ) · P_MLE(w_i)

   where λ is a weighting parameter.
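
As referenced above, here is a minimal sketch of Laplace (add-one) smoothing applied to the toy bigram counts from the N-gram section; the corpus and vocabulary are the same small illustrative example.

from collections import Counter

tokens = "I love coding . I love AI .".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)   # vocabulary size

def p_laplace(word, prev):
    """Add-one smoothed bigram probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(round(p_laplace("coding", "love"), 3))  # seen bigram: 2/7 ~ 0.286
print(round(p_laplace("pizza", "love"), 3))   # unseen bigram: 1/7 ~ 0.143, not zero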

Why Estimation and Smoothing Are Crucial


1. Parameter Estimation determines the structure of the model, ensuring it aligns with
observed data.
2. Smoothing prevents overfitting to seen data and enhances generalization to unseen data,
critical for robust probabilistic models like language models.

9. Evaluating Language Models


Evaluating a language model involves assessing its ability to perform various
language-related tasks effectively and accurately. This evaluation can be performed
using different methodologies, depending on the intended purpose of the model.
Below is a breakdown of common aspects of language model evaluation:

1. Intrinsic Evaluation

Measures the quality of the model's linguistic capabilities directly, often on predefined tasks.

• Perplexity: Evaluates how well a model predicts a sample; lower perplexity indicates better predictive accuracy (a small sketch follows after this list).
• BLEU/ROUGE/METEOR Scores: Measure how closely the model-generated text matches reference text. Commonly used in machine translation and summarization tasks.
• Grammaticality: Evaluates the grammatical correctness of generated sentences.
• Language Understanding: Tasks like word similarity, analogy completion, and syntactic parsing.
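
As mentioned in the perplexity bullet above, here is a minimal sketch of how perplexity is computed from per-word probabilities; the probability values are made up for illustration.

import math

# P(w_i | history) assigned by some language model to each word (made-up values).
word_probs = [0.25, 1.0, 0.5]

log_prob = sum(math.log2(p) for p in word_probs)
perplexity = 2 ** (-log_prob / len(word_probs))
print(perplexity)   # 2.0 here; lower perplexity means better prediction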

2. Extrinsic Evaluation

Focuses on how well the model performs in downstream tasks:

• Text Classification: E.g., sentiment analysis, spam detection.
• Named Entity Recognition (NER): Identifying entities like names, places, or dates.
• Machine Translation: Assessing translation quality across languages.
• Question Answering: Evaluating accuracy and relevance in responses to queries.

3. Human Evaluation

Uses human judgment to assess aspects not easily measured by automated metrics:

• Fluency: How natural the generated text sounds.
• Relevance: Appropriateness of the model's output to a prompt.
• Creativity: Ability to produce novel and meaningful text.
4. Ethical and Fairness Considerations

Evaluates the model's social impact and fairness:

• Bias and Fairness: Testing for discriminatory or biased outputs.
• Toxicity: Ensuring the model doesn't generate harmful or offensive text.
• Inclusivity: Checking for language inclusiveness and diversity.

5. Robustness and Generalization

Tests how well the model performs under challenging conditions:

• Adversarial Testing: Inputs designed to trick or confuse the model.
• Domain Generalization: Evaluating the model on data from domains it wasn't trained on.

6. Scalability and Efficiency

Measures practical usability and resource constraints:

• Latency: Time taken to generate responses.
• Memory Usage: Computational and memory efficiency.
• Energy Consumption: Evaluating the carbon footprint of running the model.

Example Evaluation Metrics for Popular Tasks

Task                 | Metric            | Explanation
Machine Translation  | BLEU              | Matches n-grams in the generated and reference translations.
Summarization        | ROUGE             | Measures overlap of words/phrases with reference summaries.
Language Modeling    | Perplexity        | Measures the predictive quality of the model's probabilities.
Text Generation      | Human Evaluation  | Judges fluency and relevance.
Sentiment Analysis   | Accuracy/F1-Score | Measures classification correctness.

Frameworks and Tools for Evaluation

• GLUE (General Language Understanding Evaluation): A benchmark suite for NLP tasks.
• SuperGLUE: More challenging tasks for advanced models.
• HumanEval: Human-based tasks for assessing language generation.
• LAMBADA: Tests understanding of long-range dependencies.

Evaluation of a language model depends heavily on its intended application, balancing automated metrics, human feedback, and ethical concerns for a comprehensive assessment.
