N-gram Language Models
Dr. Wafa Zaal Almaaitah
Statistical Language Processing
• Statistical language processing techniques can also be used to solve many problems in natural language processing:
– optical character recognition
– spelling correction
– speech recognition
– machine translation
– part of speech tagging
– parsing
• Statistical techniques can be used to disambiguate the input.
• They can be used to select the most probable solution.
• Statistical techniques depend on the probability theory.
• To be able to use statistical techniques, we need corpora from which to collect statistics.
• Corpora should be big enough to capture the required knowledge.
Basic Probability
• Probability Theory: predicting how likely it is that something will happen.
• Probabilities: numbers between 0 and 1.
• Probability Function:
– P(A) denotes how likely it is that event A happens.
– P(A) is a number between 0 and 1
– P(A)=1 => a certain event
– P(A)=0 => an impossible event
• Example: a coin is tossed three times. What is the probability of 3 heads?
– 1/8
– uniform distribution
Basic Probability
Since each coin toss is independent of the others, we can multiply the probabilities
together:
Probability of getting heads on the 1st toss = 0.5
Probability of getting heads on the 2nd toss = 0.5
Probability of getting heads on the 3rd toss = 0.5
To find the probability of all three events happening (getting heads on all three
tosses), we multiply the individual probabilities:
0.5×0.5×0.5=0.125
So, the probability of getting 3 heads when a coin is tossed three times is 0.125 or
12.5%.
Probability Spaces
• There is a sample space and the subsets of this sample space describe the events.
• Ω is a sample space.
– Ω is the certain event
– the empty set ∅ is the impossible event.
P(A) is between 0 and 1
P(Ω) = 1
Figure: the sample space Ω, with the event A drawn as a circle inside it (a subset of Ω).
Unconditional and Conditional Probability
• Unconditional Probability or Prior Probability
– P(A)
– the probability of event A, not conditioned on any other event.
• Conditional Probability -- Posterior Probability -- Likelihood
– P(A|B)
– this is read as the probability of A given that we know B.
• Example:
– P(put) is the probability of seeing the word put in a text
– P(on|put) is the probability of seeing the word on after seeing the word put.
Unconditional and Conditional Probability
P(A|B) = P(AB) / P(B)
P(B|A) = P(AB) / P(A)
Bayes’ Theorem
• Bayes’ theorem is used to calculate P(A|B) when P(B|A) is known.
• We know that:
P(AB) = P(A|B) P(B)
P(AB) = P(B|A) P(A)
• So, we will have:
P(A|B) = P(B|A) P(A) / P(B)
P(B|A) = P(A|B) P(B) / P(A)
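• As a quick numeric illustration (numbers chosen only for this example): if P(B|A) = 0.8, P(A) = 0.1, and P(B) = 0.2, then P(A|B) = 0.8 × 0.1 / 0.2 = 0.4.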
Language Model
• Models that assign probabilities to sequences of words are called language models
(LMs).
• The simplest language model that assigns probabilities to sentences and sequences of
words is the n-gram.
• An n-gram is a sequence of N words:
– A 1-gram (unigram) is a one-word sequence like “please” or “turn”.
– A 2-gram (bigram) is a two-word sequence like “please turn”, “turn your”, or “your homework”.
– A 3-gram (trigram) is a three-word sequence like “please turn your” or “turn your homework”.
• We can use n-gram models to estimate the probability of the last word of an n-gram
given the previous words, and also to assign probabilities to entire word sequences.
Probabilistic Language Models
• Probabilistic language models can be used to assign a probability to a sentence in
many NLP tasks.
• Machine Translation:
– P(high winds tonight) > P(large winds tonight)
• Spell Correction:
– Thek office is about ten minutes from here
– P(The office is) > P(Then office is)
• Speech Recognition:
– P(I saw a van) >> P(eyes awe of an)
• Summarization, question-answering, …
Probabilistic Language Models
• Our goal is to compute the probability of a sentence or sequence of words W
(=w1,w2,…wn):
– P(W) = P(w1,w2,w3,w4,w5…wn)
• What is the probability of an upcoming word?:
– P(w5|w1,w2,w3,w4)
• A model that computes either of these:
– P(W) or P(wn|w1,w2…wn-1) is called a language model.
Chain Rule of Probability
• How can we compute probabilities of entire word sequences like w1,w2,…wn?
– The probability of the word sequence w1,w2,…wn is P(w1,w2,…wn).
• We can use the chain rule of the probability to decompose this probability:
P(w1n) = P(w1) P(w2|w1) P(w3|w12) … P(wn|w1n-1)
= ∏k=1…n P(wk|w1k-1)
Example:
P(the man from jupiter) =
P(the) P(man|the) P(from|the man) P(jupiter|the man from)
Chain Rule of Probability
and Conditional Probabilities
• The chain rule shows the link between computing the joint probability of a sequence
and computing the conditional probability of a word given previous words.
• Definition of Conditional Probabilities:
P(B|A) = P(A,B) / P(A) ➔ P(A,B) = P(A) P(B|A)
• Conditional Probabilities with More Variables:
P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
• Chain Rule:
P(w1… wn) = P(w1) P(w2|w1) P(w3|w1w2) … P(wn|w1…wn-1)
Computing Conditional Probabilities
• To compute the exact probability of a word given a long sequence of preceding words
is difficult (sometimes impossible).
• We are trying to compute P(wn|w1…wn-1) which is the probability of seeing wn after
seeing w1n-1.
• We may try to compute P(wn|w1…wn-1) exactly as follows:
P(wn|w1…wn-1) = count(w1…wn-1wn) / count(w1…wn-1)
• There are too many possible sentences, and we may never see enough data to estimate these probability values.
• So, we need to compute P(wn|w1…wn-1) approximately.
N-Grams
• The intuition of the n-gram model (simplifying assumption):
– instead of computing the probability of a word given its entire history, we can
approximate the history by just the last few words.
P(wn|w1…wn-1) ≈ P(wn) unigram
P(wn|w1…wn-1) ≈ P(wn|wn-1) bigram
P(wn|w1…wn-1) ≈ P(wn|wn-1wn-2) trigram
P(wn|w1…wn-1) ≈ P(wn|wn-1wn-2wn-3) 4-gram
P(wn|w1…wn-1) ≈ P(wn|wn-1wn-2wn-3wn-4) 5-gram
• In general, the N-gram approximation is
P(wn|w1…wn-1) ≈ P(wn|wn-N+1…wn-1)
N-Grams
computing probabilities of word sequences
Unigrams -- P(w1n) ≈ ∏k=1…n P(wk)
Bigrams -- P(w1n) ≈ ∏k=1…n P(wk|wk-1)
Trigrams -- P(w1n) ≈ ∏k=1…n P(wk|wk-1wk-2)
4-grams -- P(w1n) ≈ ∏k=1…n P(wk|wk-1wk-2wk-3)
N-Grams
computing probabilities of word sequences (Sentences)
Unigram
P(<s> the man from jupiter came </s>)
P(the) P(man) P(from) P(jupiter) P(came)
Bigram
P(<s> the man from jupiter came </s>)
P(the|<s>) P(man|the) P(from|man) P(jupiter|from) P(came|jupiter) P(</s>|came)
Trigram
P(<s> the man from jupiter came </s>)
P(the|<s> <s>) P(man|<s> the) P(from|the man) P(jupiter|man from)
P(came|from jupiter) P(</s>|jupiter came) P(</s>|came </s>)
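A minimal Python sketch (an illustrative helper, not from the slides) that generates these factorizations for any n, using the same <s>/</s> padding convention:

```python
def ngram_factors(sentence, n):
    # Illustrative helper: pad with n-1 start and n-1 end symbols, then list
    # the conditional-probability terms of the n-gram approximation.
    words = sentence.split()
    tokens = ["<s>"] * (n - 1) + words + ["</s>"] * (n - 1)
    factors = []
    for i in range(n - 1, len(tokens)):
        history = " ".join(tokens[i - n + 1:i])
        factors.append(f"P({tokens[i]}|{history})" if history else f"P({tokens[i]})")
    return factors

print(ngram_factors("the man from jupiter came", 2))
# ['P(the|<s>)', 'P(man|the)', 'P(from|man)', 'P(jupiter|from)',
#  'P(came|jupiter)', 'P(</s>|came)']
print(ngram_factors("the man from jupiter came", 3)[:2])
# ['P(the|<s> <s>)', 'P(man|<s> the)']
```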
Example: using cosine similarity
Corpus
•Document 1: "Python is a popular programming language."
•Document 2: "Java and Python are both programming languages."
Query
•Query: "Python programming language"
Steps
1. Tokenization: Tokenize the documents and the query into words.
•Document 1: ["Python", "is", "a", "popular", "programming", "language"]
•Document 2: ["Java", "and", "Python", "are", "both", "programming", "languages"]
•Query: ["Python", "programming", "language"]
2. Generate Bigrams:
•Document 1: [("Python", "is"), ("is", "a"), ("a", "popular"), ("popular", "programming"),
("programming", "language")]
•Document 2: [("Java", "and"), ("and", "Python"), ("Python", "are"), ("are", "both"), ("both",
"programming"), ("programming", "languages")]
•Query: [("Python", "programming"), ("programming", "language")]
3. Vector Representation: Represent each document and the query as a vector of bigram counts over the shared bigram vocabulary (the union of all distinct bigrams above: Document 1's five, Document 2's six, and the query-only bigram ("Python", "programming"), i.e. 12 dimensions in that order).
• Document 1: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
• Document 2: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
• Query: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
(The query shares only the bigram ("programming", "language") with Document 1 and no bigram with Document 2.)
Steps
4. Cosine Similarity Calculation: Calculate the cosine similarity between the query vector and each document vector (see the sketch after this step).
With the vectors above, cos(query, Document 1) = 1 / (√2 · √5) ≈ 0.32 and cos(query, Document 2) = 0, so Document 1 is more relevant to the query.
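A minimal Python sketch of this bigram-vector cosine comparison (the lowercased whitespace tokenization and punctuation stripping are simplifying assumptions, not stated in the slides):

```python
from collections import Counter
from math import sqrt

def bigram_vector(text):
    # Simplified tokenization (assumption): lowercase, strip punctuation, split on spaces,
    # then count bigrams as a sparse vector.
    tokens = [t.strip(".,") for t in text.lower().split()]
    return Counter(zip(tokens, tokens[1:]))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = bigram_vector("Python programming language")
for name, doc in [("Document 1", "Python is a popular programming language."),
                  ("Document 2", "Java and Python are both programming languages.")]:
    print(name, round(cosine(query, bigram_vector(doc)), 2))
# Document 1 0.32
# Document 2 0.0
```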
Example: using conditional probabilities
Corpus
•Document 1: "Python is a popular programming language."
•Document 2: "Java and Python are both programming languages."
Query
•Query: "Python programming language"
Steps
1.–2. Tokenization and bigram generation are identical to steps 1 and 2 of the cosine-similarity example above (same documents, same query, same bigrams).
Steps
3. Calculate Conditional Probabilities:
• For each document, estimate the conditional probability of each query word given the previous query word, using bigram counts from that document.
Conditional probability formula: P(wi | wi-1) = count(wi-1 wi) / count(wi-1)
Steps
Calculation for Document 1:
•P("Python" | <start>) = 1 / 1 = 1
•P("programming" | "Python") = 1 / 1 = 1
•P("language" | "programming") = 1 / 1 = 1
•Overall Probability:
Calculation for Document 2:
1. Perform similar calculations as for Document 1.
4. Ranking:
1. Calculate the overall probability for each document.
2. Rank the documents based on these probabilities, with higher probabilities indicating
higher relevance to the query.
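A minimal Python sketch of this per-document scoring with unsmoothed maximum-likelihood bigram estimates (the <start> marker and the simplified tokenization are assumptions carried over from the example):

```python
from collections import Counter

def tokenize(text):
    # Simplified tokenization (assumption): lowercase, strip punctuation, prepend a start marker.
    return ["<start>"] + [t.strip(".,") for t in text.lower().split()]

def score(document, query):
    # Product of P(w_i | w_{i-1}) estimated from the document's bigram counts.
    tokens = tokenize(document)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    unigram_counts = Counter(tokens)
    prob = 1.0
    q = tokenize(query)
    for prev, word in zip(q, q[1:]):
        denom = unigram_counts[prev]
        prob *= (bigram_counts[(prev, word)] / denom) if denom else 0.0
    return prob

for doc in ["Python is a popular programming language.",
            "Java and Python are both programming languages."]:
    print(score(doc, "Python programming language"))
# Both scores come out 0.0 because some query bigrams are unseen --
# exactly the zero-probability problem that Laplace smoothing addresses next.
```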
Laplace smoothing
Laplace smoothing, also known as add-one smoothing, is a technique used to handle
the issue of zero probabilities for unseen events in probabilistic models, such as
language models.
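In bigram form this is the standard add-one estimate (V is the vocabulary size, i.e. the number of distinct word types):
P_Laplace(wi | wi-1) = (count(wi-1 wi) + 1) / (count(wi-1) + V)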
Laplace smoothing
By adding 1 to the numerator and V to the denominator, Laplace smoothing ensures
that even unseen N-grams have a non-zero probability.
This helps in making the model more robust and prevents it from assigning zero
probabilities to unseen events, which could lead to overly sparse or inaccurate
probability estimates.
Calculation with Laplace Smoothing
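As an illustrative recalculation of the previous example with add-one smoothing (assuming V is the number of distinct word types in each document, 6 for Document 1 and 7 for Document 2, and keeping the <start> marker):
Document 1:
• P(Python | <start>) = (1 + 1) / (1 + 6) = 2/7
• P(programming | Python) = (0 + 1) / (1 + 6) = 1/7
• P(language | programming) = (1 + 1) / (1 + 6) = 2/7
• Overall Probability ≈ 2/7 × 1/7 × 2/7 ≈ 0.012
Document 2:
• P(Python | <start>) = (0 + 1) / (1 + 7) = 1/8
• P(programming | Python) = (0 + 1) / (1 + 7) = 1/8
• P(language | programming) = (0 + 1) / (1 + 7) = 1/8
• Overall Probability = 1/8 × 1/8 × 1/8 ≈ 0.002
Both documents now receive non-zero scores, and Document 1 still ranks higher.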
Estimating N-Gram Probabilities
A Bigram Example
• A mini-corpus: we augment each sentence with a special symbol <s> at the beginning, to give us the bigram context of the first word, and a special end symbol </s> at the end.
<s> I am Sam </s>
<s> Sam I am </s>
<s> I fly </s>
• Unique words: I, am, Sam, fly
• Bigrams: <s> and </s> are also tokens. There are 6 token types (4 words + 2 boundary symbols), so 6*6=36 possible bigrams.
P(I|<s>)=2/3 P(Sam|<s>)=1/3 P(am|<s>)=0 P(fly|<s>)=0 P(<s>|<s>)=0 P(</s>|<s>)=0
P(I|I)=0 P(Sam|I)=0 P(am|I)=2/3 P(fly|I)=1/3 P(<s>|I)=0 P(</s>|I)=0
P(I|am)=0 P(Sam|am)=1/2 P(am|am)=0 P(fly|am)=0 P(<s>|am)=0 P(</s>|am)=1/2
P(I|Sam)=1/2 P(Sam|Sam)=0 P(am|Sam)=0 P(fly|Sam)=0 P(<s>|Sam)=0 P(</s>|Sam)=1/2
P(I|fly)=0 P(Sam|fly)=0 P(am|fly)=0 P(fly|fly)=0 P(<s>|fly)=0 P(</s>|fly)=1
P(I|</s>)=0 P(Sam|</s>)=0 P(am|</s>)=0 P(fly|</s>)=0 P(<s>|</s>)=0 P(</s>|</s>)=0 (nothing follows </s> within a sentence)
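A minimal Python sketch that reproduces these bigram estimates from the mini-corpus (counts are computed per sentence, so no bigram ever starts with </s>):

```python
from collections import Counter

# The mini-corpus from this slide.
corpus = ["<s> I am Sam </s>",
          "<s> Sam I am </s>",
          "<s> I fly </s>"]

sentences = [s.split() for s in corpus]
unigram_counts = Counter(t for sent in sentences for t in sent)
bigram_counts = Counter(b for sent in sentences for b in zip(sent, sent[1:]))

def p(word, prev):
    # Maximum-likelihood bigram estimate P(word | prev) = count(prev word) / count(prev).
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p("I", "<s>"))     # 0.666... = 2/3
print(p("am", "I"))      # 0.666... = 2/3
print(p("Sam", "am"))    # 0.5
print(p("</s>", "fly"))  # 1.0
```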
Estimating N-Gram Probabilities
Example
• Unigrams: I, am, Sam, fly
P(I)=3/8 P(am)=2/8 P(Sam)=2/8 P(fly)=1/8
• Trigrams: There are 6*6*6=216 trigrams.
– Assume there are two tokens <s> <s> at the beginning, and two tokens </s> </s> at the end.
P(I|<s> <s>)=2/3 P(Sam|<s> <s>)=1/3
P(am|<s> I)=1/2 P(fly|<s> I)=1/2
P(I|<s> Sam)=1
P(Sam|I am)=1/2 P(</s>|I am)=1/2
P(</s>|am Sam)=1
P(</s>|Sam </s>)=1
Estimating N-Gram Probabilities
Corpus: Berkeley Restaurant Project Sentences
• There are 9222 sentences in the corpus.
• Raw bigram counts of 8 words (out of 1446 word types)
Estimating N-Gram Probabilities
Corpus: Berkeley Restaurant Project Sentences
• Unigram counts:
• Normalize bigrams by unigram counts:
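• The normalization is the maximum-likelihood estimate P(wn|wn-1) = C(wn-1 wn) / C(wn-1), i.e. each bigram count is divided by the unigram count of its first word.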
Bigram Estimates of Sentence Probabilities
• Some other bigrams:
P(i|<s>)=0.25 P(english|want)=0.0011
P(food|english)=0.5 P(</s>|food)=0.68
• Compute the probability of the sentence I want English food:
P(<s> i want english food </s>)
= P(i|<s>) P(want|i) P(english|want) P(food|english) P(</s>|food)
= 0.25*0.33*0.0011*0.5*0.68
= 0.000031
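Since these per-word probabilities multiply into very small numbers, implementations usually sum log probabilities instead; a minimal sketch reproducing this product (the five bigram estimates are the ones used in the product above):

```python
import math

# Bigram estimates from this slide for "<s> i want english food </s>".
bigram_probs = [0.25, 0.33, 0.0011, 0.5, 0.68]

prob = math.prod(bigram_probs)                      # direct product
log_prob = sum(math.log(p) for p in bigram_probs)   # log-space sum (avoids underflow)

print(f"{prob:.6f}")                # 0.000031
print(f"{math.exp(log_prob):.6f}")  # 0.000031, recovered from log space
```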