Year: 2024-2025
Spring Semester
Natural Language
Processing
Dr. Wafaa Samy
Dr. Hanaa Eissa
Introduction to Natural Language
Processing (NLP) (Part 2)
Lecture (2)
2
Contents
• Lexicon
• Part of Speech (POS)
• Part of Speech Tagging
• Ambiguous Words
• Ambiguity
• Alphabet, Strings, and Languages
• The Chomsky Hierarchy
Lexicon
• A lexicon is a repository for words (stems):
the set of available words of a language or
branch of knowledge (e.g. medical).
o It varies from language to language and
within a language according to the branch
of knowledge.
• Every word can be classified through a
lexical category such as article, noun,
verb, adjective, adverb, conjunction,
preposition, or pronoun.
• They may be also divided into sub- lexicon: the list of stems
categories. and affixes, together
o regular-nouns, irregular-singular-nouns, with basic information
irregular-plural-nouns, etc. about them (whether a
stem is a Noun stem or a
4
Verb stem, etc.).
Part of Speech (POS)
• Linguists group words of a language into classes (sets)
which show similar syntactic behavior, and often a typical
semantic type.
• These word classes are called syntactic or grammatical
categories, but more commonly part of speech (POS).
• Words are traditionally grouped into equivalence
classes called parts of speech (POS), word classes,
morphological classes, or lexical tags.
• In traditional grammars there were generally only a few
parts of speech (noun, verb, adjective, preposition,
adverb, conjunction, etc.).
• The part of speech for a word gives a significant amount
5
of information about the word and its neighbors.
Part of Speech Tagging
• Task: Given a string of words, identify the part of
speech for each word.
• Example: Word sequence.
o Input: Time flies like an arrow.
o Output: Tag sequence.
Time/NN flies/VB like/P an/DET arrow/NN
• What will help us to tag words with their parts-of-
speech?
o Identifying the parts of speech is a first step towards
syntactic analysis.
6
Example (1): Part of Speech Tagging
7
Example (2)
• Words in a sentence can be annotated – tagged –
with their POS categories.
• For instance, the simple sentence:
The big cat ate the gray mouse.
• Is annotated as:
o The/article big/adjective cat/noun ate/verb
the/article gray/adjective mouse/noun
8
Example (3)
• Tag the sentence using parts of speech you know:
The cat caught the big mouse
• The/article cat/noun caught/verb the/article
big/adjective mouse/noun.
9
Ambiguous Words
• Some words may only be nouns, e.g. arrow. The POS tagging
• Some words are ambiguous, e.g. like, flies. problem is to
• For instance: like determine the POS
o verb: I like the class. tag for a particular
o preposition: Time flies like an arrow. instance of a word.
• Words often have more than one POS: back JJ = Adjective
o The back door = JJ NN = Noun
o On my back = NN VB = Verb
o Promised to back the bill = VB
• Most of the time, the local context disambiguated the part of
speech.
• Local context:
o Two determiners rarely follow each other.
o Two base form verbs rarely follow each other.
10 o Determiner is almost always followed by adjective or noun.
Ambiguous Words (Cont.)
11
Ambiguity
• Ambiguity is a fundamental problem of Natural
Language Processing.
• Resolving ambiguity is a crucial goal.
• For example, Word Sense Resolution:
o Many words have many meanings or senses.
o We need to resolve which of the senses of an
ambiguous word is invoked in a particular use of the
word.
12
Example (4)
• Ambiguity in speech recognition can be illustrated with the
following sentence. Find at least 5 meanings of this sentence:
I made her duck
• Here are five different meanings this sentence could have:
1. I cooked waterfowl for her benefit (to eat).
2. I cooked waterfowl belonging to her.
3. I created the (plastic?) duck she owns.
4. I caused her to quickly lower her head or body.
5. I waved my magic wand and turned her into undifferentiated
waterfowl.
13
Example (4) (Cont.)
• These different meanings are caused by a number of
ambiguities:
1. The words duck and her are morphologically or
syntactically ambiguous in their part of speech.
o Duck can be a verb or a noun, while her can be a dative
pronoun or a possessive pronoun.
2. The word make is semantically ambiguous; it
can mean create or cook.
14
Example (4) (Cont.)
• I caused her to quickly lower her head or body.
o Lexical category: “duck” can be a noun or verb.
• I cooked waterfowl belonging to her.
• I cooked waterfowl for her benefit (to eat).
o Lexical category: “her” can be a possessive (“of her”) or
dative (“for her”) pronoun.
• I made the (plastic) duck statue she owns.
o Lexical Semantics: “make” can mean “create” or “cook”.
15
Example (4) (Cont.)
• Grammar: Make can be:
o Transitive: (verb has a noun direct object).
I cooked [waterfowl belonging to her]
o Di-transitive: (verb has 2 noun objects).
I made [her] (into) [undifferentiated waterfowl]
o Action-transitive: (verb has a direct object and
another verb).
I caused [her] [to move her body]
16
Phonetics!
• Furthermore, in spoken sentences, there is even
deeper kind of ambiguity; the first word could have
been eye or the second word maid:
o I mate or duck
o I’m eight or duck
o Eye maid; her duck I made her duck.
o Aye mate, her duck
o I maid her duck
o I’m aid her duck
o I mate her duck
o I’m ate her duck
o I’m ate or duck
o I mate or duck
17
Example (5)
• Discover all of the possible meanings of the sentence:
I sent her messages
• By giving a paraphrase of each interpretation. Identify whether the
different meanings arise from syntactic ambiguity, semantic
ambiguity, or morphological ambiguity.
a. I sent some messages to her.
b. I sent the messages written by her.
• There is ambiguity in the word her, since it can be a dative pronoun
or a possessive pronoun.
• There is also a syntactic ambiguity because the verb send is
syntactically ambiguous, it may be transitive with one object as in
(b) or Di-transitive with two objects as in (a).
18
Example (6)
• Discover all of the possible meanings of the sentence:
He drew one card
• By giving a paraphrase of each interpretation. Identify
whether the different meanings arise from structural
ambiguity, semantic ambiguity, or morphological ambiguity.
a. He created one card having a picture.
b. He chose one card from a set of cards.
• The different meanings are because of semantic ambiguity
because draw has many senses such as created a picture
and choose.
19
Alphabet
sigma
• An alphabet of a language “∑” is a finite and non-
empty set of symbols.
o We use the symbol ∑ (sigma) to denote an alphabet.
o These symbols can and will stand for bigger objects that
can have internal structure.
• Examples:
o Binary: ∑ = {0,1}
o All lower case letters: ∑ = {a, b, c, .. z}
o Alphanumeric: ∑ = {a-z, A-Z, 0-9}
o …
• Note: in NLP, we may use words instead of characters
for the alphabet.
20
Strings
• A string or word is a finite sequence of symbols chosen from ∑.
• Empty string is (or “epsilon”).
• Length of a string w, denoted by “|w|”, is equal to the
number of (non-) characters in the string:
o E.g. x = 010100 |x| = 6
o x = 01 0 1 00 |x| = 6
• xy = concatenation of two strings x and y.
• Powers of an Alphabet:
Let ∑ be an alphabet.
o ∑k = the set of all strings of length k.
o ∑* = ∑0 U ∑1 U ∑2 U … [ zero or more ]
21 o ∑+ = ∑1 U ∑2 U ∑ 3 U … [ one or more ]
Languages
• A language is a collection of strings constructed from alphabet
of symbols.
• L is a said to be a language over alphabet ∑, only if L ∑*
This is because ∑* is the set of all strings (of all possible length
including 0) over the given alphabet ∑.
• Examples: ∑ = {0,1}
1. Let L is the language of all strings consisting of n 0’s followed by
n 1’s: L = {, 01, 0011, 000111,…}
2. Let L is the language of all strings consisting of with equal
number of 0’s and 1’s: L = {, 01, 10, 0011, 1100, 1010, 1001,…}
• Ø denotes the Empty language. Let L = {}; Is L=Ø? No
•22 A grammar is a finite list of rules defining a language.
Example (7)
strings
23
The Membership Problem
• Given a string w ∑* and a language L over ∑,
decide whether or not w L.
Example:
Let ∑ = {0,1} and w = 100011
Is w the language of strings with equal
number of 0s and 1s?
Yes
24
The Chomsky Hierarchy
• A containment hierarchy of classes of formal languages.
Context- Context- Recursively-
Regular enumerable
free sensitive
(DFA) (TM)
(PDA) (LBA)
Accepted by Finite Recognized by
Automata (DFA or NFA) Turing Machine (TM)
Accepted by Push Accepted by Linear
25 Down Automata (PDA) Bound Automata (LBA)
The Chomsky Hierarchy (Cont.)
(TM)
Automata (LBA)
Automata (PDA)
Automata (DFA or NFA)
26