Natural Language
Processing
Lecture 5: Introduction to Syntax and
Formal Languages.
11/8/2020
COMS W4705
Yassine Benajiba
Sentences:
the good, the bad, and the ugly
• Some good sentences:
• the boy likes a girl
• the small girl likes a big girl
• a very small nice boy sees a very nice boy
• Some bad sentences:
• the boy the girl likes
• small boy likes nice girl
• Ugly word salad: very like nice the girl boy
Syntax as an Interface
• Syntax can be seen as the interface between morphology
(structure of words) and semantics.
• Why treat syntax separately from semantics?
• Can judge if a sentence is grammatical or not, even if it
doesn’t make sense semantically.
Colorless green ideas sleep furiously.
*Sleep ideas furiously colorless green.
Key Concepts of Syntax
• Constituency and Recursion.
• Dependency.
• Grammatical Relations.
• Subcategorization.
• Long-distance dependencies.
Constituents
• A constituent is a group of words that behave as a single unit (within a
hierarchical structure).
• Noun-Phrase examples:
• [they], [the woman], [three parties from Brooklyn],
[a high-class spot such as Mindy’s], [the horse raced past the barn]
• Noun phrases can appear before verbs (among other things) and
they must be complete:
• *from arrive…
*the is ….
*spot sat….
Constituency Tests
• On September seventeenth, I’d like to fly to New York.
• I’d like to fly to New York on September seventeenth.
• I’d like to fly on September seventeenth to New York.
• *On I’d like to fly to New York September seventeeth.
• *On September I’d like to fly seventeenth to New York.
More Constituency Tests
• There is a great number of constituency tests. They typically involve moving
constituents around or replacing them.
• Topicalization:
• I won’t eat that pizza That pizza, I won’t eat *pizza I won’t eat that
• Pro-form Substitution:
• I don’t know the man who sent flowers. I don’t know him.
*I don’t know him flowers.
• Wh-question test.
• Where would you like to fly on September seventeeth?
• When would you like to fly to New York?
Sentence Structure as Trees
• [the tall girl likes a few very friendly boys]
• [[the tall girl] likes [a few very friendly boys]]
• [[the] tall girl] likes [[a few] [very friendly] boys]]
the tall girl likes a few very friendly boys
Constituent Labels
• Choose constituents so each one has one non-bracketed word:
the head.
• Category of Constituent: XP, where X is the part-of-speech of the
head
NP, VP, AdjP, AdvP, DetP
NP
NP
NP
QuantP AdjP
DetP
Det N V Det Quant Adv Adj N
the girl likes a few very friendly boys
Constituent Labels
• Choose constituents so each one has one non-bracketed word:
the head.
• Category of Constituent: XP, where X is the part-of-speech of the
head
NP, VP, AdjP, AdvP, DetP
S
VP
NP
NP
NP
QuantP AdjP
DetP
Det N V Det Quant Adv Adj N
the girl likes a few very friendly boys
Review: Constituency
The students easily completed the difficult NLP homework.
Which constituents can you identify? What tests could you use?
Recursion in Language
• One of the most important attributes of Natural Languages is
that they are recursive.
• He made pie
[with apples [from the orchard [near the farm [in …]]]]
• [The mouse [the cat [the dog chased]] ate] died.
• There are infinitely many sentences in a language, but in
predictable structures.
• How do we model the set of sentences in a language and their
structure?
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat
NP →DN N → tail
NP → NP PP N → student
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat VP
NP →DN N → tail
NP → NP PP N → student
NP
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat VP
NP →DN N → tail
NP → NP PP N → student
NP
D N
the student
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat VP
NP →DN N → tail NP
NP → NP PP N → student
NP
D N V
the student saw
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat VP
NP →DN N → tail NP
NP → NP PP N → student
PP
NP NP
D N V
the student saw
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat VP
NP →DN N → tail NP
NP → NP PP N → student
PP
NP NP NP
D N V D N P
the student saw the cat with
Context Free Grammars
(CFG)
S → NP VP V → saw S
VP → V NP P → with
VP → VP PP D → the
PP → P NP N → cat VP
NP →DN N → tail NP
NP → NP PP N → student
PP
NP NP NP
D N V D N P D N
the student saw the cat with the tail
Context Free Grammars
• A context free grammar is defined by:
• Set of terminal symbols Σ.
• Set of non-terminal symbols N.
• A start symbol S ∈ N.
• Set R of productions of the form A → β,
where A ∈ N and β ∈ (Σ ∪ N)*, i.e. β is a string of
terminals and non-terminals.
Language of a CFG
• Given a CFG G=(N, Σ, R,S):
• Given a string αAγ, where A ∈ N, we can derive αβγ if
there is a production A → β ∈ R.
• α⇒β means that G can derive β from α in a single step.
• α⇒*β means that G can derive β from α in a finite
number of steps.
• The language of G is defined as the set of all terminal
strings that can be derived from the start symbol.
Σ
Derivations and Derived
Strings
• CFG is a string rewriting formalism, so the derived objects
are strings.
• A derivation is a sequence of rewriting steps.
• CFGs are context free: applicability of a rule depends only on
the nonterminal symbol, not on its context.
• Therefore, the order in which multiple non-terminals in a
partially derived string are replaced does not matter.
We can represent identical derivations in a derivation tree.
• The derivation tree implies a parse tree.
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat
NP →DN N → tail
NP → NP PP N → student
Derived String:
NP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail
NP → NP PP N → student
Derived String:
NP PP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N
NP → NP PP N → student
Derived String:
the student PP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
Derived String:
the student P NP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
Derived String:
the student with NP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
NP PP
Derived String:
the student with NP PP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
NP PP
D N
Derived String:
the student with the cat PP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
NP PP
D N P NP
Derived String:
the student with the cat with NP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
NP PP
D N P NP
Derived String:
NP PP
the student with the cat with NP PP
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
NP PP
D N P NP
Derived String:
NP PP
the student with the cat with the tail PP
…
D N
Recursion in CFGs
Parse Tree:
S → NP VP V → saw
VP → V NP P → with NP
VP → VP PP D → the
PP → P NP N → cat NP PP
NP →DN N → tail D N P NP
NP → NP PP N → student
NP PP
D N P NP
Derived String:
NP PP
the student with the cat with the tail PP
…
D N
Regular Grammars
• A regular grammar is defined by:
• Set of terminal symbols Σ.
• Set of non-terminal symbols N.
• A start symbol S ∈ N.
• Set R of productions of the form A → aB, or A → a
where A,B ∈ N and a ∈ Σ.
Finite State Automata
• Regular grammars can be implemented as finite state
automata.
NP → the N with
N → student PP
N → cat PP student
N → tail PP the cat
NP N PP end
PP → with NP Ɛ
PP → Ɛ tail
• The set of all regular languages is strictly smaller than the
set of context-free languages.
Center Embeddings
S
the mouse CP died
the cat CP ate
the dog chased
• Problem: Regular grammars cannot capture
long-distance dependencies.
• This example follows the pattern anbn.
Can show that is language is not regular
(using the “pumping lemma”).
Linguistically, this is not a perfect analysis.
Is Natural Language Context
Free?
• Probably not. An example from Dutch:
“…because Wim saw Jan help Marie teach the children to swim”
• Context Free Grammars cannot describe crossing dependencies.
For example, it can be shown that
anbmcndm
is not a context free language.
Complexity Classes
recursively
enumerable languages
context sensitive
languages
“mildly” context
sensitive languages
context free
languages
regular
language
s
This is sometimes called
the “Chomsky Hierarchy”.
Formal Grammar and
Parsing
• Formal Grammars are used in linguistics, NLP,
programming languages.
• We want to build a compact model that describes a
complete language.
• Need efficient algorithms to determine if a sentence is in
the language or not (recognition problem).
• We also want to recover the structure imposed by the
grammar (parsing problem).
Acknowledgments
• Some slides by Kathy McKeown and Owen Rambow.