Natural Language Processing
NLP IS INTERDISCIPLINARY
Grammar in NLP: Introduction
• This section covers the concept of grammar and the types of
grammar that are heavily used when working at the level
of syntactic analysis in NLP.
1. What is Grammar?
2. Different Types of Grammar in NLP
• Context-Free Grammar (CFG)
• Constituency Grammar (CG)
• Dependency Grammar (DG)
What is Grammar?
• Grammar is defined as the rules for forming well-structured
sentences.
• Grammar plays an essential role in describing the
syntactic structure of well-formed programs.
• In simple words, grammar denotes the syntactic
rules that are used for conversation in natural
languages.
• The theory of formal languages applies not only
here but also in fields
of Computer Science, mainly programming
languages and data structures.
What is Grammar?
• For Example, in the ‘C’ programming language,
the precise grammar rules state how functions are
made with the help of lists and statements.
• Mathematically, a grammar G can be written as a
4-tuple (N, T, S, P) where:
• N or VN = set of non-terminal symbols, or
variables.
• T or Σ = set of terminal symbols.
• S = start symbol, where S ∈ N.
• P = set of production rules over terminals and
non-terminals.
• Each production has the form α → β, where α and β are strings over
VN ∪ Σ and at least one symbol of α belongs to VN.
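The 4-tuple definition can be written down directly as a data structure. The following is a minimal sketch, not a standard library API: the `Grammar` class, the `is_well_formed` helper, and the toy grammar are all hypothetical names chosen for illustration, and the check that N and T are disjoint is an added assumption (it is the usual convention, but the slide does not state it).

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    nonterminals: frozenset  # N (also written VN)
    terminals: frozenset     # T (also written Σ)
    start: str               # S, must be a member of N
    productions: tuple       # P, pairs (alpha, beta) of symbol tuples


def is_well_formed(g: Grammar) -> bool:
    """Check the conditions stated in the 4-tuple definition."""
    symbols = g.nonterminals | g.terminals
    if g.start not in g.nonterminals:
        return False
    if g.nonterminals & g.terminals:
        return False  # assumption: N and T share no symbols
    for alpha, beta in g.productions:
        # alpha and beta must be strings over N ∪ T ...
        if not all(s in symbols for s in alpha + beta):
            return False
        # ... and alpha must contain at least one non-terminal.
        if not any(s in g.nonterminals for s in alpha):
            return False
    return True


# Hypothetical toy grammar that generates the single sentence "dogs bark".
toy = Grammar(
    nonterminals=frozenset({"S", "NP", "VP"}),
    terminals=frozenset({"dogs", "bark"}),
    start="S",
    productions=(
        (("S",), ("NP", "VP")),
        (("NP",), ("dogs",)),
        (("VP",), ("bark",)),
    ),
)

print(is_well_formed(toy))
```

Replacing the start symbol with a terminal, for example `toy._replace(start="dogs")`, makes the check fail, which mirrors the requirement S ∈ N.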
Context-Free Grammar (CFG)
• A context-free grammar, abbreviated as CFG,
is a notation for describing languages. It is a
superset of regular grammar: every regular
grammar is also a context-free grammar.
• A CFG consists of a finite set of grammar rules
with the following four components:
• Set of Non-Terminals
• Set of Terminals
• Set of Productions
• Start Symbol
Context-Free Grammar (CFG)
• Set of Non-terminals
• It is represented by V.
• The non-terminals are syntactic
variables that denote sets of strings,
which help define the language
generated by the grammar.
• Set of Terminals
• Terminals are also known as tokens and
are represented by Σ.
• Strings are formed from these basic
terminal symbols.
Context-Free Grammar (CFG)
• Set of Productions
• It is represented by P.
• The set gives an idea about how the
terminals and nonterminals can be
combined.
• Every production consists of the
following components:
• A non-terminal (the left side),
• An arrow,
• A sequence of terminals and/or non-terminals
(the right side).
• The left side of a production is a single
non-terminal, while the right side is the
sequence of symbols that replaces it.
• Start Symbol
• The derivation begins from the start
symbol.
• It is represented by the symbol S.
• The start symbol is always a
non-terminal.
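The four components can be seen working together in a small derivation. The sketch below uses a hypothetical toy grammar (the rules, words, and the `expand` helper are illustrative assumptions, not taken from the slides) and enumerates every sentence derivable from the start symbol. It only terminates because this toy grammar has no recursive rules.

```python
from itertools import product

# Hypothetical toy CFG: keys are non-terminals, values list their productions.
# Any symbol that is not a key is treated as a terminal.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}
START = "S"


def expand(symbol):
    """Return every terminal string (tuple of words) derivable from symbol."""
    if symbol not in RULES:  # a terminal derives only itself
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Expand each right-hand-side symbol, then combine the choices in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


sentences = [" ".join(words) for words in expand(START)]
for sentence in sentences:
    print(sentence)
```

Note that the grammar only checks syntax: a string such as "the dog sleeps the cat" is generated even though it is semantically odd.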
Constituency Grammar (CG)
• It is also known as phrase structure grammar.
• It is called constituency grammar because it is
based on the constituency relation.
• It contrasts with dependency grammar.
• Before diving into the discussion of CG,
let’s look at some fundamental points about
constituency grammar and the constituency
relation.
Constituency Grammar (CG)
• All the related frameworks view
sentence structure in terms of the constituency
relation.
• The constituency relation is derived from
the subject-predicate division of
Latin and Greek grammar.
• Here we study clause structure in terms
of the noun phrase (NP) and verb phrase (VP).
Constituency Grammar (CG)
For Example,
Sentence: This tree is illustrating the constituency relation
Constituency Grammar (CG)
• In constituency grammar, a constituent
can be any word, group of words, or phrase.
The goal of constituency grammar is to
organize any sentence into its constituents
using their properties.
• To derive these properties we generally rely on:
• Part-of-speech tagging,
• Noun or verb phrase identification, etc.
• For example, constituency grammar can
organize a sentence into three
constituents: a subject, a context, and an object.
Sentence: <subject> <context> <object>
Constituency Grammar (CG)
• These three constituents can take different
values and as a result, they can generate
different sentences.
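How different constituent values combine into different sentences can be sketched with a Cartesian product. The constituent values below are hypothetical placeholders chosen for illustration, not the values from the original slide.

```python
from itertools import product

# Hypothetical values for each constituent slot.
subjects = ["The dogs", "The cats"]
contexts = ["are running", "are barking"]
objects_ = ["in the park", "since the morning"]

# Every <subject> <context> <object> combination yields one sentence.
sentences = [" ".join(parts) for parts in product(subjects, contexts, objects_)]

for sentence in sentences:
    print(sentence)
```

With two values per slot this produces 2 × 2 × 2 = 8 sentences.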
• For Example, If we have the following
constituents, then
Constituency Grammar (CG)
• Example sentences that can be generated
with the help of the above constituents are:
Constituency Grammar (CG)
• Another way to view constituency
grammar is to define the grammar in terms
of part-of-speech tags.
• Consider a grammar structure written as a sequence of
part-of-speech tags that corresponds to the same sentence: “The
dogs are barking in the park”
Dependency Grammar (DG)
• It contrasts with constituency grammar and
is based on the dependency relation.
• Dependency grammar (DG) differs from
constituency grammar in that it lacks phrasal
nodes.
• Some fundamental points about dependency
grammar and the dependency relation:
• In dependency grammar, the words are
connected to each other by directed links.
• The verb is considered the center of the
clause structure.
• Every other syntactic unit is connected to
the verb by a directed link. These
links are called dependencies.
Dependency Grammar (DG)
• For Example,
Dependency Grammar (DG)
1. Dependency grammar states that the words of a
sentence depend on other words of the
sentence.
• For example, the phrase “barking dog” was
mentioned in the earlier CG discussion; there
“dog” is modified by “barking”, because an
adjective-modifier dependency exists between
the two.
Dependency Grammar (DG)
2. It organizes the words of a sentence according
to their dependencies.
• One of the words in a sentence acts as the root,
and all the other words
are linked directly or indirectly to the root
through their dependencies.
• These dependencies represent relationships
among the words in a sentence, and dependency
grammars are used to infer the structure and
semantic dependencies between the words.
• For example, consider the following sentence:
Sentence: xyz is the largest community of data scientists
and provides the best resources for understanding data and
analytics
Dependency Grammar (DG)
• The dependency tree of the above sentence is shown below
• In the above tree, the root word is “community” having
NN as the part of speech tag and every other word of
this tree is connected to root, directly or indirectly, with
the help of dependency relation such as a direct object,
direct subject, modifiers, etc.
• These relationships define the roles and functions of
each word in the sentence and how multiple words are
connected together.
Dependency Grammar (DG)
• We can represent every dependency in the form of a
triplet that contains a governor, a relation, and a
dependent:
Relation: (Governor, Relation, Dependent)
• This means that a dependent is connected to the
governor through the relation; in other words,
they play the roles of subject, verb, and object
respectively.
• For example, consider the same sentence again:
Sentence: Analytics Vidhya is the largest community of
data scientists
• Then, we separate the sentence in the following
manner:
<Analytics Vidhya> <is> <the largest community of data
scientists>
Dependency Grammar (DG)
• Now, let’s identify different components in the above
sentence:
• Subject: “Analytics Vidhya” is the subject and is
playing the role of a governor.
• Verb: “is” is the verb and is playing the role of the
relation.
• Object: “the largest community of data scientists”
is the dependent or the object.
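The triplet for this sentence can be held in a small record type. This is a minimal sketch: the `Dependency` class and the `to_sentence` helper are hypothetical names, and the field values simply restate the subject, verb, and object identified above.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    governor: str   # the subject in this simplified view
    relation: str   # the verb linking the two
    dependent: str  # the object


triplet = Dependency(
    governor="Analytics Vidhya",
    relation="is",
    dependent="the largest community of data scientists",
)


def to_sentence(dep: Dependency) -> str:
    """Read the triplet back in governor, relation, dependent order."""
    return f"{dep.governor} {dep.relation} {dep.dependent}"


print(to_sentence(triplet))
```

Storing dependencies as records like this makes it easy to filter a parsed corpus by governor or by relation.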
Dependency Grammar (DG)
• Some use cases of Dependency grammars are as
follows
• Named Entity Recognition
• It can be used to solve the problems related to
named entity recognition (NER).
• Question Answering System
• It can be used to understand the relational and
structural aspects of question-answering systems.
• Coreference Resolution
• It can also be used in coreference resolutions in
which the task is to map the pronouns to the
respective noun phrases.
• Text summarization and Text classification
• It can also be used for text summarization
problems and they are also used as features for
text classification problems.
Sentence Structure In NLP
What Do You Mean by Sentence Structure?
• Sentence structure is a grammatical component that
tells you exactly where and how each component of a
sentence should be placed in order to blend and make
sense.
• The Collins Dictionary defines sentence structure as
“the grammatical arrangement of words in sentences.”
• In other words, the sentence structure is what defines
the way a sentence will look and sound.
Sentence Structure In NLP
• Sentence structure refers to how different components
of a sentence (words, phrases, clauses) are arranged
and related to one another.
a. Constituent Structure (Phrase Structure)
• This type of structure breaks a sentence into its
constituent parts (phrases). Constituents are groups of
words that function as a unit within a sentence.
Constituency Grammar: NLP
• The fundamental notion underlying the idea of
constituency is that of abstraction: groups of words
behaving as single units, or constituents.
• The most widely used formal system for modelling
constituent structure in English and other natural
languages is the Context-Free Grammar, or CFG.
• Context-free grammars are also called Phrase-
Structure Grammars.
• A context-free grammar consists of a set of rules or
productions, each of which expresses the ways that
symbols of the language can be grouped and ordered
together, and a lexicon of words and symbols.
Constituency Grammar: NLP
• For example, the following productions express
that an NP (or noun phrase) can be composed of either
a proper noun or a determiner (Det) followed by a
Nominal; a Nominal, in turn, can consist of one or more
Nouns.
• NP → Det Nominal
• NP → ProperNoun
• Nominal → Noun | Nominal Noun
• The symbols that are used in a CFG are divided into
two classes:
• The symbols that correspond to words in the
language (“the”, “nightclub”) are called terminal symbols.
• The lexicon is the set of rules that introduce these terminal
symbols.
• The symbols that express abstractions over
these terminals are called non-terminals.
• In each context-free rule, the item to the right of the arrow (→)
is an ordered list of one or more terminals and non-
terminals; to the left of the arrow is a single non-terminal
symbol expressing some cluster or generalization.
• In the lexicon, the non-terminal associated with each word is
its part of speech.
Constituency Grammar: NLP
Like any formal grammar, a CFG also has a designated start
symbol. It is denoted by S and is usually interpreted as the
sentence node.
A parse tree for “a flight”.
Constituency Grammar: NLP
• The following rule expresses the fact that a sentence can consist
of a noun phrase followed by a verb phrase:
• S → NP VP (e.g. “I prefer a morning flight”)
• Next, consider the verb phrase.
• A verb phrase in English consists of a verb followed by assorted
other things. For example, one kind of verb phrase consists of a
verb followed by a noun phrase:
• VP → Verb NP (e.g. “prefer a morning flight”)
• Or the verb may be followed by a noun phrase and a
prepositional phrase:
• VP → Verb NP PP (e.g. “leave Boston in the morning”)
• Or the verb phrase may have a verb followed by a prepositional
phrase alone:
• VP → Verb PP (e.g. “leaving on Thursday”)
Constituency Grammar: NLP
• A prepositional phrase generally has a preposition
followed by a noun phrase.
• PP → Preposition NP (e.g. “from Los Angeles”)
• Now we will see the end-to-end process of defining a
grammar and then generating a sentence with
that grammar.
• Since English grammar is vast, we take a subset
for illustration purposes. We will call this CFG L₀.
Constituency Grammar: NLP
Lexicons for L₀
Constituency Grammar: NLP
Grammar Rules for L₀
The grammar for L₀, with example phrases for each rule.
Constituency Grammar: NLP
The parse tree for “I prefer a morning flight” according to
grammar L₀
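Such a parse tree can be computed from the rules. The sketch below is an assumption-laden illustration rather than the full L₀ grammar: `RULES` contains only the productions quoted on the slides, `LEXICON` is a small hypothetical word list, and `parse` is a simple memoized span-based parser that returns the first tree it finds as nested tuples. It handles the left-recursive rule Nominal → Nominal Noun because every sub-span it tries is strictly shorter.

```python
from functools import lru_cache

# Productions quoted on the slides (a fragment, not the complete L0 grammar).
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Pronoun",), ("ProperNoun",), ("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "VP": [("Verb", "NP"), ("Verb", "NP", "PP"), ("Verb", "PP")],
    "PP": [("Preposition", "NP")],
}
# Hypothetical mini-lexicon: word -> part-of-speech non-terminal.
LEXICON = {
    "I": "Pronoun", "Boston": "ProperNoun", "a": "Det", "the": "Det",
    "morning": "Noun", "flight": "Noun", "prefer": "Verb", "leave": "Verb",
    "in": "Preposition",
}


def parse(words, start="S"):
    """Return one parse tree as nested tuples, or None if no parse exists."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(symbol, i, j):
        # Lexical case: a part-of-speech tag covering exactly one word.
        if symbol not in RULES:
            if j - i == 1 and LEXICON.get(words[i]) == symbol:
                return (symbol, words[i])
            return None
        for rhs in RULES[symbol]:
            children = match(rhs, i, j)
            if children is not None:
                return (symbol,) + children
        return None

    def match(rhs, i, j):
        # Split words[i:j] among the symbols of rhs, each covering >= 1 word.
        if len(rhs) == 1:
            tree = build(rhs[0], i, j)
            return None if tree is None else (tree,)
        for k in range(i + 1, j - len(rhs) + 2):
            first = build(rhs[0], i, k)
            if first is not None:
                rest = match(rhs[1:], k, j)
                if rest is not None:
                    return (first,) + rest
        return None

    return build(start, 0, len(words))


print(parse("I prefer a morning flight".split()))
```

Calling `parse("leave Boston in the morning".split(), start="VP")` exercises the VP → Verb NP PP rule, and a scrambled input such as "flight a prefer" returns `None`.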
Constituency Grammar: NLP
Noun Phrase
• Our L₀ grammar introduced three of the most frequent
types of noun phrases that occur in English: pronouns,
proper nouns and the NP → Det Nominal
construction.
• The main focus is on the last kind since this is where
syntactic complexity resides.
• These noun phrases consist of a head, the central noun
in the noun phrase, along with various modifiers that
can occur before or after the head noun.
Constituency Grammar: NLP
The Determiner
• Noun Phrase can begin with simple lexical determiners for
example:-
• a stop, the flights , this flight
• But the determiner role can also be filled by a possessive
expression consisting of a noun phrase followed by ’s as a
possessive marker. Example:-
• United’s flight
• United’s pilot’s union
• Denver’s mayor’s mother’s cancelled flight
Constituency Grammar: NLP
Det → NP ’s
• Since the rule is recursive in nature, we can easily model the last
two examples, which have a sequence of possessive expressions.
• In some circumstances, determiners are optional when they are
modifying a noun which is plural. Example:-
• Show me flights from San Francisco to Denver on weekdays.
The Nominal
• The nominal follows after determiners and contains any pre and
post head noun modifiers. In its simplest form, it consists of a
single noun.
• Nominal → Noun
Sentence Structure In NLP
• For example, the sentence "The cat sleeps on the mat"
can be broken into:
• S (Sentence)
• NP (Noun Phrase) → "The cat"
• VP (Verb Phrase) → "sleeps on the mat"
• V (Verb) → "sleeps"
• PP (Prepositional Phrase) → "on the mat"
• P (Preposition) → "on"
• NP (Noun Phrase) → "the mat"
• This structure is usually represented as a tree or a
diagram that shows how each part of the sentence is
related.
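The breakdown above can be stored as a nested structure so that the hierarchy is explicit. This is a minimal sketch using plain tuples of the form (label, children...); the helper names `leaves` and `bracketed` are hypothetical.

```python
# Each node is (label, child, child, ...); a child is a node or a string of words.
tree = ("S",
        ("NP", "The cat"),
        ("VP",
         ("V", "sleeps"),
         ("PP",
          ("P", "on"),
          ("NP", "the mat"))))


def leaves(node):
    """Collect the word strings at the bottom of the tree, left to right."""
    _label, *children = node
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


def bracketed(node):
    """Render the tree in the common labelled-bracket notation."""
    label, *children = node
    inner = " ".join(c if isinstance(c, str) else bracketed(c) for c in children)
    return f"[{label} {inner}]"


print(" ".join(leaves(tree)))  # The cat sleeps on the mat
print(bracketed(tree))         # [S [NP The cat] [VP [V sleeps] [PP [P on] [NP the mat]]]]
```

Reading the leaves from left to right recovers the original sentence, which is a quick sanity check on any constituency tree.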
Sentence Structure In NLP
b. Linear Structure (Word Order)
• In many languages, especially those with relatively
fixed word order like English, sentence meaning can
depend heavily on the order in which words appear.
• English typically follows a Subject-Verb-Object
(SVO) order.
• For example:
• "John (subject) kicks (verb) the ball (object)."
• However, some languages like Japanese use Subject-
Object-Verb (SOV), and others like Russian have
flexible word order but rely on case markers to indicate
roles.
Sentence Structure In NLP
c. Syntactic Trees
• Syntactic trees, often used in both CFG and
dependency grammar, represent the structure of
sentences hierarchically, showing how different parts
of a sentence are connected.
• Constituent Tree: This tree structure represents
how a sentence is divided into different constituent
parts (noun phrases, verb phrases, etc.).
• Dependency Tree: In a dependency tree, words
are linked by directed edges indicating syntactic
relationships, with the root (typically the main verb)
being the central node.
Sentence Structure In NLP
• In computational linguistics, the term parsing refers to
the task of creating a parse tree from a given
sentence.
• A parse tree is a tree that highlights the syntactical
structure of a sentence according to a formal
grammar, for example by exposing the relationships
between words or sub-phrases. Depending on which
type of grammar we use, the resulting tree will have
different features.
• Constituency and dependency parsing are two
methods that use different types of grammars.
• Since they are based on very different assumptions,
the resulting trees will be very different.
• In both cases, however, the end goal is to extract
syntactic information.
Sentence Structure In NLP
• To begin, let’s start by analyzing the constituency
parse tree.
• Constituency Parsing: The constituency parse tree is
based on the formalism of context-free grammars.
• In this type of tree, the sentence is divided into
constituents, that is, sub-phrases that belong to a
specific category in the grammar.
• In English, for example, the phrases “a dog”, “a
computer on the table” and “the nice sunset” are all
noun phrases, while “eat a pizza” and “go to the
beach” are verb phrases.
• The grammar provides a specification of how to build
valid sentences, using a set of rules.
• As an example, the rule VP → V NP means that we can
form a verb phrase (VP) using a verb (V) followed by a noun
phrase (NP).
• While we can use these rules to generate valid
sentences, we can also apply them the other way
around, in order to extract the syntactical structure of a
given sentence according to the grammar.
Sentence Structure In NLP
• Let’s dive straight into an example of a constituency
parse tree for the simple sentence, “I saw a fox”:
Sentence Structure In NLP
• A constituency parse tree always contains the
words of the sentence as its terminal nodes.
• Usually, each word has a parent node containing its
part-of-speech tag (noun, adjective, verb, etc…),
although this may be omitted in other graphical
representations.
• All the other non-terminal nodes represent the
constituents of the sentence and are usually one of
verb phrase, noun phrase, or prepositional phrase (PP).
Sentence Structure In NLP
• In this example, at the first level below the root, our
sentence has been split into a noun phrase, made up
of the single word “I”, and a verb phrase, “saw a fox”.
• This means that the grammar contains a rule like S →
NP VP, meaning that a sentence can be created with
the concatenation of a noun phrase and a verb phrase.
• Similarly, the verb phrase is divided into a verb and
another noun phrase.
• As we can imagine, this also maps to another rule in
the grammar.
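The mapping from tree levels to grammar rules can be made explicit by reading the productions off the tree. The sketch below assumes a plausible tree for “I saw a fox”: the phrase labels follow the description above, while the part-of-speech labels PRP, V, DT and NN and the `productions` helper are assumptions for illustration.

```python
# Assumed constituency tree for "I saw a fox" as nested (label, children...) tuples.
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP",
         ("V", "saw"),
         ("NP", ("DT", "a"), ("NN", "fox"))))


def productions(node):
    """List the (lhs, rhs) rules used at every internal node, top-down."""
    label, *children = node
    # The right-hand side is the sequence of child labels (or the word itself).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules


for lhs, rhs in productions(tree):
    print(lhs, "→", " ".join(rhs))
```

The first rule printed is S → NP VP, and the split of the verb phrase appears as VP → V NP, matching the two rules discussed above.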
Sentence Structure In NLP
• To sum things up, constituency parsing creates trees
containing a syntactical representation of a sentence,
according to a context-free grammar.
• This representation is highly hierarchical and divides
the sentence into its phrasal constituents.
Dependency Parsing
• As opposed to constituency parsing, dependency
parsing doesn’t make use of phrasal
constituents or sub-phrases.
• Instead, the syntax of the sentence is expressed in
terms of dependencies between words — that is,
directed, typed edges between words in a graph.
Dependency Parsing
• More formally, a dependency parse tree is a graph G = (V,
E) where the set of vertices V contains the words in the
sentence, and each edge in E connects two words.
• The graph must satisfy three conditions:
• There has to be a single root node with no incoming edges.
• For each node v in V, there must be a path from the root R to v.
• Each node except the root must have exactly 1 incoming edge.
• Additionally, each edge in E has a type, which defines the
grammatical relation that occurs between the two words.
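The three conditions can be checked mechanically. The following is a minimal sketch under stated assumptions: words are identified by their position in the sentence, each edge is a (head, relation, dependent) triple, and the function name `is_dependency_tree` is hypothetical. In the usage example, only the `nsubj` label comes from the slides; `obj` and `det` are assumed labels in the style of Universal Dependencies.

```python
from collections import defaultdict, deque


def is_dependency_tree(words, edges):
    """Check the three conditions; edges are (head, relation, dependent) index triples."""
    incoming = defaultdict(int)
    children = defaultdict(list)
    for head, _relation, dependent in edges:
        incoming[dependent] += 1
        children[head].append(dependent)

    nodes = range(len(words))
    # Condition 1: exactly one root, i.e. one node with no incoming edges.
    roots = [v for v in nodes if incoming[v] == 0]
    if len(roots) != 1:
        return False
    root = roots[0]
    # Condition 3: every other node has exactly one incoming edge.
    if any(incoming[v] != 1 for v in nodes if v != root):
        return False
    # Condition 2: every node is reachable from the root (breadth-first search).
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == len(words)


words = ["I", "saw", "a", "fox"]
edges = [(1, "nsubj", 0), (1, "obj", 3), (3, "det", 2)]
print(is_dependency_tree(words, edges))
```

A graph in which two words only point at each other passes the incoming-edge count but fails the reachability check, which is why all three conditions are needed.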
Dependency Parsing
• Here is what the previous example looks like if we perform
dependency parsing:
• As we can see, the result is completely different. With this
approach, the root of the tree is the verb of the sentence,
and edges between words describe their relationships.
• For example, the word “saw” has an outgoing edge of type
nsubj to the word “I”, meaning that “I” is the nominal
subject of the verb “saw”. In this case, we say that “I”
depends on “saw”.