|| Jai Sri Gurudev||
Sri Adichunchanagiri Shikshana Trust (R)
SJB INSTITUTE OF TECHNOLOGY
(Affiliated to Visvesvaraya Technological University, Belagavi & Approved by AICTE, New Delhi)
No. 67, BGS Health & Education City, Dr. Vishnuvardhan Road Kengeri, Bengaluru – 560 060
Subject: Natural Language Processing (18CS743)
By
CHETAN R, Assistant Professor
Semester / Section: 7A and B
Department of Information Science & Engineering
Academic Year: Odd Sem, 2021-22
PARSING
Overview
Parsing is the task of using the rewrite rules of a grammar either to generate a
particular sequence of words or to reconstruct its derivation.
The following constraints guide the search process:
1. Input: Words in the input sentence. A valid parse is one that covers
all the words in a sentence. These words must constitute the leaves of
the final parse tree.
2. Grammar: The root of the final parse tree must be the start symbol
of the grammar.
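These two constraints can be made concrete with a toy grammar. As a rough sketch (the grammar below is an illustrative assumption, not one given in the slides), the rewrite rules can be stored as a mapping from each non-terminal to its possible right-hand sides; the input words must appear as terminals at the leaves, and the start symbol S must be the root:

```python
# A sketch of a CFG's rewrite rules stored as plain data: each
# non-terminal maps to its possible right-hand sides.
# (Illustrative toy grammar, not the one from the slides.)
REWRITE_RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "Noun"], ["Pronoun"]],
    "VP": [["Verb", "NP"]],
    "Det":  [["the"], ["an"]],
    "Noun": [["girl"], ["essay"]],
    "Verb": [["wrote"]],
    "Pronoun": [["she"]],
}

START_SYMBOL = "S"   # constraint 2: the root of any valid parse tree

def is_terminal(symbol):
    # Terminals (the input words, constraint 1) have no rewrite rules.
    return symbol not in REWRITE_RULES
```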
Top-down parsing
Top-down search space
Bottom-up parsing
Pros and Cons
The top-down parser starts generating trees from the start symbol of the
grammar, so it never wastes time exploring a tree leading to a different root.
However, it wastes time exploring S trees that eventually result in words that
are inconsistent with the input.
The bottom-up parser never explores a tree that does not match the input,
but it wastes time generating trees that have no chance of leading to an
S-rooted tree.
Basic Top-Down Parser
Derivation using top-down, depth first algorithm
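A top-down, depth-first derivation can be sketched as a small backtracking parser. This is a minimal illustration rather than the exact algorithm from the slides; the grammar and sentence below are assumptions for the example:

```python
# A minimal top-down, depth-first backtracking parser.
# (Illustrative grammar and sentence, not from the slides.)
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Det":  [["the"], ["an"]],
    "Noun": [["girl"], ["essay"]],
    "Verb": [["wrote"]],
}

def parse(symbols, words):
    """Try to derive exactly `words` from the sequence `symbols`."""
    if not symbols:
        return not words          # success only if all input is consumed
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:      # terminal: must match the next word
        return bool(words) and words[0] == first and parse(rest, words[1:])
    # Non-terminal: depth-first, try each expansion in rule order.
    return any(parse(list(rhs) + rest, words) for rhs in GRAMMAR[first])
```

For example, `parse(["S"], "the girl wrote an essay".split())` succeeds, while a sentence the grammar cannot cover fails after the backtracking search is exhausted.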
Disadvantages
1. Left Recursion: the search gets stuck in an infinite loop expanding a left-recursive rule.
2. Structural Ambiguity: the grammar assigns more than one parse to a sentence.
3. Attachment Ambiguity: a constituent fits more than one position in the parse tree.
4. Coordination Ambiguity: it is not clear which phrases are being combined with a
conjunction like 'and'.
5. Local Ambiguity: certain parts of a sentence are ambiguous, even if the sentence as a
whole is not.
6. Repeated Parsing: the parser often builds valid trees for portions of the input that it
then discards during backtracking.
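The left-recursion problem in particular is easy to demonstrate: a naive depth-first expansion of a left-recursive rule such as NP → NP PP never terminates. A small sketch, using an illustrative grammar:

```python
import sys

# Why left recursion breaks a naive top-down parser: expanding the
# left-recursive rule NP -> NP PP rewrites NP back to itself as the
# left-most symbol, so a depth-first search recurses forever.
# (Illustrative grammar, not from the slides.)
LEFT_RECURSIVE = {
    "NP": [["NP", "PP"], ["Noun"]],
    "PP": [["Prep", "NP"]],
}

def expand_leftmost(symbol, depth=0):
    # Naive depth-first expansion that always tries the first rule first.
    if symbol not in LEFT_RECURSIVE:
        return depth
    return expand_leftmost(LEFT_RECURSIVE[symbol][0][0], depth + 1)

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(100)          # keep the inevitable failure small
try:
    expand_leftmost("NP")
    looped = False
except RecursionError:
    looped = True                   # stuck expanding NP -> NP PP forever
finally:
    sys.setrecursionlimit(old_limit)
```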
EARLEY PARSER
It implements an efficient parallel top-down search using
dynamic programming.
It builds a table of sub-trees for each of the constituents in the
input.
The most important component of this algorithm is the Earley
Chart that has n+1 entries, where n is the number of words in
the input.
The algorithm makes a left-to-right scan of the input to fill in the
elements of this chart.
State Information
1. A sub-tree corresponding to a grammar rule.
2. Information about the progress made in completing the sub-tree.
3. The position of the sub-tree with respect to the input.
A state is represented as a dotted rule together with a pair of numbers
giving the starting position and the position of the dot:
A → X1 … • C … Xm, [i, j]
Algorithm
Predictor
Generates new states representing potential expansions of the
non-terminal in the left-most derivation.
It is applied to every state that has a non-terminal to the right of
the dot, when the category of that non-terminal is not a
part-of-speech category.
If A → X1 … • B … Xm, [i, j], then for every rule of the form
B → γ, the operation adds to chart[j] the state:
B → • γ, [j, j]
Example
When the generating state is S → • NP VP, [0, 0], the predictor
adds the following states to chart[0]:
NP → • Det Nominal, [0, 0]
NP → • Noun, [0, 0]
NP → • Pronoun, [0, 0]
NP → • Det Noun PP, [0, 0]
Scanner
Scanner is used when a state has a part of speech category to
the right of the dot.
It examines the input to see if the part-of-speech appearing to
the right of the dot matches one of the parts-of-speech
associated with the current input word.
If yes, it creates a new state using the rule.
If the state is A → … • a …, [i, j] and 'a' is a part-of-speech
associated with wj, then it adds a → wj •, [j, j+1] to chart[j+1].
Completer
Completer is used when the dot reaches the right end of the
rule.
The presence of such a state signifies successful completion of
the parse of some grammatical category.
If A → γ •, [j, k], then the completer adds
B → … A • …, [i, k] to chart[k] for every state
B → … • A …, [i, j] in chart[j].
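Putting the predictor, scanner, and completer together, a compact Earley recognizer can be sketched as follows. The grammar and part-of-speech lexicon are illustrative assumptions; a state is a (lhs, rhs, dot, start) tuple, and the chart has n+1 entries as described above:

```python
# A compact Earley recognizer combining the predictor, scanner, and
# completer operations. (Illustrative grammar and lexicon.)
GRAMMAR_RULES = {        # rules whose right-hand sides are categories
    "S":  [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
}
LEXICON = {              # part-of-speech categories for each word
    "the": ["Det"], "an": ["Det"],
    "girl": ["Noun"], "essay": ["Noun"],
    "wrote": ["Verb"],
}

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]   # n+1 chart entries
    def add(j, state):
        if state not in chart[j]:
            chart[j].append(state)
    add(0, ("GAMMA", ("S",), 0, 0))      # dummy start state
    for j in range(n + 1):
        i = 0
        while i < len(chart[j]):         # chart[j] may grow as we go
            lhs, rhs, dot, start = chart[j][i]
            i += 1
            if dot < len(rhs) and rhs[dot] in GRAMMAR_RULES:
                # Predictor: expand the non-terminal right of the dot.
                for expansion in GRAMMAR_RULES[rhs[dot]]:
                    add(j, (rhs[dot], tuple(expansion), 0, j))
            elif dot < len(rhs):
                # Scanner: rhs[dot] is a POS category; check the word.
                if j < n and rhs[dot] in LEXICON.get(words[j], []):
                    add(j + 1, (rhs[dot], (words[j],), 1, j))
            else:
                # Completer: dot at the right end; advance waiting states.
                for blhs, brhs, bdot, bstart in chart[start]:
                    if bdot < len(brhs) and brhs[bdot] == lhs:
                        add(j, (blhs, brhs, bdot + 1, bstart))
    return ("GAMMA", ("S",), 1, 0) in chart[n]
```

The recognizer accepts "the girl wrote an essay" and rejects word sequences the grammar cannot cover, without ever re-parsing a completed constituent.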
CYK Parser
Cocke-Younger-Kasami is a dynamic programming parsing
algorithm.
It follows a bottom-up approach, building the parse tree
incrementally; each entry in the table is based on previous
entries.
The CYK algorithm assumes the grammar to be in Chomsky normal
form (CNF). A CFG is in CNF if all its rules have one of only two forms:
A → B C
A → w, where w is a word.
CYK Algorithm
Example
Sentence: “The girl wrote an essay”
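A sketch of the CYK recognizer on this sentence, assuming an illustrative CNF grammar that covers it:

```python
# A CYK recognizer for a small CNF grammar (illustrative rules chosen
# to cover the example sentence "the girl wrote an essay").
BINARY = {                # A -> B C rules, keyed by (B, C)
    ("NP", "VP"):    ["S"],
    ("Det", "Noun"): ["NP"],
    ("Verb", "NP"):  ["VP"],
}
UNARY = {                 # A -> w rules, keyed by the word w
    "the": ["Det"], "an": ["Det"],
    "girl": ["Noun"], "essay": ["Noun"],
    "wrote": ["Verb"],
}

def cyk_recognize(words, start="S"):
    n = len(words)
    # table[i][j] = set of non-terminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # length-1 spans
        table[i][i + 1].update(UNARY.get(w.lower(), []))
    for length in range(2, n + 1):                # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):             # every split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j].update(BINARY.get((b, c), []))
    return start in table[0][n]
```

Each table entry is filled only from shorter, previously computed spans, which is exactly the bottom-up, incremental behaviour described above.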
Probabilistic Parsing
A statistical parser works by assigning probabilities to possible parses
of a sentence and returning the most likely parse as the final one.
More formally, given a grammar G, a sentence s, and the set of possible
parse trees of s, denoted τ(s), a probabilistic parser finds
the most likely parse t̂ of s as follows:
t̂ = argmax_{t ∈ τ(s)} P(t | s)
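With the rule probabilities of a PCFG, this argmax reduces to scoring each candidate tree by the product of the probabilities of the rules in its derivation. A sketch with made-up probabilities:

```python
from math import prod

# Scoring candidate parses with an illustrative PCFG fragment: a
# tree's probability is the product of the probabilities of the
# rules used in its derivation, and the parser returns the argmax.
RULE_PROB = {
    ("S",  ("NP", "VP")): 1.0,
    ("VP", ("Verb", "NP")): 0.7,
    ("VP", ("Verb", "NP", "PP")): 0.3,
    ("NP", ("Det", "Noun")): 0.6,
    ("NP", ("Det", "Noun", "PP")): 0.4,
}

def tree_probability(rules_used):
    """P(t) for a tree given as the list of rules in its derivation."""
    return prod(RULE_PROB[r] for r in rules_used)

def best_parse(candidate_trees):
    # argmax over tau(s): the candidate tree of highest probability
    return max(candidate_trees, key=tree_probability)
```

For a PP-attachment ambiguity, for instance, the parser compares a tree using VP → Verb NP PP against one using NP → Det Noun PP and keeps whichever product is larger.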
Advantages
1. A probabilistic parser offers a way to resolve ambiguity during
parsing.
2. The search becomes more efficient.
Probabilistic Context Free Grammar
Example
Probability Estimation
Estimating Rule Probability
Two Parse Trees
Parsing PCFGs
Indian Languages
The majority of Indian languages are free-word-order languages.
The word order of a sentence can be changed without making
the sentence grammatically incorrect.
Contd..
Extensive and productive use of complex predicates (CPs) is another property that
most Indian languages have in common.
A complex predicate combines a light verb with a verb, noun, or
adjective to produce a new verb.
Parsing Indian Languages
Bharti and Sangal described an approach to parsing Indian
languages based on the Paninian grammar formalism. It has two stages:
1. The first stage identifies word groups.
2. The second stage assigns a parse structure to the input sentence.
Karaka Chart
Constraint Graph
Constraints
Parse of the sentence
References
1. Bharti, Akshar and Rajeev Sangal, 1990, 'A Karaka-based
approach to parsing of Indian languages', Proceedings of the 13th
Conference on Computational Linguistics, Vol. 3, Association for
Computational Linguistics.
2. Chomsky, N., 1957, Syntactic Structures, Mouton, The Hague.