CYK Algorithm
The CYK algorithm (Cocke–Younger–Kasami algorithm) is a bottom-up parsing algorithm used in
Natural Language Processing (NLP) to determine whether a given string (sentence) can be
generated by a context-free grammar (CFG) in Chomsky Normal Form (CNF). It is widely used
for syntactic parsing.
Key Concepts:
• Context-Free Grammar (CFG): A grammar where each production rule has a single
non-terminal on the left-hand side.
• Chomsky Normal Form (CNF): A restricted form of CFG where every production is
either:
o A → BC (two non-terminals)
o A → a (a terminal)
Use of CYK in NLP:
• Parsing sentences to check grammatical correctness
• Building parse trees
• Used in syntax checking and machine translation
How the CYK Algorithm Works:
Given:
• A string of length n: w = w₁ w₂ w₃ ... wₙ
• A CFG in CNF
Step-by-Step Process:
1. Initialize an n × n table T, where each cell T[i][j] (1 ≤ i ≤ j ≤ n) holds the set of non-terminals
that can generate the substring w[i...j].
2. Base Case (Length = 1): For each position i, add to T[i][i] every non-terminal A that has a rule
A → w[i].
3. Recursive Step (Length > 1): For each substring length l = 2 to n, and each starting
position i, compute the possible non-terminals for substring w[i...i+l-1] by:
o Splitting it into two parts w[i...k] and w[k+1...i+l-1] for every split point k with i ≤ k < i+l-1
o If A → BC and B ∈ T[i][k], C ∈ T[k+1][i+l-1], then add A to T[i][i+l-1]
4. Check if the start symbol S is in T[1][n], the cell covering the full string:
o If yes → the string is generated by the grammar
o If no → the string is not in the language
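A minimal Python sketch of the table-filling procedure above (it uses 0-based indices, as Python does; the function name cyk_parse and the two rule dictionaries are assumptions made for the illustration):

def cyk_parse(word, unary_rules, binary_rules, start_symbol="S"):
    """Return True if start_symbol can derive `word` under the given CNF rules."""
    n = len(word)
    # T[i][j] = set of non-terminals generating word[i..j] (inclusive, 0-based)
    T = [[set() for _ in range(n)] for _ in range(n)]

    # Base case (length 1): rules A -> a
    for i, terminal in enumerate(word):
        T[i][i] = set(unary_rules.get(terminal, set()))

    # Recursive step: substrings of length 2..n
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                   # split into word[i..k] and word[k+1..j]
                for B in T[i][k]:
                    for C in T[k + 1][j]:
                        # For every rule A -> BC, add A to T[i][j]
                        T[i][j] |= binary_rules.get((B, C), set())

    return start_symbol in T[0][n - 1]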
Time and Space Complexity:
• Time Complexity: O(n³ · |G|)
• Space Complexity: O(n²)
o Where n is the length of the input string, and |G| is the number of production rules in the
grammar.
Example:
Grammar in CNF:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Input string: "baaa"
Apply the CYK algorithm to check if "baaa" belongs to the language.
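For instance, the grammar above can be encoded for the cyk_parse sketch as two rule dictionaries (this encoding is just one possible format):

unary_rules = {"a": {"A", "C"}, "b": {"B"}}      # A -> a, C -> a, B -> b
binary_rules = {
    ("A", "B"): {"S", "C"},                      # S -> AB, C -> AB
    ("B", "C"): {"S"},                           # S -> BC
    ("B", "A"): {"A"},                           # A -> BA
    ("C", "C"): {"B"},                           # B -> CC
}

print(cyk_parse("baaa", unary_rules, binary_rules))   # True: S is in the top cell

Filling the table by hand gives the same answer: S appears in the cell covering the whole string, so "baaa" is in the language.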
Applications in NLP:
• Syntax checking in compilers and NLP
• Parsing natural language queries
• Speech recognition systems
• Machine translation (for grammatical correctness)
What are Tree-Based Language Models?
Tree-based language models are statistical or neural language models that incorporate the
syntactic structure (trees) of sentences instead of treating them as flat sequences of words.
These models use parse trees or syntax trees to capture hierarchical relationships in natural
language.
Tree-based language models aim to improve upon traditional n-gram or sequential neural
models (like RNNs or LSTMs) by explicitly modeling the grammatical structure of a sentence
using constituency trees or dependency trees.
Types of Tree-Based Language Models:
1. Probabilistic Context-Free Grammar (PCFG) Based Models
• What it is: An extension of context-free grammar (CFG) with probabilities attached to
production rules.
• How it works: Each rule (e.g., NP → DT NN) has a probability based on frequency.
• Probability of a sentence: Product of probabilities of rules used in the parse tree.
Used in: Statistical parsing, early tree-based language modeling.
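As a small illustration of the product-of-rule-probabilities idea, the sketch below scores a toy parse tree; the tree encoding, rule set, and probabilities are all made up for the example:

# A parse tree as nested tuples: (label, child, child, ...); leaf children are words.
tree = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VB", "barks")))

rule_probs = {                     # illustrative probabilities, not estimated from data
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("DT", "NN")): 0.5,
    ("VP", ("VB",)): 0.3,
    ("DT", ("the",)): 0.4,
    ("NN", ("dog",)): 0.05,
    ("VB", ("barks",)): 0.02,
}

def tree_probability(node):
    """Probability of a parse tree = product of the probabilities of its rules."""
    label, *children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, rhs)]
    for child in children:
        if not isinstance(child, str):            # recurse into non-terminal children
            p *= tree_probability(child)
    return p

print(tree_probability(tree))   # 0.9 * 0.5 * 0.3 * 0.4 * 0.05 * 0.02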
2. Tree Adjoining Grammar (TAG) Based Models
• What it is: A more expressive grammar formalism than CFG.
• Advantage: Captures long-distance dependencies better.
• Language modeling: Probabilities assigned to derivations based on elementary trees.
Used in: Parsing and modeling of languages with complex syntax (like German, Hindi).
3. Syntactic Tree-LSTM (Tree-Structured LSTM)
• Introduced by: Kai Sheng Tai, Richard Socher, and Christopher Manning (2015).
• How it works: Instead of sequentially passing information (like in RNN), it passes
information from child nodes to parent node in a parse tree.
• Input: Parse tree (constituency or dependency) + word embeddings
• Output: Sentence representation or probability of next word
Used in:
• Sentiment analysis
• Syntax-aware language modeling
• Machine translation
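A rough NumPy sketch of the child-to-parent composition in a Child-Sum Tree-LSTM node (one of the variants in Tai et al., 2015); the hidden size, weight initialization, and tree wiring are assumptions made for the illustration:

import numpy as np

d = 8                                                         # hidden size (illustrative)
rng = np.random.default_rng(0)
W = {g: rng.normal(scale=0.1, size=(d, d)) for g in "ifou"}   # input-to-gate weights
U = {g: rng.normal(scale=0.1, size=(d, d)) for g in "ifou"}   # child-state-to-gate weights
b = {g: np.zeros(d) for g in "ifou"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tree_lstm_node(x, children):
    """Combine the (h, c) states of the children with this node's input x."""
    h_sum = sum((h for h, _ in children), np.zeros(d))        # sum of child hidden states
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum + b["i"])         # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum + b["o"])         # output gate
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum + b["u"])         # candidate cell value
    c = i * u
    for h_k, c_k in children:                                 # one forget gate per child
        f_k = sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"])
        c += f_k * c_k
    return o * np.tanh(c), c

# Toy tree: two leaves feeding one parent; the inputs stand in for word embeddings.
leaf1 = tree_lstm_node(rng.normal(size=d), [])
leaf2 = tree_lstm_node(rng.normal(size=d), [])
root_h, root_c = tree_lstm_node(rng.normal(size=d), [leaf1, leaf2])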
4. Recursive Neural Networks (RecNNs)
• Structure: A neural model that recursively combines child node vectors to form
parent node vectors.
• Parse Tree Usage: Applies the same function at each node of a syntax tree.
• Limitation: Uses only a shallow composition at each node and is hard to train on deep trees; replaced in many areas by Tree-LSTMs.
Used in:
• Sentence similarity
• Syntax-aware classification
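A compact sketch of the recursive combination: one shared function maps two child vectors to their parent vector and is applied at every internal node of a binary constituency tree (all names, shapes, and vectors here are illustrative):

import numpy as np

d = 8
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(d, 2 * d))    # shared composition weights
b = np.zeros(d)

def compose(left_vec, right_vec):
    """Parent vector p = tanh(W [left; right] + b)."""
    return np.tanh(W @ np.concatenate([left_vec, right_vec]) + b)

# Applying the same function at each node, e.g. for "the dog barks":
the, dog, barks = (rng.normal(size=d) for _ in range(3))   # toy word vectors
np_vec = compose(the, dog)        # NP <- (DT "the", NN "dog")
s_vec = compose(np_vec, barks)    # S  <- (NP, VP "barks")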
5. Dependency-Based Language Models
• Focus: Dependency parse trees, where each word is connected to others through
grammatical relationships (e.g., subject, object).
• Model: Predicts words using their syntactic dependents rather than left-to-right
sequence.
• Examples: Eisner’s Dependency Model (1996) and the Structured Language Model of Chelba &
Jelinek (1998)
Benefits:
• Captures syntactic structure explicitly
• Works well for languages with free word order
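A toy sketch of the head-conditioned factorization these models rely on: the sentence probability is a product over head-dependent pairs rather than a left-to-right chain. The tree and probability table below are invented for the example and are far simpler than Eisner’s or Chelba & Jelinek’s models:

# Dependency tree for "the dog barks" as (dependent, head) pairs; "ROOT" is artificial.
tree = [("barks", "ROOT"), ("dog", "barks"), ("the", "dog")]

p_dep_given_head = {               # illustrative P(dependent | head) values
    ("barks", "ROOT"): 0.10,
    ("dog", "barks"): 0.20,
    ("the", "dog"): 0.60,
}

def sentence_probability(dep_tree):
    """Product of P(word | its syntactic head) over the dependency tree."""
    p = 1.0
    for word, head in dep_tree:
        p *= p_dep_given_head.get((word, head), 1e-6)   # crude back-off for unseen pairs
    return p

print(sentence_probability(tree))   # 0.10 * 0.20 * 0.60 = 0.012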
6. Tree Transformers
• What it is: Transformer models that integrate syntactic trees (parse trees) into attention
mechanisms.
• How:
o Bias attention heads based on syntactic distances
o Restrict self-attention using tree structures
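A minimal sketch of the second idea: pairwise distances in a toy dependency tree are turned into an additive attention bias, and tokens more than a fixed number of tree edges apart are masked out entirely (the tree, threshold, and scores are all illustrative):

import numpy as np

# Toy dependency tree over tokens 0..3, given as child -> head (token 2 is the root).
heads = {0: 2, 1: 0, 2: -1, 3: 2}

def tree_distance(i, j):
    """Number of edges between tokens i and j in the dependency tree."""
    def path_to_root(k):
        path = [k]
        while heads[k] != -1:
            k = heads[k]
            path.append(k)
        return path
    pi, pj = path_to_root(i), path_to_root(j)
    common = set(pi) & set(pj)
    return min(pi.index(a) + pj.index(a) for a in common)   # via lowest common ancestor

n = len(heads)
dist = np.array([[tree_distance(i, j) for j in range(n)] for i in range(n)])

# Penalize attention to syntactically distant tokens; mask beyond 2 tree edges.
bias = np.where(dist <= 2, -1.0 * dist, -np.inf)

scores = np.random.default_rng(2).normal(size=(n, n))   # raw self-attention scores
weights = np.exp(scores + bias)
weights /= weights.sum(axis=1, keepdims=True)            # tree-biased, masked softmax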
Examples:
• Syntax-Aware Transformers
• TreeFormer
• StructBERT (adds word- and sentence-level structural objectives during pretraining)
Why Use Tree-Based Language Models?
Feature                                  Tree-Based Models   Sequence Models (e.g., RNN, Transformer)
Captures syntax explicitly               Yes                 No (implicitly, if at all)
Handles long-range dependencies          Better              Sometimes (especially in LSTMs)
Suitable for free-word-order languages   Yes                 Often struggles
Summary:
Model Type Structure Used Strength
PCFG Constituency Tree Probabilistic grammar rules
TAG Extended parse trees Long-distance dependency modeling
Tree-LSTM Any parse tree Hierarchical neural computation
RecNN Constituency Tree Recursive vector combination
Dependency Models Dependency Tree Head-modifier syntax modeling
Tree Transformers Parse Trees + Attention Syntax-aware deep models