CYK Algorithm
The CYK algorithm (Cocke–Younger–Kasami algorithm) is a bottom-up parsing algorithm used in
Natural Language Processing (NLP) to determine whether a given string (sentence) can be
generated by a context-free grammar (CFG) in Chomsky Normal Form (CNF). It is widely used
for syntactic parsing.
Key Concepts:
• Context-Free Grammar (CFG): A grammar where each production rule has a single
non-terminal on the left-hand side.
• Chomsky Normal Form (CNF): A restricted form of CFG where every production is
either:
o A → BC (two non-terminals)
o A → a (a terminal)
Use of CYK in NLP:
• Parsing sentences to check grammatical correctness
• Building parse trees
• Used in syntax checking and machine translation
How the CYK Algorithm Works:
Given:
• A string of length n: w = w₁ w₂ w₃ ... wₙ
• A CFG in CNF
Step-by-Step Process:
1. Initialize an n × n table T, where each cell T[i][j] (1 ≤ i ≤ j ≤ n) holds the set of non-terminals
that can generate the substring w[i...j].
2. Base Case (Length = 1): For each position i, add to T[i][i] every non-terminal A that has a rule
A → w[i].
3. Recursive Step (Length > 1): For each substring length l = 2 to n, and each starting
position i, compute the possible non-terminals for substring w[i...i+l-1] by:
o Splitting it into two parts w[i...k] and w[k+1...i+l-1] for every split point k with i ≤ k < i+l-1
o If A → BC and B ∈ T[i][k], C ∈ T[k+1][i+l-1], then add A to T[i][i+l-1]
4. Check if the start symbol S is in T[1][n], the cell covering the full string:
o If yes → the string is generated by the grammar
o If no → the string is not in the language
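A minimal Python sketch of the table-filling procedure above (it uses 0-based indices, as Python does; the function name cyk_parse and the two rule dictionaries are assumptions made for the illustration):

def cyk_parse(word, unary_rules, binary_rules, start_symbol="S"):
    """Return True if start_symbol can derive `word` under the given CNF rules."""
    n = len(word)
    # T[i][j] = set of non-terminals generating word[i..j] (inclusive, 0-based)
    T = [[set() for _ in range(n)] for _ in range(n)]

    # Base case (length 1): rules A -> a
    for i, terminal in enumerate(word):
        T[i][i] = set(unary_rules.get(terminal, set()))

    # Recursive step: substrings of length 2..n
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                   # split into word[i..k] and word[k+1..j]
                for B in T[i][k]:
                    for C in T[k + 1][j]:
                        # For every rule A -> BC, add A to T[i][j]
                        T[i][j] |= binary_rules.get((B, C), set())

    return start_symbol in T[0][n - 1]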
Time and Space Complexity:
• Time Complexity: O(n³ · |G|)
• Space Complexity: O(n²)
o Where n is the length of the input string, and |G| is the number of production rules in the
grammar.
Example:
Grammar in CNF:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Input string: "baaa"
Apply the CYK algorithm to check if "baaa" belongs to the language.
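For instance, the grammar above can be encoded for the cyk_parse sketch as two rule dictionaries (this encoding is just one possible format):

unary_rules = {"a": {"A", "C"}, "b": {"B"}}      # A -> a, C -> a, B -> b
binary_rules = {
    ("A", "B"): {"S", "C"},                      # S -> AB, C -> AB
    ("B", "C"): {"S"},                           # S -> BC
    ("B", "A"): {"A"},                           # A -> BA
    ("C", "C"): {"B"},                           # B -> CC
}

print(cyk_parse("baaa", unary_rules, binary_rules))   # True: S is in the top cell

Filling the table by hand gives the same answer: S appears in the cell covering the whole string, so "baaa" is in the language.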
Applications in NLP:
• Syntax checking in compilers and NLP
• Parsing natural language queries
• Speech recognition systems
• Machine translation (for grammatical correctness)
What are Tree-Based Language Models?
Tree-based language models are statistical or neural language models that incorporate the
syntactic structure (trees) of sentences instead of treating them as flat sequences of words.
These models use parse trees or syntax trees to capture hierarchical relationships in natural
language.
Tree-based language models aim to improve upon traditional n-gram or sequential neural
models (like RNNs or LSTMs) by explicitly modeling the grammatical structure of a sentence
using constituency trees or dependency trees.
Types of Tree-Based Language Models:
1. Probabilistic Context-Free Grammar (PCFG) Based Models
• What it is: An extension of context-free grammar (CFG) with probabilities attached to
production rules.
• How it works: Each rule (e.g., NP → DT NN) has a probability based on frequency.
• Probability of a sentence: Product of probabilities of rules used in the parse tree.
Used in: Statistical parsing, early tree-based language modeling.
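As a small illustration of the product-of-rule-probabilities idea, the sketch below scores a toy parse tree; the tree encoding, rule set, and probabilities are all made up for the example:

# A parse tree as nested tuples: (label, child, child, ...); leaf children are words.
tree = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VB", "barks")))

rule_probs = {                     # illustrative probabilities, not estimated from data
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("DT", "NN")): 0.5,
    ("VP", ("VB",)): 0.3,
    ("DT", ("the",)): 0.4,
    ("NN", ("dog",)): 0.05,
    ("VB", ("barks",)): 0.02,
}

def tree_probability(node):
    """Probability of a parse tree = product of the probabilities of its rules."""
    label, *children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, rhs)]
    for child in children:
        if not isinstance(child, str):            # recurse into non-terminal children
            p *= tree_probability(child)
    return p

print(tree_probability(tree))   # 0.9 * 0.5 * 0.3 * 0.4 * 0.05 * 0.02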
2. Tree Adjoining Grammar (TAG) Based Models
• What it is: A more expressive grammar formalism than CFG.
• Advantage: Captures long-distance dependencies better.
• Language modeling: Probabilities assigned to derivations based on elementary trees.
Used in: Parsing and modeling of languages with complex syntax (like German, Hindi).
3. Syntactic Tree-LSTM (Tree-Structured LSTM)
• Introduced by: Kai Sheng Tai, Richard Socher, and Christopher Manning (2015).
• How it works: Instead of sequentially passing information (like in RNN), it passes
information from child nodes to parent node in a parse tree.
• Input: Parse tree (constituency or dependency) + word embeddings
• Output: Sentence representation or probability of next word
Used in:
• Sentiment analysis
• Syntax-aware language modeling
• Machine translation
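A rough NumPy sketch of the child-to-parent composition in a Child-Sum Tree-LSTM node (one of the variants in Tai et al., 2015); the hidden size, weight initialization, and tree wiring are assumptions made for the illustration:

import numpy as np

d = 8                                                         # hidden size (illustrative)
rng = np.random.default_rng(0)
W = {g: rng.normal(scale=0.1, size=(d, d)) for g in "ifou"}   # input-to-gate weights
U = {g: rng.normal(scale=0.1, size=(d, d)) for g in "ifou"}   # child-state-to-gate weights
b = {g: np.zeros(d) for g in "ifou"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tree_lstm_node(x, children):
    """Combine the (h, c) states of the children with this node's input x."""
    h_sum = sum((h for h, _ in children), np.zeros(d))        # sum of child hidden states
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum + b["i"])         # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum + b["o"])         # output gate
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum + b["u"])         # candidate cell value
    c = i * u
    for h_k, c_k in children:                                 # one forget gate per child
        f_k = sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"])
        c += f_k * c_k
    return o * np.tanh(c), c

# Toy tree: two leaves feeding one parent; the inputs stand in for word embeddings.
leaf1 = tree_lstm_node(rng.normal(size=d), [])
leaf2 = tree_lstm_node(rng.normal(size=d), [])
root_h, root_c = tree_lstm_node(rng.normal(size=d), [leaf1, leaf2])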
4. Recursive Neural Networks (RecNNs)
• Structure: A neural model that recursively combines child node vectors to form
parent node vectors.
• Parse Tree Usage: Applies the same function at each node of a syntax tree.
• Limitation: Uses only a shallow composition at each node and is hard to train on deep trees; replaced in many areas by Tree-LSTMs.
Used in:
• Sentence similarity
• Syntax-aware classification
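A compact sketch of the recursive combination: one shared function maps two child vectors to their parent vector and is applied at every internal node of a binary constituency tree (all names, shapes, and vectors here are illustrative):

import numpy as np

d = 8
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(d, 2 * d))    # shared composition weights
b = np.zeros(d)

def compose(left_vec, right_vec):
    """Parent vector p = tanh(W [left; right] + b)."""
    return np.tanh(W @ np.concatenate([left_vec, right_vec]) + b)

# Applying the same function at each node, e.g. for "the dog barks":
the, dog, barks = (rng.normal(size=d) for _ in range(3))   # toy word vectors
np_vec = compose(the, dog)        # NP <- (DT "the", NN "dog")
s_vec = compose(np_vec, barks)    # S  <- (NP, VP "barks")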
5. Dependency-Based Language Models
• Focus: Dependency parse trees, where each word is connected to others through
grammatical relationships (e.g., subject, object).
• Model: Predicts words using their syntactic dependents rather than left-to-right
sequence.
• Examples: Eisner’s Dependency Model (1996) and the Structured Language Model of Chelba &
Jelinek (1998)
Benefits:
• Captures syntactic structure explicitly
• Works well for languages with free word order
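A toy sketch of the head-conditioned factorization these models rely on: the sentence probability is a product over head-dependent pairs rather than a left-to-right chain. The tree and probability table below are invented for the example and are far simpler than Eisner’s or Chelba & Jelinek’s models:

# Dependency tree for "the dog barks" as (dependent, head) pairs; "ROOT" is artificial.
tree = [("barks", "ROOT"), ("dog", "barks"), ("the", "dog")]

p_dep_given_head = {               # illustrative P(dependent | head) values
    ("barks", "ROOT"): 0.10,
    ("dog", "barks"): 0.20,
    ("the", "dog"): 0.60,
}

def sentence_probability(dep_tree):
    """Product of P(word | its syntactic head) over the dependency tree."""
    p = 1.0
    for word, head in dep_tree:
        p *= p_dep_given_head.get((word, head), 1e-6)   # crude back-off for unseen pairs
    return p

print(sentence_probability(tree))   # 0.10 * 0.20 * 0.60 = 0.012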
6. Tree Transformers
• What it is: Transformer models that integrate syntactic trees (parse trees) into attention
mechanisms.
• How:
o Bias attention heads based on syntactic distances
o Restrict self-attention using tree structures
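A minimal sketch of the second idea: pairwise distances in a toy dependency tree are turned into an additive attention bias, and tokens more than a fixed number of tree edges apart are masked out entirely (the tree, threshold, and scores are all illustrative):

import numpy as np

# Toy dependency tree over tokens 0..3, given as child -> head (token 2 is the root).
heads = {0: 2, 1: 0, 2: -1, 3: 2}

def tree_distance(i, j):
    """Number of edges between tokens i and j in the dependency tree."""
    def path_to_root(k):
        path = [k]
        while heads[k] != -1:
            k = heads[k]
            path.append(k)
        return path
    pi, pj = path_to_root(i), path_to_root(j)
    common = set(pi) & set(pj)
    return min(pi.index(a) + pj.index(a) for a in common)   # via lowest common ancestor

n = len(heads)
dist = np.array([[tree_distance(i, j) for j in range(n)] for i in range(n)])

# Penalize attention to syntactically distant tokens; mask beyond 2 tree edges.
bias = np.where(dist <= 2, -1.0 * dist, -np.inf)

scores = np.random.default_rng(2).normal(size=(n, n))   # raw self-attention scores
weights = np.exp(scores + bias)
weights /= weights.sum(axis=1, keepdims=True)            # tree-biased, masked softmax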
Examples:
• Syntax-Aware Transformers
• TreeFormer
• StructBERT (adds word- and sentence-level structural objectives during pretraining)
Why Use Tree-Based Language Models?
Feature                                  Tree-Based Models   Sequence Models (e.g., RNN, Transformer)
Captures syntax explicitly               Yes                 No (implicitly, if at all)
Handles long-range dependencies          Better              Sometimes (especially in LSTMs)
Suitable for free-word-order languages   Yes                 Often struggles
Summary:
Model Type Structure Used Strength
PCFG Constituency Tree Probabilistic grammar rules
TAG Extended parse trees Long-distance dependency modeling
Tree-LSTM Any parse tree Hierarchical neural computation
RecNN Constituency Tree Recursive vector combination
Dependency Models Dependency Tree Head-modifier syntax modeling
Tree Transformers Parse Trees + Attention Syntax-aware deep models