NLP
UNIT 3
SEMANTIC PARSING I
Semantic Parsing I: Introduction, Semantic Interpretation, System Paradigms, Word Sense
Semantic Interpretation
Explain in detail about Semantic Interpretation.
OR
How does the process of semantic interpretation contribute to understanding the
meaning of sentences?
Semantic Interpretation is the larger process of which semantic parsing is a part. It involves
a series of analytical steps to transform natural language text into a structured, machine-
readable representation. This representation is a prerequisite for any language understanding
system, as it allows a computer to perform further computational manipulations like search,
reasoning, or taking action.
The process of achieving a complete semantic interpretation can be broken down into several
key components, each addressing a different layer of meaning and ambiguity.
1. Structural Ambiguity
This component deals with the syntactic structure of sentences. Because the meaning of a
sentence is heavily dependent on its grammatical structure, syntactic parsing is
conventionally the first and foundational stage of semantic interpretation. It addresses
ambiguities that arise from the way words are grouped into phrases. For example, in "I saw
the man with the telescope," structural ambiguity determines whether "with the telescope"
modifies "saw" (I used a telescope to see him) or "man" (he was holding a telescope).
Resolving this structural ambiguity is essential before the meaning of the sentence can be
finalized. This stage transforms a linear string of words into an underlying syntactic
representation, such as a parse tree.
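To make the two readings concrete, they can be written as bracketed parse trees. The following minimal Python sketch (assuming the NLTK library is available) simply builds and prints both structures; it does not perform disambiguation itself.

from nltk import Tree

# Reading 1: the PP "with the telescope" attaches to the verb phrase
# (the telescope is the instrument of seeing).
instrument_reading = Tree.fromstring(
    "(S (NP I) (VP (V saw) (NP (Det the) (N man)) "
    "(PP (P with) (NP (Det the) (N telescope)))))")

# Reading 2: the PP attaches inside the object noun phrase
# (the man is the one holding the telescope).
modifier_reading = Tree.fromstring(
    "(S (NP I) (VP (V saw) (NP (NP (Det the) (N man)) "
    "(PP (P with) (NP (Det the) (N telescope))))))")

instrument_reading.pretty_print()
modifier_reading.pretty_print()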
2. Word Sense
This component addresses lexical ambiguity, which is the fact that a single word can have
multiple meanings, or senses. The correct sense is usually determined by the surrounding
context.
• Example: The word nail can refer to a part of the human anatomy or a metallic
fastener. Humans can easily disambiguate its meaning in the following sentences:
1. He nailed the loose arm of the chair with a hammer. (Sense: fastener)
2. He went to the beauty salon to get his nails clipped. (Sense: anatomy)
The presence of context words like hammer in the first sentence, and beauty salon and
clipped in the second, helps resolve the ambiguity. Resolving word sense is
a crucial step in understanding the intended meaning of individual words within a discourse.
3. Entity and Event Resolution
A discourse typically involves a set of entities (people, places, organizations) participating
in various events. This component focuses on identifying these entities and events and
resolving how they are referred to throughout a text, since the same entity or event can be
mentioned using different words or phrases.
• Key Tasks:
o Named Entity Recognition (NER): Identifying and categorizing entities like
"Bell Atlantic Corp." as a company.
o Coreference Resolution: Recognizing that different phrases, such as "Bell
Atlantic Corp." and "it" in a subsequent sentence, refer to the same entity.
These two tasks fall under the umbrella of information extraction and are critical for creating
a coherent semantic representation of a discourse.
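Example (Python sketch): The NER step can be illustrated with an off-the-shelf toolkit. The snippet below uses spaCy with its small English model (both assumed to be installed); coreference resolution requires a separate component and is not shown here.

import spacy

# Small pretrained English pipeline; install it first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Bell Atlantic Corp. said it will acquire one of "
          "Control Data Corp.'s computer maintenance businesses.")

# Each detected entity span carries a type label, e.g. ORG for a company.
for ent in doc.ents:
    print(ent.text, ent.label_)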
4. Predicate-Argument Structure
Once words, entities, and events are identified, this component determines the relationships
between them. It identifies the predicate (the main action or state, usually a verb) and
its arguments (the participants in that action). This process essentially answers the questions
of "who did what to whom, when, where, why, and how."
Figure: A representation of who did what to whom, when, where, why, and how
• Explanation: This diagram illustrates the predicate-argument structure for the
sentence: "Bell Atlantic Corp. said it will acquire one of Control Data Corp.'s computer
maintenance businesses."
o Event 1 (Said):
▪ Who: Bell Atlantic Corp.
▪ What: "it will acquire..."
o Event 2 (Acquire):
▪ Who: it (referring to Bell Atlantic Corp.)
▪ What: one of Control Data Corp.'s computer maintenance businesses
This structure makes the semantic roles of each entity explicit.
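One simple way to make this structure explicit in software is as nested records. The sketch below is a hypothetical encoding (the field names follow the PropBank convention of calling the agent-like argument ARG0 and the thing acted upon ARG1, but the exact format is illustrative, not the output of any particular tool).

# Hypothetical predicate-argument encoding of the example sentence.
acquire_event = {
    "predicate": "acquire",
    "ARG0": "it",  # corefers with "Bell Atlantic Corp."
    "ARG1": "one of Control Data Corp.'s computer maintenance businesses",
}

say_event = {
    "predicate": "say",
    "ARG0": "Bell Atlantic Corp.",
    "ARG1": acquire_event,  # the reported clause is itself an event
}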
5. Meaning Representation
This is the final component and the ultimate goal of semantic interpretation: to build a formal,
structured meaning representation (also called a deep representation) that a computer can
manipulate. This representation encapsulates the full meaning of the text in a way that
supports tasks like answering questions or executing commands.
Because a universal, general-purpose meaning representation has not yet been achieved,
most work in this area is domain-specific.
• Example Representations:
1. RoboCup Domain:
▪ Sentence: "If our player 2 has the ball, then position our player 5 in
the midfield."
▪ Meaning Representation: ((bowner (player our 2)) (do (player our 5)
(pos (midfield))))
2. GeoQuery Domain:
▪ Sentence: "Which river is the longest?"
▪ Meaning Representation: answer(x₁, longest(x₁, river(x₁)))
These formal representations convert natural language queries or commands into a logical
form that a specific system can understand and act upon.
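To see why such a logical form is directly executable, the following toy Python evaluator runs the GeoQuery-style query above against an invented three-river database (the names, lengths, and function definitions are illustrative assumptions, not part of the actual GeoQuery system, and logical variables such as x₁ are handled implicitly).

# Toy geography database: river name -> length (illustrative values only).
RIVERS = {"mississippi": 3766, "missouri": 3767, "colorado": 2330}

def river():
    """The set of entities x1 for which river(x1) holds."""
    return set(RIVERS)

def longest(candidates):
    """The member of candidates with the greatest length."""
    return max(candidates, key=RIVERS.get)

# Evaluating answer(x1, longest(x1, river(x1))) amounts to restricting x1
# to rivers and returning the longest one.
print(longest(river()))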
In conclusion, semantic interpretation is a multi-layered process that moves from syntactic
structure to lexical meaning, entity identification, role labeling, and finally to a formal
meaning representation, with each step building on the last to achieve a comprehensive
understanding of the text.
___________________________________________________________________________
System Paradigms
Explain System Paradigms.
The task of semantic interpretation—recovering meaning from text—has been approached
using various system paradigms. These paradigms define the fundamental architecture and
methodology used to build a system. They can be categorized along three primary
dimensions: System Architecture, Scope, and Coverage. Understanding these paradigms
provides a clear perspective on how different semantic interpretation systems are designed
and what their respective strengths and weaknesses are.
1. System Architectures
This dimension describes the core methodology used to build the system and how it
acquires its knowledge. There are four main types:
• (a) Knowledge-based: These systems rely on a predefined set of manually crafted
rules or a comprehensive knowledge base (like an ontology) to analyze text and derive
a solution. Their performance is entirely dependent on the quality and completeness
of this hand-coded knowledge. They do not learn from data.
• (b) Unsupervised: These systems require minimal human intervention. They typically
use existing, unannotated resources (like large text corpora) and statistical or
algorithmic methods to automatically discover patterns and structures. They can be
bootstrapped for a particular application or domain without needing labeled training
data.
• (c) Supervised: These systems are based on machine learning algorithms that learn
from manually annotated data. Researchers create a sufficient quantity of data where
the desired semantic phenomena are labeled. Feature functions are then designed to
project each problem instance into a feature space. A model is trained on this labeled
data to learn how to predict the correct labels for new, unseen data.
• (d) Semi-Supervised: These systems combine aspects of supervised and unsupervised
learning to overcome the high cost and data requirements of purely supervised
methods. Manual annotation is expensive and often yields insufficient data to capture
a phenomenon completely. Semi-supervised approaches address this by
automatically expanding a small, annotated dataset. This can be done by:
o Using a model's own machine-generated output on unlabeled data to create
more training examples.
o Bootstrapping from an existing model, where humans correct its output, which
is then added to the training set.
o Adapting a model from one domain to a new one, which is often faster than
building a new model from scratch.
2. Scope
This dimension describes the breadth of applicability of the system.
• (a) Domain Dependent: These systems are designed to be specific to a certain, narrow
domain. Their knowledge, rules, and meaning representations are tailored to a
particular task.
o Examples: Systems designed for air travel reservations, simulated football
coaching, or querying a geographic database.
o Limitation: They are not easily portable to other domains.
• (b) Domain Independent: These systems are designed to be general-purpose. The
techniques and representations they use are applicable across multiple domains with
little or no modification. This is the goal for more foundational NLP tasks.
3. Coverage
This dimension describes the depth or completeness of the semantic representation that
the system produces.
• (a) Shallow: These systems produce an intermediate representation of meaning. This
output is not the final, directly consumable result but a structured representation on
which a downstream component can base its actions. For example, a
shallow system might identify predicate-argument structure, which is then converted
by another module into a database query.
• (b) Deep: These systems create a terminal representation of meaning. This output is
a complete, formal representation that can be directly consumed and executed by a
machine or application.
o Example: A deep semantic parser might convert the sentence "Which is the
longest river?" directly into the logical form answer(x₁, longest(x₁ river(x₁))),
which a query engine can execute immediately.
Conclusion
In summary, System Paradigms for semantic interpretation provide a framework for
classifying different approaches. A single system can be described using a combination of
these categories. For example, a system could be a supervised, domain-dependent,
deep semantic parser, meaning it is trained on labeled data, works only for a specific
application like RoboCup, and produces a final, executable meaning representation.
Conversely, another system might be an unsupervised, domain-independent, shallow parser
that discovers semantic roles across general text and produces an intermediate structure.
Understanding these paradigms is essential for evaluating the capabilities and limitations of
any given semantic interpretation system.
__________________________________________________________________________
Word Sense
Explain the importance of word sense disambiguation in semantic parsing.
Word Sense is a fundamental problem in computational semantics that deals with lexical
ambiguity. In any language, a single word (lemma) can have multiple meanings
or senses. Word Sense Disambiguation (WSD) is the task of identifying which specific sense
of a word is being used in a given context. The very applicability of WSD as a separate task
has been debated; in information retrieval, for instance, the presence of multiple query words
often provides enough implicit disambiguation. Nonetheless, for deep language understanding,
resolving word sense remains a critical step.
Word sense ambiguities can be of three principal types:
1. Homonymy: Words that share the same spelling but have unrelated meanings
(e.g., bank as a financial institution vs. a river bank).
2. Polysemy: A single word with multiple, related senses (e.g., bank as the sloping
edge of a river vs. a bank of clouds, both involving a raised mass).
3. Categorial Ambiguity: A word that can belong to different grammatical categories
(e.g., book as a noun vs. book as a verb).
1. Resources: The development of WSD systems heavily relies on the availability of lexical
resources and annotated corpora.
• Early Resources: Early work used machine-readable dictionaries like the Longman
Dictionary of Contemporary English (LDOCE) and thesauruses like Roget's Thesaurus.
• WordNet: A crucial resource, WordNet is a large lexical database of English in which
words are grouped into sets of cognitive synonyms called synsets, each expressing a
distinct concept. It also provides a rich taxonomy of relationships such as hypernymy (IS-
A) and meronymy (PART-OF); a short example of querying it appears after this list.
• Annotated Corpora:
o SEMCOR: A portion of the Brown Corpus annotated with WordNet senses.
o DSO Corpus: A corpus tagging the most frequent and ambiguous nouns and
verbs in the Brown and Wall Street Journal corpora.
o SENSEVAL/SEMEVAL: A series of evaluation exercises that have produced
many standardized datasets for WSD.
o OntoNotes: The largest sense-tagged corpus to date, covering a significant
number of verbs and nouns across multiple genres with coarse-grained senses.
• Other Knowledge Bases: Resources like Cyc (a common sense knowledge base)
and HowNet (for Chinese) also help address the knowledge bottleneck in WSD.
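Example (Python sketch): WordNet's sense inventory and relations, as described above, can be inspected through NLTK's interface. The minimal sketch below assumes the WordNet data has been downloaded, e.g. via nltk.download('wordnet').

from nltk.corpus import wordnet as wn

# List the noun senses (synsets) of "bank" together with their glosses.
for synset in wn.synsets("bank", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())

# Hypernym (IS-A) relation for the "financial institution" sense of bank.
print(wn.synset("depository_financial_institution.n.01").hypernyms())

# Meronym (PART-OF) relation: parts of a car.
print(wn.synset("car.n.01").part_meronyms())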
2. Systems: WSD systems can be classified into four main paradigms: rule-based, supervised,
unsupervised, and semi-supervised.
1. Rule-Based Systems
These first-generation systems primarily rely on dictionary definitions (glosses) and
handcrafted rules.
• The Simplified Lesk Algorithm: This algorithm disambiguates a word by finding the
sense whose dictionary gloss has the greatest overlap with the words in the
surrounding context.
Algorithm: Pseudocode of the simplified Lesk algorithm
Procedure: SIMPLIFIED_LESK(word, sentence) returns best sense of word
1: best-sense ← most frequent sense of word
2: max-overlap ← 0
3: context ← set of words in sentence
4: for all sense ∈ senses of word do
5: signature ← set of words in gloss and examples of sense
6: overlap ← COMPUTEOVERLAP(signature, context)
7: if overlap > max-overlap then
8: max-overlap ← overlap
9: best-sense ← sense
10: end if
11: end for
12: return best-sense
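A minimal runnable version of the same idea, using WordNet glosses and example sentences through NLTK, might look as follows (NLTK also ships its own implementation as nltk.wsd.lesk; this sketch simply mirrors the pseudocode above).

from nltk.corpus import wordnet as wn

def simplified_lesk(word, sentence):
    """Return the WordNet sense of `word` whose gloss and examples
    overlap most with the words of `sentence`."""
    context = set(sentence.lower().split())
    senses = wn.synsets(word)
    if not senses:
        return None
    best_sense, max_overlap = senses[0], 0   # default: most frequent sense
    for sense in senses:
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context)
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, sense
    return best_sense

print(simplified_lesk("bank", "he sat on the bank of the river and watched the water"))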
• Yarowsky's Thesaurus-based Algorithm: This method classifies words into broad
topic categories (from Roget's Thesaurus) based on statistical analysis of context.
Algorithm for disambiguating words into Roget's Thesaurus categories
Explanation: The algorithm involves three steps: (1) Collect contexts for each category, (2)
weight the salient words in the context using the probability of the word occurring with that
category, and (3) assign a word to the category that maximizes the overall log-probability
score.
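In Yarowsky's original presentation, this score amounts (roughly) to choosing, for an ambiguous word, the Roget category RCat that maximizes the summed log-salience of the surrounding context words:

    RCat* = argmax_RCat Σ_{w ∈ context} log( Pr(w | RCat) · Pr(RCat) / Pr(w) )

where Pr(w | RCat) is estimated from the contexts collected in step (1).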
• Structural Semantic Interconnections (SSI) Algorithm: This is a more advanced
knowledge-based algorithm that uses semantic graphs built from WordNet and other
resources. It iteratively disambiguates words in a context by finding the sense that has
the strongest semantic interconnections with the senses of other words in the
context.
Figure 4-3: The graphs for sense 1 and 2 of the noun bus as generated by the SSI
algorithm.
Explanation: The figure shows two semantic graphs for the noun "bus". Sense 1 (the vehicle)
is connected to concepts like "traveler," "transport," and "school bus." Sense 2 (the
connector) is connected to "electricity," "computer," and "circuit." The algorithm would
choose the sense whose graph best matches the semantic context of the sentence.
2. Supervised Systems
These systems use machine learning classifiers trained on manually sense-tagged data. They
tend to perform the best but require expensive annotation.
• Classifiers: Support Vector Machines (SVMs) and Maximum Entropy (MaxEnt) are
common choices.
• Features: A wide range of features are extracted from the context of the target
word:
o Lexical and POS context: Surrounding words, lemmas, and their parts-of-
speech.
o Local collocations: Ordered sequences of words or POS tags near the target
word (e.g., C₋₁,₋₁ denotes the single word immediately to the left of the target).
o Syntactic Relations: If a parse tree is available, syntactic features can be
used.
Algorithm: Rules for selecting syntactic relations as features
Explanation: This algorithm defines a set of rules for extracting features from a parse tree.
For example, if the target word (w) is a noun, it selects its parent head word (h), the POS
of h, the voice of h, and the position of h. These features provide rich structural context.
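Example (Python sketch): A bare-bones supervised WSD pipeline could combine such features with a standard classifier, as below. The feature set, the tiny hand-made training examples, and the use of logistic regression are all illustrative assumptions; real systems use SVM or MaxEnt models trained on corpora such as SEMCOR or OntoNotes.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(tokens, i):
    """Tiny feature set for the target word at position i:
    neighbouring words plus one left collocation."""
    return {
        "w-1": tokens[i - 1] if i > 0 else "<s>",
        "w+1": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
        "colloc(-2,-1)": " ".join(tokens[max(0, i - 2):i]),
    }

# Toy sense-tagged instances of the noun "bank" (labels are illustrative).
train = [
    ("he deposited cash at the bank yesterday".split(), 5, "FINANCE"),
    ("the bank raised its interest rates".split(), 1, "FINANCE"),
    ("they fished from the bank of the river".split(), 4, "RIVER"),
    ("the river bank was muddy after the rain".split(), 2, "RIVER"),
]
X = [features(tokens, i) for tokens, i, _ in train]
y = [label for _, _, label in train]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test = "she opened an account at the bank".split()
print(model.predict([features(test, test.index("bank"))]))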
3. Unsupervised Systems
These methods operate without labeled training data, often by clustering word instances or
using distance metrics in a semantic space.
• Conceptual Density: This method uses a hierarchical semantic network like WordNet.
It disambiguates a word by choosing the sense that belongs to the subhierarchy with
the highest "conceptual density" of context words.
Figure 4-4: Conceptual density
o Explanation: To disambiguate word W, the algorithm looks at its four
possible senses. Sense 2 is chosen because its subhierarchy in WordNet
contains the highest concentration of context words (w1, w2, w3, w4).
• Crosslinguistic Evidence: These algorithms use evidence from other languages.
The SALAAM algorithm, for example, uses a word-aligned parallel corpus to create
sense-tagged data.
SALAAM algorithm for creating training data using parallel English-to-Arabic machine
translations
Explanation: The algorithm groups English (L1) words that translate to the same Arabic (L2)
word. It then uses WordNet to find the most appropriate sense for that cluster and
propagates the sense tags back to the English words and their Arabic translations, thereby
creating a sense-tagged corpus.
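Example (Python sketch): The grouping step of this idea can be sketched as follows. The aligned word pairs are toy placeholders, and the sense chosen for each group here uses a simple WordNet path-similarity heuristic rather than the information-content based group disambiguation of the actual SALAAM algorithm.

from collections import defaultdict
from nltk.corpus import wordnet as wn

# Toy word-aligned (English, Arabic-translation) pairs; the Arabic forms
# are invented placeholders, not real alignment output.
aligned = [("bank", "masrif"), ("finance", "masrif"), ("treasury", "masrif")]

# Group the English words that share the same target-language translation.
groups = defaultdict(list)
for en, ar in aligned:
    groups[ar].append(en)

# For each grouped word, keep the sense most similar to the other words
# in its group (a stand-in for SALAAM's group disambiguation step).
for ar, words in groups.items():
    for w in words:
        def group_score(sense):
            total = 0.0
            for other in words:
                if other == w:
                    continue
                other_senses = wn.synsets(other, pos=wn.NOUN)
                if other_senses:
                    total += max((sense.path_similarity(s) or 0.0) for s in other_senses)
            return total
        senses = wn.synsets(w, pos=wn.NOUN)
        if senses:
            print(w, "->", max(senses, key=group_score).name())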
4. Semisupervised Systems
These systems start with a small number of labeled "seed" examples and iteratively expand
this set in a process called bootstrapping.
• The Yarowsky Algorithm: This is the classic semi-supervised WSD algorithm. It is
based on two powerful assumptions:
1. One sense per collocation: Nearby words provide strong clues to a word's
sense.
2. One sense per discourse: A word is likely to maintain the same sense
throughout a document.
Figure 4-6: The three stages of the Yarowsky algorithm
Explanation: This figure illustrates the iterative process. The first box shows the initial state
with a few labeled seed examples (A and B). The second box shows the state after one
iteration, where more instances have been classified based on the initial seeds. The final box
shows the end of a cycle, with only a small residual of ambiguous cases remaining.
The Yarowsky algorithm
Explanation: The algorithm (Steps 1-5) starts by identifying all instances of a polysemous word.
It uses a small set of seed examples (Step 2) to train a classifier (Step 3a). This classifier labels
the remaining instances (Step 3b), and high-confidence examples are added to the training
set. The "one sense per discourse" constraint helps filter errors, and new collocations are
learned (Step 3c). This process repeats until the set of unlabeled instances stops shrinking
(Step 4), resulting in a powerful final classifier (Step 5).
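A highly simplified sketch of this bootstrapping loop is shown below. The classifier, confidence threshold, and data structures are placeholders supplied by the caller; Yarowsky's actual system used decision-list classifiers and additionally applied the one-sense-per-discourse filter, which is omitted here.

def bootstrap(seed_labeled, unlabeled, train, predict, threshold=0.9, max_iters=10):
    """Generic self-training loop in the spirit of the Yarowsky algorithm.
    `train(examples)` returns a classifier; `predict(clf, x)` returns a
    (sense, confidence) pair for an unlabeled instance x."""
    labeled = list(seed_labeled)            # (instance, sense) seed pairs (Step 2)
    remaining = list(unlabeled)
    for _ in range(max_iters):
        clf = train(labeled)                # Step 3a: train on the current labeled set
        newly_labeled, still_unlabeled = [], []
        for x in remaining:
            sense, confidence = predict(clf, x)     # Step 3b: label the residual
            if confidence >= threshold:
                newly_labeled.append((x, sense))    # keep only confident labels
            else:
                still_unlabeled.append(x)
        if not newly_labeled:               # Step 4: stop when the residual stops shrinking
            break
        labeled.extend(newly_labeled)
        remaining = still_unlabeled
    return train(labeled)                   # Step 5: final classifier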
3. Software: Several software programs have been made available by the research community
for word sense disambiguation, ranging from similarity-measure modules to complete
disambiguation systems.