Unit-III
Semantic Parsing
By
Dr. Akansha Tyagi
Assistant Professor
Department of Computer Science and Engineering
BVRIT HYDERABAD College of Engineering for Women
Semantic Parsing
• Semantic parsing is the task of converting a natural language utterance to a logical
form: a machine-understandable representation of its meaning.
• Semantic parsing can thus be understood as extracting the precise meaning of an
utterance.
• Applications of semantic parsing include
• Machine Translation
• Question Answering
• Ontology Induction
• Automated Reasoning
• Code Generation
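To make the idea concrete, here is a minimal Python sketch pairing utterances with possible logical forms; the utterances, predicate names, and table schema are hypothetical, and real systems each define their own target formalism:

# Hypothetical utterance -> meaning-representation pairs; a Prolog-style
# query and a SQL query are two common kinds of executable target.
examples = {
    "What is the capital of France?": "capital(france, X)",
    "Show me flights from Boston to Dallas":
        "SELECT * FROM flights WHERE origin = 'BOS' AND dest = 'DFW'",
}
for utterance, meaning in examples.items():
    print(f"{utterance}  =>  {meaning}")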
Semantic Parsing
• Semantic parsing is our term for translating natural language statements into some
executable meaning representation.
• Semantic parsers form the backbone of voice assistants; they can also be used to answer questions or to provide natural language interfaces to databases.
Types of Semantic Parsing
Shallow Semantic Parsing:
• Shallow Semantic Parsing is concerned with identifying entities in an utterance
and labelling them with the roles they play.
• Shallow semantic parsing is sometimes known as slot-filling or frame semantic
parsing.
• Slot-filling systems are widely used in virtual assistants in conjunction with intent
classifiers, which can be seen as mechanisms for identifying the frame evoked by
an utterance.
• Popular architectures for slot-filling are largely variants of an encoder-decoder
model, wherein two recurrent neural networks (RNNs) are trained jointly to encode
an utterance into a vector and to decode that vector into a sequence of slot labels.
• This type of model is used in the Amazon Alexa spoken language understanding
system.
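As a rough sketch of such an architecture (not the actual Alexa implementation), the following PyTorch model jointly trains an encoder RNN and a decoder RNN that emits one slot label per token; the vocabulary size, dimensions, and toy input are all hypothetical:

import torch
import torch.nn as nn

class SlotFiller(nn.Module):
    def __init__(self, vocab_size, num_slots, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)  # RNN 1: encode the utterance
        self.decoder = nn.LSTM(hid_dim, hid_dim, batch_first=True)  # RNN 2: decode slot labels
        self.out = nn.Linear(hid_dim, num_slots)

    def forward(self, tokens):                     # tokens: (batch, seq_len) word ids
        enc_out, state = self.encoder(self.embed(tokens))
        dec_out, _ = self.decoder(enc_out, state)  # decoder conditioned on the encoder state
        return self.out(dec_out)                   # (batch, seq_len, num_slots) label scores

model = SlotFiller(vocab_size=1000, num_slots=5)
logits = model(torch.randint(0, 1000, (2, 7)))     # 2 toy utterances, 7 tokens each
print(logits.shape)                                # torch.Size([2, 7, 5])

In practice the intent classifier mentioned above is often trained jointly with the slot decoder, sharing the same encoder.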
Types of Semantic Parsing
Deep Semantic Parsing:
• Deep Semantic Parsing, also known as compositional semantic parsing, is
concerned with producing precise meaning representations of utterances that can
contain significant compositionality.
• Shallow semantic parsers can parse utterances like "show me flights from Boston to
Dallas" by classifying the intent as "list flights", and filling slots "source" and
"destination" with "Boston" and "Dallas", respectively.
• However, shallow semantic parsing cannot parse arbitrary compositional utterances,
like "show me flights from Boston to anywhere that has flights to Juneau".
• Deep semantic parsing attempts to parse such utterances, typically by converting
them to a formal meaning representation language.
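As a hedged illustration, a deep semantic parser might map the utterance above to a lambda-calculus-style logical form along these lines (the predicate names are illustrative, not drawn from a specific grammar):

λx. flight(x) ∧ from(x, boston) ∧
    ∃y. to(x, y) ∧ ∃f. (flight(f) ∧ from(f, y) ∧ to(f, juneau))

The nested quantification over the intermediate destination y is exactly the compositionality that a flat intent-plus-slots representation cannot express.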
Semantic Interpretation
Semantic Interpretation involves various components that together let us define a
representation of text that can be fed into a computer to allow further computational
manipulations.
The Semantic theory should be able to:
• Explain sentences having ambiguous meaning (e.g., bill as a bird's beak vs. an electricity bill).
• Resolve ambiguities of words in context (the bill is large, but need not be paid).
• Identify meaningless but syntactically well-formed sentences, like “colorless green ideas sleep furiously”.
• Identify syntactically unrelated paraphrases of a concept having the same semantic content.
Semantic Interpretation
In the following subsections we look at some requirements for achieving a semantic
representation:
• Structural Ambiguity
• Word Sense
• Entity and Event Resolution
• Predicate-Argument Structure
• Meaning Representation
Semantic Interpretation
Structural Ambiguity:
• Structure refers to the syntactic structure of sentences; resolving it means transforming a sentence into its underlying syntactic representation.
• Syntax and semantics interact strongly.
• Syntax has become the first stage of processing, followed by various other stages, in the process of semantic interpretation.
Semantic Interpretation
Word Sense:
• In any language, the same word type, or lemma, is often used in different contexts to represent different entities in the world.
• For example, we use the word “nail” to represent a part of human anatomy and also to represent the metallic object.
• Humans identify from context the sense in which a word is intended by the speaker.
• Consider the sentences :
1. He nailed the loose arm of the chair with a hammer.
2. He brought a box of nails from the hardware store.
3. He went to a beauty salon to get his nails clipped.
4. He went to get a manicure. His nails had grown very long.
• The presence of words such as hammer and hardware in sentences 1 and 2, and of clipped and manicure in sentences 3 and 4, enables humans to easily disambiguate the sense in which nail is used.
• Resolving the sense of words is one of the steps in the process of Semantic Interpretation.
Semantic Interpretation
Entity and Event Resolution:
• This involves identifying the various entities that are sprinkled across the discourse, referred to by the same or different phrases.
• Reconciling what type of entity or event is considered along with disambiguating
various ways in which the same entity is referred to over a discourse, is critical to
creating a semantic representation.
• Two predominant tasks have become popular over the years:
• Named Entity Recognition
• Coreference Resolution
Semantic Interpretation
Predicate Argument Structure:
• Since semantic relations are distinct from syntactic ones, we use a special means of
expressing semantic relations called Predicate Argument Structure.
• Predicate argument structure is based on the function features of lexical items (most
often verbs).
• The function features determine the thematic roles (thematic roles refer to the participants of these events) to be played by the other words in the sentence.
• A Predicate specifies a relationship between objects (concrete or abstract) or a state that characterizes an object, e.g. [BIT(BOY, DOG)] 'the boy bit the dog'.
• A predicate may also be a property: [BIG(DOG)] 'the dog is big/a big dog'.
• Arguments refer to real-world objects about which something is predicated.
• The Predicate Argument Structure identifies which entities play what part in the
event.
• It can be defined as who did what to whom, when, where, why and how.
Semantic Interpretation
Predicate Argument Structure:
• Understanding a sentence = knowing who did what (to whom, when, where,
why…)
• Verbs correspond to predicates (what was done).
• Their arguments (and modifiers) identify who did it, to whom, where, when, why, etc.
Semantic Interpretation
Predicate Argument Structure:
Lexical Entry     Semantic Example        Paraphrase
RUN(X)        =   RUN(MARY)           =   Mary runs/ran/is running
RED(X)        =   RED(CAR)            =   The car is/was/will be red
MAKE(X,Y)     =   MADE(JOHN, TABLE)   =   John made the table
IN(X,Y)       =   IN(SUE, HOUSE)      =   Sue is in the house
MOTHER(X)     =   MOTHER(SUE)         =   Sue's mother/the mother of Sue
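A minimal Python sketch of how such predicate-argument structures could be encoded (the tuple convention here is illustrative, not a standard library):

# (predicate, arguments) pairs mirroring the table above
RUN    = ("RUN",    ("MARY",))           # Mary runs
RED    = ("RED",    ("CAR",))            # the car is red
MADE   = ("MAKE",   ("JOHN", "TABLE"))   # John made the table
IN     = ("IN",     ("SUE", "HOUSE"))    # Sue is in the house
MOTHER = ("MOTHER", ("SUE",))            # Sue's mother

def arity(structure):
    """Number of arguments the predicate takes in this instance."""
    predicate, args = structure
    return len(args)

print(arity(MADE))   # 2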
Semantic Interpretation
Meaning Representation:
• Capturing the meaning of linguistic utterances using formal notation.
• This is the final stage of semantic interpretation: it builds a meaning representation that can then be manipulated by algorithms to serve the ends of various applications. The result is called a deep representation.
• To be computationally effective, we expect certain properties in meaning
representations:
• Verifiability -- Ability to determine the truth value of the representation.
• Unambiguous Representations -- A representation must be unambiguous.
• Canonical Form -- Utterances that mean the same thing should map to the same meaning representation.
• Inference and Variables -- Ability to draw valid conclusions based on the
meaning representations of inputs and the background knowledge.
• Expressiveness -- Ability to express a wide range of subject matter.
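A small sketch of the canonical-form property, with hypothetical sentences and a made-up Serves predicate: both paraphrases map to one representation, so a system can treat them identically.

canonical = {
    "Does Maharani serve vegetarian food?":        "Serves(Maharani, VegetarianFood)",
    "Do they have vegetarian dishes at Maharani?": "Serves(Maharani, VegetarianFood)",
}
# Two syntactically unrelated paraphrases, one meaning representation.
assert len(set(canonical.values())) == 1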
System Paradigms
System Paradigms fall into three categories:
• System Architectures
• Scope
• Coverage
System Paradigms
System Architectures
1. Knowledge Based:
These systems use a predefined set of rules or a knowledge base to obtain a solution to a new problem.
2. Unsupervised:
These systems tend to require minimal human intervention to be functional by using
existing resources that can be bootstrapped for a particular application or problem
domain.
3. Supervised:
• These systems involve manual annotations of some phenomena that appear in a
sufficient amount of data so that machine learning algorithms can be applied.
• A model is trained on some features to predict labels, and is then applied to unseen data.
System Paradigms
4. Semi-supervised:
• Manual annotation is usually very expensive and does not yield enough data to
completely capture a phenomenon.
• In such instances, researchers can automatically expand the dataset on which their
models are trained either by employing machine-generated output directly or by
bootstrapping off of an existing model by having humans correct its output.
• In many cases, a model built for one domain can quickly be adapted to a new domain.
System Paradigms
Scope
1. Domain Dependent:
These systems are specific to certain domains, such as air travel reservations or
simulated football coaching.
2. Domain Independent:
These systems are general enough that their techniques can be applied to multiple domains with little or no change.
System Paradigms
Coverage:
1. Shallow:
These systems tend to produce an intermediate representation that can be converted to
one that a machine can base its actions on.
2. Deep:
These systems usually create a terminal representation that is directly consumed by the
machine or application.
Word Sense
• A sense (or word sense) is a discrete representation of one aspect of the meaning of a
word.
• A word is polysemous if it has more than one sense.
• Many, if not most, words in any given language are polysemous.
• We represent each sense with a superscript (e.g., bank¹ vs. bank²), which makes the different meanings easy to distinguish.
Word Sense
Relationship between Senses:
1. Synonymy
A word or phrase that means exactly or nearly the same as another word or phrase in the
same language, for example shut is a synonym of close.
When two senses of two different words (lemmas) are identical, or nearly identical, we
say the two senses are synonyms.
Synonyms include such pairs as
couch/sofa vomit/throw up filbert/hazelnut car/automobile
Word Sense
Synonymy is actually a relationship between senses rather than words.
Consider the words big and large:
We can swap big and large in the following sentences and retain the same meaning, so these senses appear to be synonyms:
• How big is that plane?
• Would I be flying on a large or small plane?
But note the following sentence in which we cannot substitute large for big:
• Miss Nelson, for instance, became a kind of big sister to Benjamin.
• Miss Nelson, for instance, became a kind of large sister to Benjamin.
This is because the word big has a sense that means being older or grown up, while large
lacks this sense. Thus, we say that some senses of big and large are (nearly) synonymous
while other ones are not.
Word Sense
2. Antonymy
• While synonyms are words with identical or similar meanings, antonyms are words with an opposite meaning, like:
long/short big/little fast/slow cold/hot dark/light rise/fall up/down in/out
• Two senses can be antonyms if they are at opposite ends of some scale. This is the
case for long/short, fast/slow, or big/little, which are at opposite ends of the length or
size scale.
• Another group of antonyms, reversives, describe change or movement in opposite
directions, such as rise/fall or up/down.
• Antonyms thus differ completely with respect to one aspect of their meaning, their
position on a scale or their direction—but are otherwise very similar.
• Automatically distinguishing synonyms from antonyms can be difficult.
Word Sense
3. Taxonomic Relations
• Word senses can be related taxonomically.
• A word (or sense) is a hyponym of another word or sense if the first is more specific,
denoting a subclass of the other.
• For example, car is a hyponym of vehicle, dog is a hyponym of animal, and mango is
a hyponym of fruit.
• Conversely, we say that vehicle is a hypernym of car, and animal is a hypernym of
dog.
• It is unfortunate that the two words (hypernym and hyponym) are very similar and
hence easily confused; for this reason, the word superordinate is often used instead
of hypernym.
Word Sense
Another name for the hypernym/ hyponym structure is the IS-A hierarchy, in which we
say A IS-A B, or B subsumes A.
Hypernymy is useful for tasks like textual entailment or question answering; knowing
that leukemia is a type of cancer, for example, would certainly be useful in answering
questions about leukemia.
4. Meronymy
Another common relation is meronymy, the part-whole relation. A leg is part of a chair;
a wheel is part of a car. We say that wheel is a meronym of car, and car is a holonym of
wheel.
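These taxonomic and part-whole relations can be inspected directly with NLTK's WordNet interface; a quick sketch (requires pip install nltk and a one-time nltk.download('wordnet')):

from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
print(car.hypernyms())        # e.g. [Synset('motor_vehicle.n.01')] -- car IS-A motor vehicle
print(car.part_meronyms())    # parts of a car, e.g. wheels, doors
print(wn.synset('wheel.n.01').part_holonyms())  # wholes that a wheel is part of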
Word Sense
5. Structured Polysemy
The senses of a word can also be related semantically, in which case we call the relationship between them structured polysemy. Consider this sense of bank:
“The bank is on the corner of Nassau and Witherspoon.”
This sense of bank means something like “the building belonging to a financial institution”. These two kinds of senses (an organization and the building associated with an organization) occur together for many other words as well (school, university, hospital, etc.). Thus, there is a systematic relationship between senses that we might represent as BUILDING ↔ ORGANIZATION.
This particular subtype of polysemy relation is called metonymy. Metonymy is the use of
one aspect of a concept or entity to refer to other aspects of the entity or to the entity
itself.
WordNet
The most commonly used resource for sense relations in English and many other
languages is the WordNet lexical database.
English WordNet consists of three separate databases, one each for nouns and verbs and a
third for adjectives and adverbs.
Each database contains a set of lemmas, each one annotated with a set of senses.
The WordNet 3.0 release has 117,798 nouns, 11,529 verbs, 22,479 adjectives, and 4,481
adverbs. The average noun has 1.23 senses, and the average verb has 2.16 senses.
WordNet can be accessed on the Web or downloaded locally.
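A minimal sketch of local access via NLTK, one of several ways to use WordNet programmatically:

import nltk
nltk.download('wordnet')                  # one-time local download
from nltk.corpus import wordnet as wn

print(len(list(wn.all_synsets('n'))))     # number of noun synsets
print(len(list(wn.all_synsets('v'))))     # number of verb synsets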
WordNet
Consider the lemma entry for the noun and adjective bass.
The set of near-synonyms for a WordNet sense is called a synset (for synonym set);
synsets are an important primitive in WordNet. The entry for bass includes synsets like
{bass¹, deep⁶}, or {bass⁶, bass voice¹, basso²}.
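With NLTK, the synsets for bass can be listed directly; the exact senses printed depend on the installed WordNet version:

from nltk.corpus import wordnet as wn

for syn in wn.synsets('bass'):
    print(syn.name(), '->', syn.lemma_names())
# e.g. bass.n.01 -> ['bass'] ... basso.n.02 -> ['bass', 'basso']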
WordNet
WordNet also labels each synset with a lexicographic category drawn from a semantic field; for nouns, for example, there are 26 such categories (sometimes called supersenses).
WordNet
Some of the noun relations in WordNet.
WordNet
A WordNet entry demonstrating many relations.
Word Sense Disambiguation
The task of selecting the correct sense for a word is called word sense disambiguation,
or WSD.
WSD algorithms take as input a word in context and a fixed inventory of potential word senses, and output the correct word sense for that context.
All-words task: The system is given an entire text and a lexicon with an inventory of senses for each entry, and has to disambiguate every word in the text (or sometimes just every content word).
The all-words task is similar to part-of-speech tagging, except with a much larger set of
tags since each lemma has its own set.
Supervised all-word disambiguation tasks are generally trained from a semantic
concordance, a corpus in which each open-class word in each sentence is labeled with its
word sense from a specific dictionary or thesaurus, most often WordNet.
The SemCor corpus is a subset of the Brown Corpus consisting of 226,036 words that were manually tagged with WordNet senses.
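SemCor ships with NLTK; a sketch of reading its sense annotations (requires nltk.download('semcor'); to the best of my knowledge of the reader API, tag='sem' returns chunks labeled with WordNet lemmas):

from nltk.corpus import semcor

first_sentence = semcor.tagged_sents(tag='sem')[0]
for chunk in first_sentence:
    print(chunk)    # content-word chunks carry WordNet lemma labels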
Word Sense Disambiguation
The all-words WSD task, mapping from input words (x) to WordNet senses (y). Only
nouns, verbs, adjectives, and adverbs are mapped, and note that some words (like guitar
in the example) only have one sense in WordNet.
Word Sense Disambiguation
Most frequent sense baseline: choose the most frequent sense for each word from the senses in a labeled corpus.
For WordNet, this corresponds to the first sense, since senses in WordNet are generally
ordered from most frequent to least frequent based on their counts in the SemCor sense-
tagged corpus.
The most frequent sense baseline can be quite accurate, and is therefore often used as a
default, to supply a word sense when a supervised algorithm has insufficient training data.
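The baseline is one line with NLTK, since wn.synsets() returns senses in WordNet's frequency-based order:

from nltk.corpus import wordnet as wn

def most_frequent_sense(lemma, pos=None):
    synsets = wn.synsets(lemma, pos=pos)
    return synsets[0] if synsets else None       # first sense = baseline prediction

print(most_frequent_sense('bank', wn.NOUN))      # Synset('bank.n.01')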
The WSD Algorithm: Contextual Embeddings
• The best performing WSD algorithm is a simple nearest-neighbor algorithm using
contextual word embeddings.
• At training time we pass each sentence in the SemCor labeled dataset through a contextual embedding model (e.g., BERT), resulting in a contextual embedding for each labeled token in SemCor.
• For each sense s of any word in the corpus, given the n tokens labeled with that sense, we average their n contextual representations vᵢ to produce a contextual sense embedding vₛ for s:

vₛ = (1/n) Σᵢ vᵢ
The WSD Algorithm: Contextual Embeddings
• At test time, given a token of a target word t in context, we compute its contextual embedding t and choose its nearest neighbor sense from the training set, i.e., the sense whose sense embedding has the highest cosine with t:

sense(t) = argmax_{s ∈ senses(t)} cos(t, vₛ)
• In green are the contextual sense embeddings precomputed for each sense of each word; here we show just a few of the senses for find.
• A contextual embedding is computed for the target word found, and then the nearest neighbor sense (in this case find⁹ᵥ) is chosen.
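A minimal numpy sketch of the nearest-neighbor step; the sense embeddings below are toy vectors standing in for the averages a real system would precompute with a contextual encoder such as BERT:

import numpy as np

sense_embeddings = {                 # hypothetical precomputed v_s vectors
    'find.v.01': np.array([0.9, 0.1, 0.0]),
    'find.v.09': np.array([0.1, 0.8, 0.3]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def disambiguate(token_embedding):
    # choose the sense whose embedding has the highest cosine with the token
    return max(sense_embeddings,
               key=lambda s: cosine(token_embedding, sense_embeddings[s]))

t = np.array([0.2, 0.7, 0.2])        # toy contextual embedding for "found"
print(disambiguate(t))               # find.v.09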
Feature Based WSD
Feature-based algorithms for WSD are extremely simple and function almost as well as
contextual language model algorithms.
The best performing IMS algorithm (Zhong and Ng, 2010), augmented by embeddings
(Iacobacci et al. 2016, Raganato et al. 2017b), uses an SVM classifier to choose the sense
for each input word with the following simple features of the surrounding words:
• part-of-speech tags (for a window of 3 words on each side, stopping at sentence
boundaries)
• collocation features of words or n-grams of lengths 1, 2, 3 at a particular location in a
window of 3 words on each side (i.e., exactly one word to the right, or the two words
starting 3 words to the left, and so on).
• weighted average of embeddings (of all words in a window of 10 words on each side,
weighted exponentially by distance)
Feature Based WSD
Consider the ambiguous word bass in the following WSJ sentence:
“ An electric guitar and bass player stand off to one side”
If we used a small 2-word window, a standard feature vector might include parts of speech, unigram and bigram collocation features, and a weighted sum g of embeddings, that is:

[wᵢ₋₂, POSᵢ₋₂, wᵢ₋₁, POSᵢ₋₁, wᵢ₊₁, POSᵢ₊₁, wᵢ₊₂, POSᵢ₊₂, wᵢ₋₂wᵢ₋₁, wᵢ₊₁wᵢ₊₂, g]

which would yield the following vector:

[guitar, NN, and, CC, player, NN, stand, VB, guitar and, player stand, g]
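A sketch of extracting such features in Python with NLTK's part-of-speech tagger (requires nltk.download('averaged_perceptron_tagger'); the window here is trimmed to ±2 for brevity, and the feature names are made up):

import nltk

def extract_features(tokens, i, window=2):
    tagged = nltk.pos_tag(tokens)
    feats = {}
    for off in range(-window, window + 1):       # word and POS window features
        if off != 0 and 0 <= i + off < len(tokens):
            feats[f'word_{off}'] = tokens[i + off]
            feats[f'pos_{off}'] = tagged[i + off][1]
    feats['bigram_left'] = ' '.join(tokens[max(0, i - 2):i])   # collocations
    feats['bigram_right'] = ' '.join(tokens[i + 1:i + 3])
    return feats

sent = "An electric guitar and bass player stand off to one side".split()
print(extract_features(sent, sent.index('bass')))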
The Lesk Algorithm as WSD Baseline
• Generating sense labeled corpora like SemCor is quite difficult and expensive.
• An alternative class of WSD algorithms, knowledge-based algorithms, rely solely on knowledge bases such as WordNet and don’t require labeled data.
• While supervised algorithms generally work better, knowledge-based methods can be
used in languages or domains where thesauruses or dictionaries but not sense labeled
corpora are available.
• The Lesk algorithm is the oldest and most powerful knowledge-based WSD method,
and is a useful baseline.
• Lesk is a family of algorithms that choose the sense whose dictionary gloss or
definition shares the most words with the target word’s neighborhood.
The Lesk Algorithm as WSD Baseline
Simplified Lesk Algorithm:
For a target word, choose the sense whose dictionary gloss (definition plus examples) shares the most words with the sentence containing the target word.
The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list.
The original Lesk algorithm defines the context in a more complex way.
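A runnable version of simplified Lesk using NLTK's WordNet glosses and stopword list (requires nltk.download('wordnet') and nltk.download('stopwords')); COMPUTEOVERLAP becomes a set intersection that ignores stopwords:

from nltk.corpus import stopwords, wordnet as wn

STOP = set(stopwords.words('english'))

def simplified_lesk(word, sentence):
    context = set(w.lower().strip('.,') for w in sentence.split()) - STOP
    best_sense, best_overlap = None, 0
    for sense in wn.synsets(word):
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = len((signature - STOP) & context)   # COMPUTEOVERLAP
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    synsets = wn.synsets(word)
    return best_sense or (synsets[0] if synsets else None)  # default: most frequent sense

print(simplified_lesk('bank',
      "The bank can guarantee deposits will eventually cover future tuition "
      "costs because it invests in adjustable-rate mortgage securities"))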
The Lesk Algorithm as WSD Baseline
Consider disambiguating the word bank in the following context:
“The bank can guarantee deposits will eventually cover future tuition costs because it
invests in adjustable-rate mortgage securities”.
Given the following two WordNet senses:

bank¹ -- Gloss: a financial institution that accepts deposits and channels the money into lending activities. Examples: “he cashed a check at the bank”, “that bank holds the mortgage on my home”.
bank² -- Gloss: sloping land (especially the slope beside a body of water). Examples: “they pulled the canoe up on the bank”, “he sat on the bank of the river and watched the currents”.

Sense bank¹ has two non-stopwords overlapping with the context (deposits and mortgage), while sense bank² has zero, so sense bank¹ is chosen.
Wikipedia as a source of training data
Datasets other than SemCor have been used for all-words WSD. One important direction
is to use Wikipedia as a source of sense-labeled data.
When a concept is mentioned in a Wikipedia article, the article text may contain an
explicit link to the concept’s Wikipedia page, which is named by a unique identifier. This
link can be used as a sense annotation.
For example, the ambiguous word bar is linked to a different Wikipedia article depending
on its meaning in context.
• In 1834, Sumner was admitted to the [[bar (law)|bar]] at the age of twenty-three, and
entered private practice in Boston.
• It is danced in 3/4 time (like most waltzes), with the couple turning approx. 180
degrees every [[bar (music)|bar]].
• Jenga is a popular beer in the [[bar (establishment)|bar]]s of Thailand.
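A sketch of harvesting these sense labels with a regular expression: each [[target|anchor]] link pairs a surface word with a disambiguated page title (the snippet below reuses the first example sentence):

import re

wikitext = ("In 1834, Sumner was admitted to the [[bar (law)|bar]] "
            "at the age of twenty-three, and entered private practice in Boston.")

for target, anchor in re.findall(r'\[\[([^\]|]+)\|([^\]]+)\]\]', wikitext):
    print(f'surface form: {anchor!r}  ->  sense label: {target!r}')
# surface form: 'bar'  ->  sense label: 'bar (law)'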
Software
Some software programs made available by the research community for word sense
disambiguation are:
• IMS (It Makes Sense) http://nlp.comp.nus.edu.sg/software
This is a complete word sense disambiguation system.
• WordNet-Similarity-2.05 http://search.cpan.org/dist/wordNet-Similarity
These WordNet similarity modules for Perl provide a quick way of computing
various word similarity measures.
• WikiRelate!
http://www.h-its.org/English/research/nlp/download/wikipediasimilarity.php
This is a word similarity measure based on the categories in Wikipedia.
Thank You