Unit 3 and 4 Notes
SEMANTIC PARSING
A semantic system brings entities, concepts, relations and predicates together to provide
more context to language so machines can understand text data with more accuracy.
Semantic analysis derives meaning from language and lays the foundation for a
semantic system to help machines interpret meaning.
Machines lack a reference system to understand the meaning of words, sentences and
documents. Word sense disambiguation and meaning recognition can provide a better
understanding of language data for machines. Here is how each part of semantic
analysis works.
To understand how NLP and semantic processing work together, consider this:
● Basic NLP can identify words from a selection of text.
● Semantics gives meaning to those words in context (e.g., knowing an apple as a
fruit rather than a company).
Semantics Makes Word Meaning Clear (WORD SENSE)
Semantic analysis, on the other hand, is crucial to achieving a high level of accuracy
when analyzing text.
Consider the task of text summarization which is used to create digestible chunks of
information from large quantities of text. Text summarization extracts words, phrases,
and sentences to form a text summary that can be more easily consumed. The
accuracy of the summary depends on a machine’s ability to understand language data.
2.Semantic Interpretation
Semantic parsing can be considered part of semantic interpretation, which
involves various components that together define a representation of text that can
be fed into a computer to allow further computation, manipulation, and search,
which are prerequisites for any language understanding system or application. Here
we start with a discussion of the structure of a semantic theory.
A Semantic theory should be able to:
1. Explain sentences having ambiguous meaning: "The bill is large" is ambiguous in
the sense that it could refer to money or to the beak of a bird.
2. Resolve the ambiguities of words in context: in "The bill is large but need not be
paid", the theory should be able to select the monetary meaning of bill.
3. Identify meaningless but syntactically well-formed sentences (e.g., "Colorless green
ideas sleep furiously").
Entities in text can be detected and extracted using approaches such as:
● Rule-based systems,
● Dictionary lookups,
● POS tagging,
● Dependency parsing.
For example, from a sentence about an event we might extract:
Date: Thursday, Time: night, Location: Chateau Marmont, Person: Cate Blanchett
Now, we can start our discussion of Named Entity Recognition (NER):
1. Named Entity Recognition is one of the key entity detection methods in NLP.
2. It can detect entities such as:
● Organizations,
● Quantities,
● Monetary values,
● People’s names,
● Company names,
● Product names,
● Names of events.
3. In simple words, Named Entity Recognition is the process of detecting named
entities such as person names, location names, company names, etc. from the text.
4. With the help of named entity recognition, we can extract key information to
understand the text, or merely use it to extract important information to store in a
database.
5. NER is used in applications such as:
● Automated chatbots,
● Content analyzers.
Phrase Classification
In this classification step, we classify all the noun phrases extracted in the previous
step into their respective categories. To disambiguate locations, the Google Maps API
provides a very good option, and to identify person names or company names, open
databases such as DBpedia and Wikipedia can be used. Apart from this, we can also
build lookup tables and dictionaries by combining information from different
sources.
Entity Disambiguation
Sometimes entities are misclassified, so creating a validation layer on top of the
results becomes useful. Knowledge graphs can be exploited for this purpose. Some of
the popular knowledge graphs are:
● Google Knowledge Graph,
● IBM Watson,
● Wikipedia, etc.
The blue cells represent the nouns. Some of these nouns describe real things present in
the world.
For example, from the above, some of these nouns represent physical places on a map.
Therefore, the goal of NER is to detect and label these nouns with the real-world
concepts that they represent.
So, when we run each token present in the sentence through a NER tagging model, our
sentence looks as follows.
NER systems aren’t just doing a simple dictionary lookup. Instead, they use the
context in which a word appears in the sentence and a statistical model to guess
which type of noun that particular word represents.
Since NER makes it easy to grab structured data out of text, it has tons of
uses. It’s one of the easiest methods to quickly get insightful value out of an NLP
pipeline.
If you want to try out NER yourself, then refer to the link.
How does Named Entity Recognition work?
As we can easily observe, after reading a particular text we naturally recognize
named entities such as people, values, locations, and so on.
Sentence: Sundar Pichai, the CEO of Google Inc. is walking in the streets of California.
From the above sentence, we can identify three named entities:
● (“person”: “Sundar Pichai”),
● (“organization”: “Google Inc.”),
● (“location”: “California”).
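As an illustration, a pretrained NER model can tag these entities automatically. The minimal sketch below uses the open-source spaCy library and its small English model (en_core_web_sm); the exact labels produced (PERSON, ORG, GPE) depend on the model, so treat this as an assumption-laden sketch rather than the canonical output.

# Minimal NER sketch using spaCy.
# Assumes: pip install spacy, then python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # small pretrained English pipeline
doc = nlp("Sundar Pichai, the CEO of Google Inc. is walking in the streets of California.")

# Each detected entity carries its text span and a predicted label.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# Expected (model-dependent): Sundar Pichai -> PERSON, Google Inc. -> ORG, California -> GPE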
But to do the same thing with the help of computers, we need to help them recognize
entities first so that they can categorize them. So, to do so we can take the help of
machine learning and Natural Language Processing (NLP).
Let’s discuss the role of both of these while implementing NER using computers:
● Machine Learning: it helps the machine learn from annotated examples and improve
its predictions over time.
● NLP: it studies the structure and rules of language and forms intelligent
systems that are capable of deriving meaning from text and speech.
To learn what an entity is, a NER model needs to be able to detect a word or string of
words that form an entity (e.g. California) and decide which entity category it belongs to.
So, as a concluding step we can say that the heart of any NER model is a two-step
process:
So first, we need to create entity categories, like Name, Location, Event, Organization,
etc., and feed a NER model relevant training data.
Then, by tagging some samples of words and phrases with their corresponding entities,
we’ll eventually teach our NER model to detect the entities and categorize them.
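As a concrete, purely hypothetical illustration of this tagging step, training samples are usually plain sentences paired with character-offset annotations of the entities they contain. The sketch below follows the commonly used (text, {"entities": [(start, end, label)]}) convention; the sentences, offsets and labels are invented for illustration.

# Hypothetical labelled samples for training an NER model.
# Each entry pairs raw text with (start_char, end_char, label) spans.
TRAIN_DATA = [
    ("Sundar Pichai is the CEO of Google Inc.",
     {"entities": [(0, 13, "PERSON"), (28, 39, "ORG")]}),
    ("The premiere is at Chateau Marmont on Thursday night",
     {"entities": [(19, 34, "LOCATION"), (38, 46, "DATE")]}),
]

# Sanity-check that the offsets really cover the intended strings.
for text, ann in TRAIN_DATA:
    for start, end, label in ann["entities"]:
        print(label, "->", text[start:end])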
The phrase "for every x" (sometimes "for all x") is called a universal quantifier and is
denoted by ∀x. The phrase "there exists an x such that" is called an existential quantifier
and is denoted by ∃x.
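For example, the sentence "Every person reads a book" can be rendered with these quantifiers in standard first-order logic (the predicate names person, book and reads are illustrative):

∀x (person(x) → ∃y (book(y) ∧ reads(x, y)))

That is, for every x, if x is a person, then there exists a y such that y is a book and x reads y.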
3.System Paradigms
It is important to get a perspective on the various primary dimensions on which the
problem of semantic interpretation has been tackled.
The approaches generally fall into the following three categories: 1. System Architecture,
2. Scope, 3. Coverage.
1. System Architectures
a. Knowledge based: These systems use a predefined set of rules or a knowledge base
to obtain a solution to a new problem.
b.Supervised :
AI Chatbots and AI Virtual Assistants using Supervised Learning are trained using data
that is well-labeled (or tagged). During training, those systems learn the best mapping
function between known data input and the expected known output. Supervised
NLP models then use the best approximating mapping learned during training to
analyze unforeseen input data (never seen before) to accurately predict the
corresponding output.
Usually, Supervised Learning models require extensive and iterative optimization cycles
to adjust the input-output mapping until they converge to an expected and well-accepted
level of performance. This type of learning is called “supervised” because its way
of learning from training data mimics the process of a teacher supervising the
end-to-end learning process. Supervised Learning models are typically capable of
achieving excellent levels of performance but only when enough labeled data is
available.
For example, a typical task delivered by a supervised learning model for an AI chatbot /
Virtual Assistant is the classification (via a variety of algorithms such as Support
Vector Machines, Random Forests, Classification Trees, etc.) of an input user utterance
into a known class of user intents.
The precision achieved by those techniques is really remarkable though the shortfall is
limited coverage of intent classes to only those for which labeled data is available for
training.
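A minimal sketch of such an intent classifier is given below, assuming scikit-learn and a tiny hand-labelled set of utterances. The utterances, intent names and the TF-IDF plus linear SVM choices are illustrative assumptions, not a prescribed setup.

# Toy supervised intent classification sketch (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Hypothetical labelled training utterances: (text, intent) pairs.
utterances = [
    "I want to book a flight to Delhi",
    "Reserve a ticket for tomorrow morning",
    "What is my account balance",
    "Show me the balance in my savings account",
]
intents = ["book_flight", "book_flight", "check_balance", "check_balance"]

# TF-IDF features + linear SVM: one common choice among the algorithms listed above.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(utterances, intents)

print(clf.predict(["please book me a flight"]))   # expected: ['book_flight']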
c.Unsupervised Learning
To overcome the limitations of Supervised Learning, academia and industry started
pivoting towards the more advanced (but more computationally complex) Unsupervised
Learning which promises effective learning using unlabeled data (no labeled data is
required for training) and no human supervision (no data scientist or high-technical
expertise is required). This is an important advantage compared to Supervised Learning,
as unlabeled text in digital form is in abundance, but labeled datasets are usually
expensive to construct or acquire, especially for common NLP tasks like PoS tagging or
Syntactic Parsing.
Unsupervised Learning models are equipped with all the needed intelligence and
automation to work on their own and automatically discover information, structure, and
patterns from the data itself. This allows for the Unsupervised NLP to shine.
Advancing AI with Unsupervised Learning
Unsupervised Learning is also used for association rules mining which aims at
discovering relationships between features directly from data. This technique is typically
used to automatically extract existing dependencies between named entities from input
user utterances, dependencies of intents across a set of user utterances part of the
same user/system session, or dependencies of questions and answers from
conversational logs capturing the interactions between users and live agents during the
problem troubleshooting process.
2.Scope:
a.Domain Dependent: These systems are specific to certain domains, such as air travel
reservations or simulated football coaching.
b.Domain Independent: These systems are general enough that the techniques can be
applicable to multiple domains with little or no change.
3.Coverage:
a.Shallow: These systems tend to produce an intermediate representation that can then
be converted to one that a machine can base its actions on.
b.Deep: These systems usually create a terminal representation that is directly
consumed by a machine or application.
Word Sense Disambiguation
Much of this information is historical and cannot readily be translated and made
available for building systems today. But some of the techniques and algorithms are
still available.
1. Dictionary-based (rule-based) disambiguation
The simplest and oldest dictionary-based sense disambiguation algorithm was
introduced by Lesk.
The core of the algorithm is that the dictionary sense whose terms most closely overlap
with the terms in the context is chosen as the intended sense.
The word being analysed is compared against the database (signature) of the word, the
signature is matched with the gloss (definition) of each candidate sense, and the word
is sensed and retrieved accordingly.
Example of gloss referring to correct context
The word is mapped to the signature file structure present in the database, which pairs
each Word with its Signature.
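A minimal sketch of the Lesk idea is shown below, using NLTK's built-in simplified Lesk implementation over WordNet glosses. It assumes nltk is installed and the wordnet and punkt data have been downloaded; the chosen sense depends on gloss overlap, so the result on such a short context is only indicative.

# Simplified Lesk word sense disambiguation with NLTK.
# Assumes: pip install nltk, then nltk.download('wordnet'), nltk.download('punkt').
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("The bill is large but need not be paid")
sense = lesk(context, "bill")   # pick the WordNet sense whose gloss overlaps most with the context
print(sense, "-", sense.definition() if sense else "no sense found")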
Application(s)/Advantage(s)
•This study used Roget’s Thesaurus categories and classified unseen words into one of these
1042 categories, based on a statistical analysis of 100-word concordances for each category.
Finally, in the third step, the unseen words in the test set are classified into the
category that has the maximum weight, and the information is retrieved according to the
rank.
Which sense to retrieve for an ambiguous word is decided according to the weight and
ranking, calculated with the Roget-based formula, and a concept thesaurus is generated
for the specific word.
2.Supervised Learning
In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is
completed, the model is evaluated on held-out test data (data not seen during training), and then it predicts the output.
The working of Supervised learning can be easily understood by the below example and diagram:
Suppose we have a dataset of different types of shapes which includes square, rectangle, triangle, and Polygon. Now the first step is that we
need to train the model for each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides,
and predicts the output.
This can be done with the help of Natural Language Processing and different classification algorithms like
Naive Bayes, Support Vector Machines and even Neural Networks in Python. The following are the common
attributes (features) used in supervised learning for NLP word sensing:
a)Features: Here we discuss a commonly used subset of features that have been
useful in supervised learning of word sense (a small extraction sketch follows this list).
b)Lexical context: The feature comprises the words and lemmas of words occurring
in the entire paragraph or a smaller window, usually of five words.
c)Parts of speech: The feature comprises the POS information for the words
surrounding the word that is being sense-tagged.
d)Bag of words context: This feature comprises an unordered set (bag) of words
occurring in the context window around the target word.
e)Local collocations: Local collocations are an ordered sequence of words or phrases
near the target word that provide semantic context for disambiguation. Usually, a very
small window of about three tokens on each side of the target word, most often as
contiguous pairs or triplets, is added as a list of features.
f)Syntactic relations: if the parse of the sentence containing the target word is available,
then we can use syntactic features.
g)Topic features: The broad topic, or domain, of the article that the word belongs to is
also a good indicator of what sense of the word might be most frequent in that specific
domain.
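To make a few of these features concrete, here is a minimal sketch that extracts the bag-of-words context, the POS of neighbouring words, and local collocations for a target word. The window sizes and feature names are illustrative assumptions, and it relies on NLTK's pos_tag (nltk installed plus the punkt and averaged_perceptron_tagger data).

# Sketch: extract a few standard WSD features for a target word in a sentence.
import nltk

def wsd_features(tokens, target_index, window=3):
    tagged = nltk.pos_tag(tokens)                       # POS for every token
    lo, hi = max(0, target_index - window), target_index + window + 1
    return {
        # Bag-of-words context: unordered words in the window (target excluded).
        "bag_of_words": sorted(set(w.lower() for i, (w, _) in enumerate(tagged)
                                   if lo <= i < hi and i != target_index)),
        # POS of the target word and its immediate neighbours.
        "pos_target": tagged[target_index][1],
        "pos_prev": tagged[target_index - 1][1] if target_index > 0 else None,
        "pos_next": tagged[target_index + 1][1] if target_index + 1 < len(tagged) else None,
        # Local collocations: ordered word sequences adjacent to the target.
        "colloc_left": " ".join(tokens[max(0, target_index - 2):target_index]),
        "colloc_right": " ".join(tokens[target_index + 1:target_index + 3]),
    }

tokens = "The bill is large but need not be paid".split()
print(wsd_features(tokens, tokens.index("bill")))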
Chen and Palmer noted that a word sense remains ambiguous and confusing when the
system is unable to identify features such as the following:
2. Presence of subject/object: the sense is hard to pin down when the subject or object
cannot be identified, even given a large amount of training data.
3.Sentential complement: Sentential complementation is a kind of sentence in which
one of the arguments of a verb is a clause. That clausal argument is called
a complement clause.
The term complement clause is extended by some analysts to include clauses selected
by nouns or adjectives.
Examples:
● I heard the evidence that he did it.
● I am sure that he did it.
● I am not certain what we did.
4. Prepositional Phrase Adjunct:
An adjunct is any adverb, adverbial clause, adverbial phrase or prepositional phrase
that gives more information, primarily about the action in the sentence.
The above-mentioned features guide the selection of the domain of the word; a concept
thesaurus is generated accordingly, with POS tags, and the proper sense of the word is
retrieved.
3. Unsupervised learning:
As discussed above, Unsupervised Learning promises effective learning using unlabeled
data (no labeled data is required for training) and no human supervision. This is an
important advantage compared to Supervised Learning, as unlabeled text in digital form
is in abundance, whereas labeled datasets are usually expensive to construct or acquire,
especially for common NLP tasks like PoS tagging or syntactic parsing. Unsupervised
Learning models work on their own and automatically discover information, structure,
and patterns from the data itself.
Advancing AI and NLP with Unsupervised Learning
The most popular applications of Unsupervised Learning in advanced AI chatbots / AI
Virtual Assistants are clustering (K-means, Mean-Shift, Density-based, Spectral
clustering, etc.) and association rule methods.
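A minimal sketch of the clustering idea mentioned above is given here, grouping unlabelled user utterances with TF-IDF vectors and K-means. scikit-learn is assumed, and the utterances and the choice of two clusters are purely illustrative.

# Unsupervised grouping of unlabelled utterances with K-means (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

utterances = [
    "my internet connection keeps dropping",
    "the wifi is not working at all",
    "I want to change my billing address",
    "please update my payment details",
]

X = TfidfVectorizer().fit_transform(utterances)          # no labels required
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for text, cluster in zip(utterances, km.labels_):
    print(cluster, text)   # utterances about the same issue should share a cluster id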
Even though the benefits and level of automation brought by Unsupervised Learning are
large and technically very intriguing, Unsupervised Learning, in general, is less accurate
and trustworthy compared to Supervised Learning. Indeed, the most advanced AI
Chatbot / AI Virtual Assistant technologies in the market succeed by achieving the right
balance between the two approaches, which, when exploited correctly, can
deliver the accuracy and precision of Supervised Learning (tasks for which labeled data
is available) coupled with the self-automation of Unsupervised Learning (tasks for which
no labeled data is available).
In unsupervised learning for a specific topic domain, the depth of the concept tree is
measured by conceptual density (CD).
The formula is given below.
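One widely cited formulation of conceptual density (following Agirre and Rigau's WordNet-based WSD work) is, roughly:

CD(c, m) = \frac{\sum_{i=0}^{m-1} \mathit{nhyp}^{\,i}}{\mathit{descendants}_c}

where c is the concept at the top of the sub-hierarchy, m is the number of senses of the context words falling under c, nhyp is the mean number of hyponyms per node, and descendants_c is the total number of concepts below c.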
The depth of the tree is examined and the best word sense is retrieved; the ambiguity is
resolved by examining the depth of the word with reference to sources like WordNet
(synonyms, antonyms and meronyms).
Conceptual density is calculated by the above formula. The sub-tree having the highest
density is selected, and the best sense of the word is retrieved.
Software:
Several software programs have been made available by the research community for word
sense disambiguation. A few are listed below.
The table lists the sources and software used to analyse word senses.
Predicate Argument Structure:
A thing that refers to the type of event or state we are dealing with is termed a predicate,
while the things that refer to the participants in the event/state are called the arguments
of the predicate.
Predicates can be divided into two main categories: action and state of being.
Predicates that describe an action can be simple, compound, or complete. A simple
predicate is a verb or verb phrase without any modifiers or objects
Predicate argument structure is based on the function features of lexical items (most
often verbs). The function features determine the thematic roles to be played by the
other words in the sentence. However, function features and thematic roles don't always
coincide.
Examples
● The sun (subject) / was shining brightly (predicate).
● The dogs (subject) / were barking loudly (predicate).
● The pretty girl (subject) / was wearing a blue frock (predicate).
● My younger brother (subject) / serves in the army (predicate).
● The man and his wife (subject) / were working in their garden (predicate).
Generally, this process can be defined as the identification of who did what to whom,
where, why and how. This is shown with the help of a diagram.
These grammar structures are used in semantic role labeling to identify the subject and
predicate:
Phrase structure grammar, also known as constituency grammar, is a way of representing the syntactic
structure of natural language sentences using hierarchical trees (refer to Unit 2).
In natural language processing (NLP), phrase structure grammar can be used to analyze, parse, and generate
natural language texts, and semantic role labeling uses this structure.
The above method is used to predict the best semantic role labels and to determine the
subject and predicate of the sentence.
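A small sketch of how a phrase structure (constituency) tree exposes the subject/predicate split is shown here, using NLTK's Tree class on a hand-written parse; the bracketed parse is written by hand for illustration, not produced by a parser.

# Reading a hand-written constituency parse and splitting subject / predicate.
from nltk import Tree

parse = "(S (NP (DT The) (NNS dogs)) (VP (VBD were) (VP (VBG barking) (ADVP (RB loudly)))))"
tree = Tree.fromstring(parse)

subject = tree[0]    # first child of S: the NP "The dogs"
predicate = tree[1]  # second child of S: the VP "were barking loudly"

print("Subject:  ", " ".join(subject.leaves()))
print("Predicate:", " ".join(predicate.leaves()))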
1. FrameNet
A Frame is a script-like conceptual structure that describes a particular type of situation, object, or event along with
the participants and props that are needed for that Frame. For example, the “Apply_heat” frame describes a
common situation involving a Cook, some Food, and a Heating_Instrument, and is evoked by words such as bake,
blanch, boil, broil, brown, simmer, steam, etc.
We call the roles of a Frame “frame elements” (FEs) and the frame-evoking words are called “lexical units” (LUs).
FrameNet includes relations between Frames. Several types of relations are defined, of which the most important
are:
● Inheritance: An IS-A relation. The child frame is a subtype of the parent frame, and each FE in the parent
is bound to a corresponding FE in the child. An example is the “Revenge” frame which inherits from the
“Rewards_and_punishments” frame.
● Using: The child frame presupposes the parent frame as background, e.g. the “Speed” frame “uses” (or
presupposes) the “Motion” frame; however, not all parent FEs need to be bound to child FEs.
● Subframe: The child frame is a subevent of a complex event represented by the parent, e.g. the
“Criminal_process” frame has subframes of “Arrest”, “Arraignment”, “Trial”, and “Sentencing”.
● Perspective_on: The child frame provides a particular perspective on an un-perspectivized parent frame.
A pair of examples consists of the “Hiring” and “Get_a_job” frames, which perspectivize the
“Employment_start” frame from the Employer’s and the Employee’s point of view, respectively.
● Each LU is linked to a Frame, and hence to the other words which evoke that Frame. This makes the
FrameNet database similar to a thesaurus, grouping together semantically similar words.
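FrameNet data can also be explored programmatically; below is a minimal sketch using NLTK's FrameNet corpus reader, assuming nltk is installed and nltk.download('framenet_v17') has been run. The printed frame elements and lexical units depend on the data version.

# Browsing the Apply_heat frame through NLTK's FrameNet reader.
# Assumes: pip install nltk, then nltk.download('framenet_v17').
from nltk.corpus import framenet as fn

frame = fn.frame("Apply_heat")
print(frame.name)                         # Apply_heat
print(sorted(frame.FE.keys())[:5])        # a few frame elements, e.g. Cook, Food, Heating_instrument
print(sorted(frame.lexUnit.keys())[:5])   # a few lexical units, e.g. 'bake.v', 'boil.v'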
Resources of FrameNet
The Berkeley FrameNet project is creating an on-line lexical resource for English,
based on frame semantics and supported by corpus evidence. The aim is to
document the range of semantic and syntactic combinatory possibilities
(valences) of each word in each of its senses, through computer-assisted
annotation of example sentences and automatic tabulation and display of the
annotation results. The major product of this work, the FrameNet lexical database,
currently contains more than 10,000 lexical units (defined below), more than 6,100
of which are fully annotated, in more than 825 semantic frames, exemplified in
more than 135,000 annotated sentences. It has gone through three releases, and
is now in use by hundreds of researchers, teachers, and students around the world
(see FrameNet Users). Active research projects are now seeking to produce
comparable frame-semantic lexicons for other languages and to devise means of
automatically labeling running text with semantic frame information.
2. PropBank (Proposition Bank)
PropBank was developed with the idea of serving as training data for machine learning-based semantic
role labeling systems in mind.
Representation of Propbank
Each propbank instance defines the following member variables:
● Inflection information: inflection
● Roleset identifier: roleset
● Verb location: predicate
● Argument locations and types: arguments
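These member variables can be inspected directly through NLTK's PropBank corpus reader; a small sketch is given below, assuming nltk is installed and nltk.download('propbank') has been run. Which instance comes first depends on the corpus data.

# Inspecting one PropBank instance with NLTK.
# Assumes: pip install nltk, then nltk.download('propbank').
from nltk.corpus import propbank

inst = propbank.instances()[0]       # first annotated predicate instance
print(inst.roleset)                  # roleset identifier, e.g. a 'verb.01'-style label
print(inst.inflection)               # inflection information
print(inst.predicate)                # location of the verb in the sentence
print(inst.arguments)                # (location, argument label) pairs, e.g. ARG0, ARG1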
It is represented by an argument structure for the sentence, which relates to a specific
frame and connects to the verb.
Example: for the frame "Commerce", PropBank can connect to the following
verbs/arguments in different scenarios:
Arg0: He can buy the goods in e-commerce
Arg1: He bought things
Arg2: He can sell goods
Arg3: He can make a payment of money
Arg4: He can be the benefactive of the e-commerce source
The probable verbs to which the frame can connect are identified, and a concept
thesaurus of that frame is generated, connected to the different argument structures.
Many of the framesets for a particular concept are identified; the associated verb
referring to PropBank is retrieved, and according to the argument structure it maps onto
the proper treebank, from which the information is retrieved.
Rule-based NLP software that supports FrameNet and PropBank is available for download:
1. WASP (Wallenberg AI, Autonomous Systems and Software Program): https://wasp-sweden.org/ai-graduate-school-courses/
2. KRISPER: https://www.cs.utexas.edu/~ml/krisp/
3. CHILL: http://www.cs.utexas.edu/ml/hill