Semantic Role Labeling
and
Conditional Random Field
Dr. Hiteshwar Kumar Azad
Email ID: hiteshwarkumar.azad@vit.ac.in
SCOPE
Semantic Role Labeling (SRL)
• Semantic Role Labeling (SRL) is a natural language processing task that
involves identifying and categorizing the different roles played by words or
phrases in a sentence with respect to a predicate (usually a verb).
• These roles typically include agents, patients, instruments, and other
semantic roles that help clarify the relationship between the verb and its
arguments.
Example: "John ate an apple."
Predicate: "ate" is the central action.
Roles:
• Agent (A0): "John" performs the action of eating.
• Patient (A1): "apple" is the entity affected by the action.
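As a rough, hedged illustration, the sketch below approximates these two roles from a dependency parse, mapping the syntactic subject (nsubj) to A0 and the direct object (dobj) to A1. It assumes spaCy and its en_core_web_sm model are installed; this heuristic is not a real SRL system, which would be trained on role-annotated data such as PropBank.

```python
# A toy role extractor: approximate A0/A1 from a dependency parse.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John ate an apple.")

for token in doc:
    if token.pos_ == "VERB":                     # treat each verb as a predicate
        print("Predicate:", token.text)
        for child in token.children:
            if child.dep_ == "nsubj":            # syntactic subject ~ Agent (A0)
                print("  A0 (Agent):", child.text)
            elif child.dep_ in ("dobj", "obj"):  # direct object ~ Patient (A1)
                print("  A1 (Patient):", child.text)
```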
SRL (complex examples)
• In SRL, the goal is to automatically extract and label these semantic
roles to understand the underlying meaning of the sentence.
Complex Example: "The cat chased the mouse around the garden with
great speed."
• Predicate: "chased" - the central action.
• Roles:
• Agent (A0): "cat" is the one performing the action.
• Patient (A1): "mouse" is the entity being chased.
• Location (ArgM-LOC): "garden" specifies the area where the action occurs.
• Manner (ArgM-MNR): "with great speed" describes how the action is performed.
SRL use cases in NLP
Semantic Role Labeling (SRL) finds numerous applications across various
natural language processing (NLP) tasks due to its ability to uncover the
underlying semantic relationships between words in a sentence. Some key use
cases include:
• Information Extraction:
• SRL aids in extracting structured information from unstructured text.
• By identifying roles like agents, patients, instruments, and locations, it helps in
populating structured databases or knowledge graphs from text.
• Question Answering:
• SRL assists in understanding the relationships between different parts of a question
and facilitates the extraction of relevant information to generate accurate answers.
• Sentiment Analysis:
• By identifying the roles of entities and actions in a sentence, SRL contributes to
sentiment analysis by capturing the nuanced ways in which actions or entities influence
sentiments.
SRL Use cases in NLP
• Machine Translation:
• SRL improves the accuracy of machine translation systems by enabling better
understanding of sentence structures in the source language and aiding in the
generation of more contextually appropriate translations.
• Information Retrieval:
• In information retrieval systems, SRL helps in understanding user queries and
matching them to relevant documents by identifying key semantic roles and
extracting the most relevant information.
• Text Summarization:
• SRL assists in summarizing text by identifying crucial semantic roles and
relationships, aiding in the extraction of the most important information for
summarization purposes.
SRL Use cases in NLP
• Chatbots and Virtual Assistants:
• SRL helps chatbots and virtual assistants better understand user input and
formulate more relevant and contextually appropriate responses by
deciphering the roles and relationships in the conversation.
• Text Understanding and Analysis:
• For various text analytics tasks, including sentiment analysis, opinion mining,
and text categorization, SRL provides a deeper understanding of text by
uncovering the semantic relationships within sentences.
Challenges of SRL
1.Label Dependencies: SRL involves assigning roles to words in a
sentence. Words' roles are interdependent, making it challenging to
predict a single word's role without considering the context and
relationships with other words.
2.Ambiguity and Context Sensitivity: Words often take different roles
depending on context. For instance, in the sentence "She opened the door
with a key," the phrase "with a key" can mark the instrument of opening,
or it can attach to "the door" (describing a door that has a key).
Examples of Label Dependencies in SRL
Verb-Argument Dependencies:
"The cat chased the mouse."
• Dependency 1: Agent (A0) and Verb (V):
• The label A0 (Agent) depends on the presence of a specific type of verb.
• Example: In the sentence "The cat chased the mouse," "cat" is the agent
of the action "chased."
• Dependency 2: Patient (A1) and Verb (V):
• The label A1 (Patient) also depends on the verb.
• Example: In the same sentence, "mouse" is the patient of the action
"chased."
Examples of Label Dependencies in SRL
Prepositional Phrase Dependencies:
"He opened the door with a key."
• Dependency 3: Instrument (A2) and Verb (V):
• The label A2 (Instrument) is dependent on a specific prepositional phrase.
• Example: "key" is the instrument used in opening the door.
• Dependency 4: Location (A2) and Verb (V):
• Sometimes, prepositional phrases denote locations.
• Example: In the sentence "She placed the book on the table," "table" is the location of
the action.
Examples of Label Dependencies in SRL
Coreference Dependencies:
"John gave Mary a book. She thanked him.“
• Dependency 5: Coreference between Sentences:
• The pronouns "She" and "him" in the second sentence refer back to entities in the first
sentence.
• Coreference resolution is crucial for assigning the correct roles (A0, A1) to the
appropriate entities.
Examples of Label Dependencies in SRL
Hierarchical Dependencies:
"The man who robbed the bank was arrested."
• Dependency 6: Nested Structures:
• Dependencies exist between different levels of syntactic hierarchy.
• Roles assigned to entities within nested clauses depend on the main verb.
• Example: "The man" is the agent of "was arrested" but is embedded within a relative
clause.
CRF (Conditional Random Field) algorithm
The Conditional Random Fields (CRF) algorithm is a type of probabilistic graphical
model used in machine learning, particularly in sequence labeling and structured
prediction tasks. It is the discriminative counterpart of the hidden Markov model
(HMM) and is employed when there are dependencies between output labels.
Key Components of CRF Algorithm:
1.Conditional Probability Modeling: CRFs model conditional probability
distributions. They estimate the probability of a sequence of labels given the
observed input sequence.
2. Feature Representation: CRFs use feature functions that capture information
from input observations and their context. These features help in making
predictions by associating weights with them.
3. Label Dependencies: CRFs consider dependencies between labels in a
sequence. Unlike HMMs, which make strong independence assumptions about the
observations, CRFs condition on the entire input sequence, allowing better
capture of context.
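Concretely, a linear-chain CRF defines the conditional probability of a label sequence y given an input sequence x as

\[
p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right),
\]

where the f_k are feature functions over adjacent labels and the whole input, the λ_k are learned weights, and Z(x) normalizes over all possible label sequences.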
CRF for NER tagging examples
• Example: Consider the task of named entity recognition (NER) where
the goal is to identify entities (e.g., person names, locations) in a
sentence.
• Input: "Apple is located in California."
• Output: (Apple, ORGANIZATION), (California, LOCATION)
• CRFs for NER may use features such as:
• Word embeddings representing semantic information.
• Capitalization features to identify proper nouns.
• Word context features indicating nearby words and their labels.
• The CRF model learns weights for these features during training and predicts
the most probable sequence of labels for new sentences.
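As a hedged sketch, the function below builds the kind of per-token feature dictionaries a CRF toolkit such as sklearn-crfsuite consumes; the feature names are illustrative, and dense word embeddings are omitted for simplicity.

```python
# Illustrative per-token features for CRF-based NER.
def word2features(sent, i):
    word = sent[i]
    features = {
        "word.lower": word.lower(),       # lexical identity
        "word.istitle": word.istitle(),   # capitalization cue for proper nouns
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],             # crude morphology
    }
    if i > 0:
        features["-1:word.lower"] = sent[i - 1].lower()  # previous-word context
    else:
        features["BOS"] = True            # beginning of sentence
    if i < len(sent) - 1:
        features["+1:word.lower"] = sent[i + 1].lower()  # next-word context
    else:
        features["EOS"] = True            # end of sentence
    return features

sent = ["Apple", "is", "located", "in", "California", "."]
X = [word2features(sent, i) for i in range(len(sent))]
```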
CRF Algorithm overview
1.Input:
• A set of input sequences (e.g., sentences).
• Corresponding output sequences (e.g., part-of-speech tags, named entity labels,
etc.).
2.Feature Extraction:
• Convert input observations (words, phrases, etc.) into feature representations.
• These features capture various aspects of the input data and its context.
3.Parameter Learning:
• CRFs learn parameters (weights) for the features using training data.
• The learning process involves optimizing the model to maximize the likelihood of the
observed output labels given the input data.
4.Inference:
• Given a new input sequence, CRFs perform inference to predict the most likely
sequence of output labels.
• This inference process involves finding the label sequence that maximizes the
conditional probability given the input sequence and learned parameters.
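A minimal end-to-end sketch of these four steps with the sklearn-crfsuite library (pip install sklearn-crfsuite); the one-sentence training set is a toy stand-in for a real annotated corpus.

```python
import sklearn_crfsuite

def feats(sent, i):
    # Minimal per-token features; see the richer word2features sketch earlier.
    return {"word.lower": sent[i].lower(), "word.istitle": sent[i].istitle()}

# 1. Input: sentences paired with label sequences.
train_sents  = [["Apple", "is", "located", "in", "California", "."]]
train_labels = [["B-ORG", "O", "O", "O", "B-LOC", "O"]]

# 2. Feature extraction: one feature dict per token.
X_train = [[feats(s, i) for i in range(len(s))] for s in train_sents]

# 3. Parameter learning: L-BFGS with L1/L2 regularization on feature weights.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, train_labels)

# 4. Inference: Viterbi decoding of the most likely label sequence.
test = ["Google", "is", "located", "in", "Texas", "."]
print(crf.predict([[feats(test, i) for i in range(len(test))]]))
```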
Benefits of CRF algorithms
• Handles label dependencies effectively.
• Accommodates various types of features for richer representations.
• Allows incorporation of global context for better predictions.
SRL Challenges That CRF Addresses
1.Label Dependencies: SRL involves assigning roles to words in a
sentence. Words' roles are interdependent, making it challenging to
predict a single word's role without considering the context and
relationships with other words.
2.Ambiguity and Context Sensitivity: Words often take different roles
depending on context. For instance, in the sentence "She opened the door
with a key," the phrase "with a key" can mark the instrument of opening,
or it can attach to "the door" (describing a door that has a key).
Solution of SRL using CRF
CRFs are effective in addressing these challenges by modeling label dependencies and
contextual information.
1. Sequential Modeling:
• CRFs consider the sequential nature of sentences and the interdependence of labels. For
example, in "John eats an apple," the verb "eats" expects an agent (A0) and a patient (A1);
the CRF makes a structured prediction over this whole sequence.
Solution of SRL using CRF
2. Feature Representation: CRFs utilize rich feature representations that
encode contextual information. These features could include:
• Word embeddings capturing semantic similarity and context.
• Part-of-Speech (POS) tags indicating the syntactic structure.
• Dependency relations between words.
• Predicate-argument structures.
3. Incorporating Global Context: CRFs account for the global context of
the entire sentence while making local predictions. This global view helps
in resolving ambiguities. For instance: "He saw the man with a telescope."
• The CRF considers the entire sentence to disambiguate whether "with a
telescope" modifies "saw" or "man."
Examples of CRF in SRL
Consider the sentence "The dog chases the cat."
CRF features may include:
• Word embeddings: Representing semantic similarity.
• POS tags: Differentiating between nouns (dog, cat) and verbs (chases).
• Transition features: Capturing the likelihood of a verb being followed by a
subject (A0) and an object (A1).
• CRF predictions might assign labels: (The dog, A0), (chases, V), (the cat, A1).
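In practice, SRL is cast as BIO sequence labeling so that a CRF can predict it; the sketch below shows this target encoding plus an illustrative predicate-distance feature (the tag inventory and feature names are assumptions for illustration).

```python
# SRL cast as BIO sequence labeling: one role-tagged sequence per predicate.
tokens    = ["The", "dog", "chases", "the", "cat", "."]
bio_roles = ["B-A0", "I-A0", "B-V", "B-A1", "I-A1", "O"]

# An SRL-specific feature: signed distance to the predicate, which the
# CRF's transition features can combine with the expected A0-V-A1 ordering.
pred_idx = bio_roles.index("B-V")
features = [{"word": t.lower(), "dist_to_pred": i - pred_idx}
            for i, t in enumerate(tokens)]

for tok, tag, feat in zip(tokens, bio_roles, features):
    print(f"{tok:8s} {tag:6s} dist={feat['dist_to_pred']:+d}")
```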
Dependency grammar in NLP
• Dependency grammar is a fundamental concept in NLP that allows us to
understand how words connect within sentences.
• It provides a framework for representing sentence structure based on word-
to-word relationships.
Example: Using the dependency grammar framework to represent the following
sentence, "Kevin can hit the baseball with a bat":
Dependency grammar
Breakdown of dependency grammar:
• Word tokens: A sentence is made up of a group of word tokens. Each of these
tokens has a unique function and is a building block of the language. For the
sentence: “Kevin can hit the baseball with a bat,” the tokens are: “Kevin,” “can,”
“hit,” “the,” “baseball,” “with,” “a,” “bat.”
• Dependency relations: Dependency grammar focuses on how words relate to
each other by using arrows or lines. For example, in the dependency grammar
example above, the word “with” depends on the word “hit” as a preposition.
• The governor and the dependent: In every word relationship, there are two key
roles: the governor and the dependent. For instance, in the sentence “Kevin can
hit the baseball with a bat,” the word “hit” acts as the governor because it's the
main action, while “Kevin” serves as the dependent since the action relies on the
subject.
• Dependency labels: Each dependency relation line is labeled to illustrate the
relationship between the words on each end. Labels like subject (subj) and object
(obj) provide the grammatical role for every word in the sentence structure.
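A minimal sketch of inspecting governors, dependents, and dependency labels with spaCy (assumes the en_core_web_sm model is installed; spaCy's label inventory differs slightly from Universal Dependencies).

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Kevin can hit the baseball with a bat")

for token in doc:
    # token.head is the governor; token.dep_ is the dependency label.
    print(f"{token.text:10s} <--{token.dep_:8s}-- {token.head.text}")
```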
Dependency grammar
Applications:
• Dependency parsing: Using dependency grammar principles, this process automatically
analyzes sentences and produces a tree that illustrates the grammatical relationships
between words. This is essential for understanding the structure of sentences.
• Information extraction: Structured information can be extracted from text using
dependency relations, which allows for identifying relationships between entities and
facts in a document.
• Machine translation: When translating between languages, dependency structures help
align words and phrases. They help to ensure accurate and clear translations.
• Text-to-speech synthesis: Dependency information influences the rhythm and tone of
synthesized speech, which enhances its natural sound.
Dependency Parsing
• Dependency parsing is the process of analyzing the grammatical structure of a
sentence to find related words and the type of relationship that holds
between them.
Each relationship:
• Has one head and a dependent that modifies the head.
• Is labeled according to the nature of the dependency between the head and
the dependent. These labels can be found at Universal Dependency Relations.
In Fig. 1, there is a relationship between car and black
because black modifies the meaning of car. Here, car acts as the
head and black is a dependent of the head. The relationship is
labeled amod, which stands for "adjectival modifier": an
adjective or adjective phrase that modifies a noun.
Fig.1: Dependency relation between two words
Dependency Parsing
Fig.: Dependency parse and the corresponding dependency graph
Transition-based parsing
• Transition-based dependency parsing is a popular approach to syntactic analysis that
constructs a dependency parse tree incrementally, one transition at a time.
• It uses a stack and a buffer to store the words of the sentence, and a set of transition rules to
modify the stack and buffer.
• It is a fast and effective approach for dependency parsing.
Basic Components
1.Stack: Stores the words that have been
processed and are currently being considered for
parsing.
2.Buffer: Stores the remaining words of the
sentence that have not yet been processed.
3.Transition Rules: A set of predefined rules that
determine how the stack and buffer are modified.
Fig. Transition-based parser
Transition-based parsing
Common Transition Rules
•SHIFT: Moves a word from the buffer to the stack.
•LEFT-ARC: Creates a dependency from the top word on the stack to the second-to-top word.
•RIGHT-ARC: Creates a dependency from the second-to-top word on the stack to the top word.
Parsing Process
1.Initialization: The stack is empty, and the buffer contains all the words of the sentence.
2.Transition Application: A transition rule is selected based on the current state of the stack and
buffer. The rule is applied to modify the stack and buffer.
3.Termination: The parsing process terminates when the buffer is empty and the stack contains a
single word, which is the root of the dependency tree.
Transition-based parsing
• The possible transitions correspond to the intuitive actions one might take in creating a dependency
tree by examining the words in a single pass over the input from left to right.
✓ Assign the current word as the head of some previously seen word,
✓ Assign some previously seen word as the head of the current word,
✓ Postpone dealing with the current word, storing it for later processing.
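The toy parser below applies a hand-written (oracle) transition sequence to "Book me the flight" and collects the resulting arcs. It seeds the stack with a ROOT symbol, a common variant of the initialization above; a real parser would predict each transition with a classifier rather than follow a fixed script.

```python
# A toy arc-standard transition parser.
def parse(words, transitions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in transitions:
        if action == "SHIFT":              # move the next word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":         # top of stack heads the word below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))  # (head, dependent)
        elif action == "RIGHT-ARC":        # word below the top heads the top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words  = ["Book", "me", "the", "flight"]
oracle = ["SHIFT", "SHIFT", "RIGHT-ARC",   # Book -> me
          "SHIFT", "SHIFT", "LEFT-ARC",    # flight -> the
          "RIGHT-ARC",                     # Book -> flight
          "RIGHT-ARC"]                     # ROOT -> Book
print(parse(words, oracle))
# [('Book', 'me'), ('flight', 'the'), ('Book', 'flight'), ('ROOT', 'Book')]
```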
Transition-based parsing: Example
Fig. Trace of a transition-based parse.
Thanks