Michael Wick
PhD Candidate, Computer Science.
My research addresses the notoriously difficult learning and inference problems that arise from high-level specifications of factor graphs, especially procedurally-encoded conditional random fields. My approach is largely driven by real-world applications (coreference, information extraction, ontology mapping) that require scalability both to large amounts of data and to statistical models with many dependencies among the hidden variables (high-arity factors in graphs with large cliques).
A publicly available language/toolkit for specifying factor graphs (tentatively called FactorIE), taking full advantage of our research in learning and inference, will be released soon. In the meantime, please see http://ciir-publications.cs.umass.edu/pdf/IR-697.pdf for more information.
Supervisor: Andrew McCallum
Address: Information Extraction and Synthesis Lab
Center for Intelligent Information Retrieval
Department of Computer Science
University of Massachusetts
140 Governor's Drive
Amherst, MA 01003
Papers by Michael Wick
ence have been shown to provide state-of-the-art experimental results on tasks such as identity uncertainty and information integration. However, learning parameters in these models is difficult because computing the gradients requires expensive inference routines. In this paper we propose an online algorithm that instead learns preferences over hypotheses from the gradients between the atomic steps of inference. Although there are a combinatorial number of ranking constraints over the entire hypothesis space, a connection to the framework of sampled convex programs reveals a polynomial bound on the number of rankings that need to be satisfied in practice. We further apply ideas from passive-aggressive algorithms to our update rules, enabling us to extend recent work in confidence-weighted classification to structured prediction problems. We compare our algorithm to the structured perceptron, contrastive divergence, and persistent contrastive divergence, demonstrating substantial error reductions on two real-world problems (20% over contrastive divergence).
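The update rule described above can be sketched in a few lines. This is a rough illustration of a passive-aggressive (PA-I style) ranking update between two inference hypotheses, not the paper's actual implementation; the function name and signature are hypothetical:

```python
import numpy as np

def pa_rank_update(w, feats_better, feats_worse, margin=1.0, C=1.0):
    """One hypothetical passive-aggressive ranking update between two
    inference hypotheses: nudge w so that the configuration preferred
    by the ground truth outranks the other by at least `margin`."""
    diff = feats_better - feats_worse
    loss = max(0.0, margin - w @ diff)   # hinge loss on the ranking constraint
    if loss == 0.0:
        return w                          # constraint already satisfied
    norm_sq = diff @ diff
    if norm_sq == 0.0:
        return w                          # identical features: nothing to learn
    tau = min(C, loss / norm_sq)          # PA-I step size, capped by aggressiveness C
    return w + tau * diff
```

Only one ranking constraint is enforced per atomic inference step, which is what keeps the number of constraints polynomial in practice.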
first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima: the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, the parameters of the factor graph are treated as a log-linear function approximator and learned with temporal difference (TD) learning; MAP inference is performed by executing the resulting policy on held-out test data. Our method allows efficient gradient updates, since only factors in the neighborhood of variables affected by an action need to be computed; we bypass the need to compute marginals entirely. Our method provides dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over the previous state of the art in that domain.
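The TD learning step described above can be sketched as follows. This is a minimal illustration of a TD(0) update for a linear value-function approximator, assuming hypothetical feature vectors built from local factor scores; it is not the paper's implementation:

```python
import numpy as np

def td0_update(theta, phi_s, phi_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) update for a linear value function V(s) = theta . phi(s),
    where phi(s) collects local factor features of a configuration.
    Since an action only changes factors in its neighborhood, the
    feature vectors (and hence the update) are cheap to compute."""
    td_error = reward + gamma * (theta @ phi_next) - theta @ phi_s
    return theta + alpha * td_error * phi_s
```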
large factor graphs using reinforcement learning. We formulate MAP inference as an optimization problem in the output configuration space and use reinforcement learning (RL) to learn an optimal policy that identifies a sequence of transitions in the configuration manifold, transforming an arbitrary point in the feasible region into the MAP configuration. In our RL treatment of this problem, the delayed reward is a measure of the residual performance improvement between configurations as the system transitions through the configuration space. We propose two approaches. The first uses the ground truth signal during training to learn a linear function approximator that generalizes to a novel testing set; in this scenario, MAP inference is simply a matter of performing policy search on the learned value function. In the second approach, we use the ground truth signal to learn a reward function directly from the training set; on held-out test data, the approximate reward function is used to guide a traditional reinforcement learning algorithm to the MAP configuration. In either case, the linear additive function approximators provide the following advantages: (1) they allow generalization from a training set to a novel testing set; (2) they provide a representation compatible with log-linear models such as conditional random fields (CRFs). We present preliminary results on real-world datasets.
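The first approach above (policy search on a learned value function) amounts to following value-improving transitions through the configuration space. A minimal greedy sketch, assuming hypothetical `neighbors` and `value` callables rather than any interface from the paper:

```python
def greedy_map_search(start, neighbors, value, max_steps=100):
    """Hypothetical policy execution for MAP inference: from an
    arbitrary start configuration, greedily follow transitions that
    increase the learned value estimate until reaching a local optimum.
    `neighbors(cfg)` yields configurations one transition away;
    `value(cfg)` is the learned scoring function."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            break                         # no improving transition remains
        current = best
    return current
```

In the papers above, the learned policy can also select temporarily lower-scoring ("downward") jumps; this greedy variant is only the simplest instance of executing a learned policy.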
heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation. Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a discriminatively-trained model that reasons about schema matching, coreference, and canonicalization jointly. We evaluate our model on a real-world data set of people and show that the joint model improves substantially over systems that solve each task either in isolation or in a conventional cascade, with nearly a 50% error reduction for coreference and a 40% error reduction for schema matching.