
    Keith Hall

    We present a novel approach to query reformulation which combines syntactic and semantic information by means of generalized Levenshtein distance algorithms where the substitution operation costs are based on probabilistic term rewrite functions. We investigate unsupervised, compact and efficient models, and provide empirical evidence of their effectiveness. We further explore a generative model of query reformulation and supervised combination methods providing improved performance at variable computational costs.
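    The generalized Levenshtein distance described above can be illustrated with a minimal sketch. It assumes term-level edits, unit insertion and deletion costs, and a substitution cost of one minus a rewrite probability; the `REWRITE_PROB` table and its values are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def generalized_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Term-level Levenshtein distance with a pluggable substitution cost."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] is the cheapest way to turn source[:i] into target[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            s, t = source[i - 1], target[j - 1]
            # Identical terms are free; otherwise ask the cost function.
            substitute = dist[i - 1][j - 1] + (0.0 if s == t else sub_cost(s, t))
            dist[i][j] = min(
                substitute,
                dist[i - 1][j] + del_cost,
                dist[i][j - 1] + ins_cost,
            )
    return dist[-1][-1]


# Hypothetical rewrite probabilities P(target term | source term); made-up values.
REWRITE_PROB = {("cheap", "inexpensive"): 0.5, ("flights", "airfare"): 0.25}


def rewrite_cost(s: str, t: str) -> float:
    """Likely rewrites are cheap; unseen pairs cost a full substitution."""
    return 1.0 - REWRITE_PROB.get((s, t), 0.0)
```

    With these assumed costs, `generalized_edit_distance(["cheap", "flights"], ["inexpensive", "flights"], rewrite_cost)` charges only the discounted substitution of "cheap" by "inexpensive", so semantically close reformulations end up nearer than a plain edit distance would place them.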
    The accuracy and speed of state-of-the-art dependency parsers have motivated a resumed interest in utilizing the output of parsing as an input to many downstream natural language processing tasks. This includes work on question answering (Wang et al., 2007), sentiment analysis ( ...
    We present a large-scale, data-driven approach to computing distributional similarity scores for queries. We contrast this to recent web-based techniques which either require the off-line computation of complete phrase vectors, or an expensive on-line interaction with a search engine interface. Independent of the computational advantages of our approach, we show empirically that our technique is more effective at
    ABSTRACT We consider the task of assigning categories (e.g., howto/cooking, sports/basketball, pet/dogs) to YouTube videos from video and text signals. We show that two views on the data, from the video and text perspectives, complement each other and refine predictions. The contributions of the paper are threefold: (1) we show that a text-based classifier trained on imperfect predictions of the weakly supervised video content-based classifier is not redundant; (2) we demonstrate that a simple model which combines the predictions made by the two classifiers outperforms each of them taken independently; (3) we analyse such sources of text information as video title, description, user tags and viewers' comments and show that each of them provides valuable clues to the topic of the video.
    While speaking spontaneously, speakers often make errors such as self-corrections or false starts which interfere with the successful application of natural language processing techniques like summarization and machine translation to this data. There is active work on reconstructing this errorful data into a clean and fluent transcript by identifying and removing these simple errors. Previous research has approximated the
    This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in constant-sum games, the set of correlated equilibria contains the set of minimax equilibria. This paper describes experiments with four variants of CE-Q, demonstrating empirical convergence to equilibrium policies on a testbed of general-sum Markov games.
    This paper introduces a Maximum Entropy dependency parser based on an efficient k-best Maximum Spanning Tree (MST) algorithm. Although recent work suggests that the edge-factored constraints of the MST algorithm significantly inhibit parsing accuracy, we show that generating the 50-best parses according to an edge-factored model has an oracle performance well above the 1-best performance of the best dependency parsers. This motivates our parsing approach, which is based on reranking the k-best parses generated by an edge-factored model. Oracle parse accuracy results are presented for the edge-factored model and 1-best results for the reranker on eight languages (seven from CoNLL-X and English).
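    The gap between 1-best and oracle performance mentioned above can be made concrete with a small sketch. It assumes each parse is a list of head indices (one per token, 0 for the root) and scores parses by unlabeled attachment; the function names are hypothetical and the paper's own evaluation setup may differ.

```python
from typing import List, Sequence, Tuple

Parse = Sequence[int]  # head index for each token; 0 denotes the root


def attachment_matches(candidate: Parse, gold: Parse) -> int:
    """Count tokens whose predicted head equals the gold head."""
    return sum(1 for c, g in zip(candidate, gold) if c == g)


def one_best_and_oracle_accuracy(
    kbest_lists: List[List[Parse]], golds: List[Parse]
) -> Tuple[float, float]:
    """Return (1-best accuracy, oracle accuracy) over a set of sentences.

    Each k-best list is assumed to be ordered by model score, best first.
    The oracle picks, per sentence, the candidate closest to the gold tree.
    """
    total = one_best = oracle = 0
    for kbest, gold in zip(kbest_lists, golds):
        total += len(gold)
        one_best += attachment_matches(kbest[0], gold)
        oracle += max(attachment_matches(c, gold) for c in kbest)
    return one_best / total, oracle / total
```

    The oracle score is an upper bound on what any reranker over the same k-best lists can achieve, which is why a large gap between the two numbers motivates reranking.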
    Thesis advisor: Mark Johnson. Thesis (Ph.D.)--Brown University, 2005. Includes bibliographical references (leaves 121-126).
    ABSTRACT We introduce a large-scale semantic-network annotation effort based on the MultiNet formalism. Annotation is achieved via a process which incorporates several independent tools including a MultiNet graph editing tool, a semantic concept lexicon, a user-editable knowledge-base for semantic concepts, and a MultiNet parser. We present an evaluation metric for these semantic networks, allowing us to determine the quality of annotations in terms of inter-annotator agreement. We use this metric to report the agreement rates for a pilot annotation effort involving three annotators.
    ABSTRACT In this paper we investigate temporal patterns of web search queries. We carry out several evaluations to analyze the properties of temporal profiles of queries, revealing promising semantic and pragmatic relationships between words. We focus on two applications: query suggestion and query categorization. The former shows a potential for time-series similarity measures to identify specific semantic relatedness between words, which results in state-of-the-art performance in query suggestion while providing complementary information to more traditional distributional similarity measures. The query categorization evaluation suggests that the temporal profile alone is not a strong indicator of broad topical categories.
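    As a rough illustration of comparing temporal profiles, the sketch below scores two query-frequency series with Pearson correlation. The abstract does not name the time-series similarity measure it uses, so the choice of Pearson correlation is an assumption, as is the representation of a profile as a list of per-period counts.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long temporal profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile carries no temporal signal; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

    Queries whose frequencies rise and fall together score close to 1, while queries with opposite seasonal patterns score close to -1.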
    Per-instance attributes are acquired using a weakly supervised extraction method which exploits anonymized Web-search query sessions, as an alternative to isolated, individual queries. Examples of these attributes are top speed for chevrolet corvette, or population density for brazil. Inherent challenges associated with using sessions for attribute extraction, such as a large majority of within-session queries not being related to attributes,
    Introduction This paper introduces Correlated-Q learning, a multi-agent learning algorithm that learns equilibrium policies in Markov games, just as Q-learning learns optimal policies in Markov decision processes. Correlated-Q learning is named for correlated equilibria ( ...
    Using Search-Logs to Improve Query Tagging. Kuzman Ganchev, Keith Hall, Ryan McDonald, Slav Petrov. Google, Inc. {kuzman|kbhall|ryanmcd|slav}@google.com. Abstract ... $\arg\max_{t} \; 0.2\,\pi(t \mid w) + 0.8\,\operatorname{mean}_{s : w \in s} \psi(t \mid w, s)$ (1) ...
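    The fragment of Equation (1) quoted above interpolates two scores for a tag t of a word w with fixed weights 0.2 and 0.8. A minimal sketch of that arithmetic follows; the surrounding text is elided in the source, so reading π as a context-independent score and ψ as a score per context s containing w is an assumption, and the function and argument names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def combined_tag(
    tags: Sequence[str],
    prior: Dict[str, float],
    contextual: List[Dict[str, float]],
) -> str:
    """Pick arg max over t of 0.2 * pi(t|w) + 0.8 * mean over s of psi(t|w, s).

    prior maps each tag to pi(t|w); contextual holds one dict per context s
    that contains w, mapping each tag to psi(t|w, s). Missing tags score 0.
    """
    if not contextual:
        raise ValueError("at least one context containing the word is required")

    def score(tag: str) -> float:
        mean_psi = fmean([ctx.get(tag, 0.0) for ctx in contextual])
        return 0.2 * prior.get(tag, 0.0) + 0.8 * mean_psi

    return max(tags, key=score)
```

    Because the averaged contextual term carries weight 0.8, it can override a confident prior: a tag with a prior of 0.9 still loses if the contexts consistently favor another tag.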
    @Book{NAACLHLT09:2009, editor = {Mari Ostendorf and Michael Collins and Shri Narayanan and Douglas W. Oard and Lucy Vanderwende}, title = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, month = {June}, year = {2009}, address = {Boulder, Colorado}, publisher = {Association for Computational Linguistics}, url = {http://www.aclweb.org/anthology/ N/N09/N09-1} } @InProceedings{su-markert:2009:NAACLHLT09, author = {Su ...
    This paper presents a corrective model for speech recognition of inflected languages. The model, based on a discriminative framework, incorporates word n-gram features as well as factored morphological features, providing error reduction over the model based solely on word n-gram features. Experiments on a large vocabulary task, namely the Czech portion of the MALACH corpus,
    The goal of this work is to integrate query similarity metrics as features into a dense model that can be trained on large amounts of query log data, in order to rank query rewrites. We propose features that incorporate various notions of syntactic and semantic similarity in a generalized edit distance framework. We use the implicit feedback of
    This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provides the best results in its class on the RG and WordSim353 datasets, and a supervised combination of them yields the best
    Abstract We present our system used in the CoNLL 2007 shared task on multilingual parsing. The system is composed of three components: a k-best maximum spanning tree (MST) parser, a tree labeler, and a reranker that orders the k-best labeled trees. We present two techniques for training the MST parser: tree-normalized and graph-normalized conditional training. The tree-based reranking model allows us to explicitly model global syntactic phenomena. We describe the reranker features which include non-projective ...
