The textual entailment problem is to determine whether a given text entails a given hypothesis. This paper first describes a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the lexical concepts present in the hypothesis are entailed by the text. This problem is recast as one of text categorization in which the …
All components of a typical IE system have been the object of some machine learning research, motivated by the need to reduce the time taken to transfer to new domains. In this paper we survey such methods and assess to what extent they can help create a complete IE system that can be easily adapted to new domains. We also lay …
This paper studies the potential of identifying lexical paraphrases within a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of "comparable" corpora, each of them containing roughly the same information, and rely on the substantial level of correspondence of such corpora. We present a novel method that …
This paper proposes a general probabilistic setting that formalizes a probabilistic notion of textual entailment. We further describe a particular preliminary model for lexical-level entailment, based on document co-occurrence probabilities, which follows the general setting. The model was evaluated on two application-independent datasets, suggesting the relevance of such probabilistic approaches for entailment modeling.
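A minimal sketch of what a document co-occurrence model of this kind might look like in practice is given below, assuming a plain corpus of documents; the counting scheme, the max/product combination over words, and all function names are illustrative assumptions rather than the paper's actual model.

from collections import Counter
from itertools import combinations

# Count, for every word, the number of documents it appears in, and for every
# word pair, the number of documents containing both words.
def build_cooccurrence(documents):
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        doc_freq.update(words)
        pair_freq.update(frozenset(p) for p in combinations(words, 2))
    return doc_freq, pair_freq

# Score the hypothesis against the text: each hypothesis word takes the best
# co-occurrence-based estimate over the text words, and the per-word scores
# are multiplied; thresholding this score gives a yes/no entailment decision.
def entailment_score(text, hypothesis, doc_freq, pair_freq):
    text_words = set(text.lower().split())
    score = 1.0
    for u in set(hypothesis.lower().split()):
        if u in text_words:          # the word appears verbatim in the text
            continue
        best = 0.0
        for v in text_words:
            if doc_freq[v]:
                best = max(best, pair_freq[frozenset((u, v))] / doc_freq[v])
        score *= best
    return score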
This paper proposes a general probabilistic setting that formalizes the notion of textual entailment. In addition we describe a concrete model for lexical entailment based on web co-occurrence statistics in a bag-of-words representation.
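Read as a bag-of-words model, an entailment score of this kind plausibly factorizes over the hypothesis words; the estimator below is an illustrative assumption rather than the paper's exact formulation, with u ranging over hypothesis words, v over text words, and n denoting web document counts:

P(h \mid t) \;\approx\; \prod_{u \in h} \max_{v \in t} P(u \mid v),
\qquad
P(u \mid v) \;\approx\; \frac{n_{u,v}}{n_{v}}

where n_{u,v} is the number of web documents containing both u and v, and n_v is the number containing v.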
This paper investigates an isolated setting of the lexical substitution task of replacing words with their synonyms. In particular, we examine this problem in the setting of subtitle generation and evaluate state-of-the-art scoring methods that predict the validity of a given substitution. The paper evaluates two context-independent models and two contextual models.
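The contrast between the two kinds of models can be sketched as follows, assuming simple corpus statistics (unigram and word-pair counts); both scoring functions are illustrative assumptions, not the specific methods evaluated in the paper.

# Score a candidate substitution using only the words involved, here simply
# preferring substitutes that are frequent in the corpus on their own.
def context_independent_score(substitute, unigram_counts):
    return unigram_counts.get(substitute, 0)

# Score how well the substitute fits the sentence it is placed in, here by
# averaging a co-occurrence ratio between the substitute and each context word.
def contextual_score(substitute, context_words, pair_counts, unigram_counts):
    ratios = []
    for w in context_words:
        if unigram_counts.get(w, 0):
            ratios.append(pair_counts.get((substitute, w), 0) / unigram_counts[w])
    return sum(ratios) / len(ratios) if ratios else 0.0

In a subtitle-generation setting such scores would presumably be thresholded to decide whether a proposed synonym can safely replace the original word in its context.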
A most prominent phenomenon of natural languages is variability: stating the same meaning in various ways. Robust language processing applications, such as Information Retrieval (IR), Question Answering (QA), Information Extraction (IE), text summarization and machine translation, must recognize the different forms in which their inputs and requested outputs might be expressed. Today, inferences about language variability are often performed by practical systems at a …
Semantic lexical matching is a prominent subtask within text understanding applications. Yet, it is rarely evaluated in a direct manner. This paper proposes a definition for lexical reference which captures the common goals of lexical matching. Based on this ...
In this paper we address the problem of aligning very long (often more than one hour) audio files to their corresponding textual transcripts in an effective manner. We present an efficient recursive technique to solve this problem that works well even on noisy speech signals. The key idea of this algorithm is to turn the forced alignment problem …
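Since the abstract is cut off before the key idea is spelled out, the sketch below only illustrates one plausible divide-and-conquer reading of a recursive alignment technique: run a recognizer over a segment, keep recognized words that match the transcript unambiguously as anchors, and recurse on the stretches between anchors. The recognizer interface, the anchoring rule, and the fallback are all assumptions.

def align(recognize, start, end, words, min_words=3):
    # recognize(start, end, words) is a caller-supplied function returning a
    # list of (word, time) hypotheses for the audio between start and end.
    if len(words) <= min_words:
        # Too short to split further: spread the words evenly over the segment.
        step = (end - start) / max(len(words), 1)
        return [(w, start + i * step) for i, w in enumerate(words)]

    hypothesis = recognize(start, end, words)
    # Anchor on recognized words that occur exactly once in this transcript chunk.
    unique = {w for w in words if words.count(w) == 1}
    anchors = sorted((words.index(w), t) for w, t in hypothesis if w in unique)
    if not anchors:
        step = (end - start) / len(words)
        return [(w, start + i * step) for i, w in enumerate(words)]

    aligned, prev_idx, prev_t = [], 0, start
    for idx, t in anchors:
        if idx < prev_idx:
            continue                      # skip out-of-order or duplicate anchors
        aligned += align(recognize, prev_t, t, words[prev_idx:idx], min_words)
        aligned.append((words[idx], t))
        prev_idx, prev_t = idx + 1, t
    aligned += align(recognize, prev_t, end, words[prev_idx:], min_words)
    return aligned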
The goal of this work is to use phonetic recognition to drive a synthetic image with speech. Phonetic units are identified by the phonetic recognition engine and mapped to mouth gestures, known as visemes, the visual counterpart of phonemes. The acoustic waveform and visemes are then sent to a synthetic image player, called FaceMe!, where they are rendered synchronously. This paper provides background for the core technologies …
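The phoneme-to-viseme mapping step described above can be sketched as a simple lookup over the recognizer's time-stamped phoneme output; the viseme inventory and phoneme labels used here are illustrative assumptions, not the FaceMe! system's actual tables.

# Map each recognized phoneme (with its start time) to a mouth-shape class so
# the image player can render it in sync with the audio.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "iy": "spread", "ih": "spread", "eh": "spread",
    "aa": "open", "ao": "open", "ah": "open",
    "uw": "rounded", "ow": "rounded", "w": "rounded",
    "sil": "closed",
}

def phonemes_to_visemes(phoneme_track):
    # phoneme_track is a list of (phoneme, start_time) pairs from the recognizer;
    # consecutive phonemes that share a viseme are collapsed into one gesture.
    visemes = []
    for phoneme, start in phoneme_track:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if not visemes or visemes[-1][0] != viseme:
            visemes.append((viseme, start))
    return visemes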
We report on techniques for using discourse context to reduce ambiguity and improve translation accuracy in a multi-lingual (Spanish, German, and English) spoken language translation system. The techniques involve statistical models as well as knowledge-based models, including discourse plan inference. This work is carried out in the context of the Janus project at Carnegie Mellon University and the University of Karlsruhe.