Kevin Duh

University of Washington, Electrical Engineering, Alumnus

Papers by Kevin Duh

Extracting Pre-ordering Rules from Predicate-Argument Structures

NTT statistical machine translation system for IWSLT 2010

Factored language models tutorial

Abstract: The Factored Language Model (FLM) is a flexible framework for incorporating various information sources, such as morphology and part-of-speech, into language modeling. FLMs have so far been successfully applied to tasks such as speech recognition and machine translation; they have the potential to be used in a wide variety of problems in estimating probability tables from sparse data. This tutorial serves as a comprehensive description of FLMs and related algorithms.

Generalized Minimum Bayes Risk System Combination

An Improvement to the Predicate-Argument Structure Based Pre-ordering Approach for Statistical Machine Translation

Modifying a System in Response to Indications of User Frustration

An illustrative frustration processing system modifies the operation of a target system to improve its performance. In one case, the frustration processing system receives express indications that a user is frustrated in the course of interacting with the target system. The frustration processing system responds to these indications by modifying the operation of the target system to reduce the likelihood that the user will be frustrated in the future.

Syntactic Based Reordering Rules for Chinese-to-Japanese Machine Translation

Joint Labeling of Multiple Sequences

Conversational Topic Classification

Beyond the Search Box: Helping Users Find Health Information on the Web

Assisting cross-lingual editing in collaborative writing

Abstract: Wikis have enabled Web users to author and edit documents in a collaborative manner. In many cases, such as Wikipedia and Wikibooks, they have been used to host a set of parallel or comparable documents written in different languages. While a wiki provides an environment in which editors can work together efficiently, maintaining a set of multilingual documents is still a very demanding task for the editors.

HPSG-Based Preprocessing for English-to-Japanese Translation

Abstract: Japanese sentences have completely different word orders from corresponding English sentences. Typical phrase-based statistical machine translation (SMT) systems such as Moses search for the best word permutation within a given distance limit (distortion limit). For English-to-Japanese translation, we need a large distance limit to obtain acceptable translations, and the number of translation candidates is extremely large. Therefore, SMT systems often fail to find acceptable translations within a limited time.
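The growth in candidates described above can be illustrated with a small count. The sketch below is a simplification and not how Moses defines its distortion limit: it assumes each word may move at most `limit` positions away from its source position, and counts the word orders that satisfy this. The function name `count_reorderings` is hypothetical.

```python
from functools import lru_cache


def count_reorderings(n, limit):
    """Count permutations of n words in which no word moves more than
    `limit` positions from its source position (a simplified stand-in
    for a distortion limit, assumed here for illustration only)."""

    @lru_cache(maxsize=None)
    def fill(position, used):
        # `used` is a bitmask of source words already placed.
        if position == n:
            return 1
        total = 0
        lowest = max(0, position - limit)
        highest = min(n - 1, position + limit)
        for source in range(lowest, highest + 1):
            if not used & (1 << source):
                total += fill(position + 1, used | (1 << source))
        return total

    return fill(0, 0)


if __name__ == "__main__":
    # Print how the number of allowed orders grows with the limit
    # for a ten-word sentence.
    for limit in range(0, 10):
        print(limit, count_reorderings(10, limit))
```

Even under this simplified definition, the number of allowed orders rises steeply as the limit increases, which is consistent with the search problem the abstract describes.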

Providing cross-lingual editing assistance to Wikipedia editors

We propose a framework to assist Wikipedia editors to transfer information among different languages. Firstly, with the help of some machine translation tools, we analyse the texts in two different language editions of an article and identify information that is only available in one edition. Next, we propose an algorithm to look for the most probable position in the other edition where the new information can be inserted. We show that our method can accurately suggest positions for new information.

MSS: Investigating the effectiveness of domain combinations and topic features for word sense disambiguation

Head finalization: A simple reordering rule for SOV languages

Abstract: English is a typical SVO (Subject-Verb-Object) language, while Japanese is a typical SOV language. Conventional Statistical Machine Translation (SMT) systems work well within each of these language families. However, SMT-based translation from an SVO language to an SOV language does not work well because their word orders are completely different. Recently, a few groups have proposed rule-based preprocessing methods to mitigate this problem (Xu et al., 2009; Hong et al., 2009).

POS tagging of dialectal Arabic: A minimally supervised approach

Abstract: Natural language processing technology for the dialects of Arabic is still in its infancy, due to the problem of obtaining large amounts of text data for spoken Arabic. In this paper we describe the development of a part-of-speech (POS) tagger for Egyptian Colloquial Arabic. We adopt a minimally supervised approach that only requires raw text data from several varieties of Arabic and a morphological analyzer for Modern Standard Arabic. No dialect-specific tools are used.

Jointly labeling multiple sequences: A factorial HMM approach

Abstract: We present new statistical models for jointly labeling multiple sequences and apply them to the combined task of part-of-speech tagging and noun phrase chunking. The model is based on the Factorial Hidden Markov Model (FHMM) with distributed hidden states representing part-of-speech and noun phrase sequences. We demonstrate that this joint labeling approach, by enabling information sharing between tagging/chunking subtasks, outperforms the traditional method of tagging and chunking in succession.

The NAIST Machine Translation System for IWSLT2012

Abstract: This paper describes the NAIST statistical machine translation system for the IWSLT2012 Evaluation Campaign. We participated in all TED Talk tasks, for a total of 11 language pairs. For all tasks, we use the Moses phrase-based decoder and its experiment management system as a common base for building translation systems.

Divide and translate: improving long distance reordering in statistical machine translation

Abstract: This paper proposes a novel method for long distance, clause-level reordering in statistical machine translation (SMT). The proposed method separately translates clauses in the source sentence and reconstructs the target sentence using the clause translations with non-terminals. The non-terminals are placeholders for embedded clauses, by which we reduce complicated clause-level reordering into simple word-level reordering.
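The reconstruction step can be pictured with a toy sketch. This is not the paper's implementation: the placeholder syntax `<C1>`, the function name `reconstruct`, and the romanized clause translations are all made up for illustration. It only shows how clause translations that contain non-terminals can be stitched back into one target sentence by substitution.

```python
import re

# Assumed placeholder format for a non-terminal: <C1>, <C2>, ...
PLACEHOLDER = re.compile(r"<C\d+>")


def reconstruct(root, clause_translations):
    """Replace each non-terminal with the translation of the embedded
    clause it stands for, recursing into nested clauses."""

    def expand(text, seen):
        def substitute(match):
            label = match.group(0)
            if label in seen:
                # Guard against a clause that refers back to itself.
                raise ValueError(f"cyclic placeholder {label}")
            return expand(clause_translations[label], seen | {label})

        return PLACEHOLDER.sub(substitute, text)

    return expand(root, frozenset())


# Hypothetical, separately translated clauses. The main clause treats the
# embedded clause as a single token, so only word-level reordering is needed.
translations = {
    "<C1>": "kare ga <C2> hon",
    "<C2>": "kinou katta",
}
main_clause = "watashi wa <C1> wo yonda"
sentence = reconstruct(main_clause, translations)

if __name__ == "__main__":
    print(sentence)
```

Because each clause is translated with its embedded clauses collapsed into single tokens, the decoder in this picture never has to move a whole clause across a long distance; the substitution afterwards puts the clause contents back in place.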