In the field of statistical machine translation (SMT), pre-ordering is an approach that has recently attracted attention: it reorders source-language words into the target-language order prior to SMT decoding. It is effective for long-distance reordering in SMT, especially between languages with very different word orders, such as English and Japanese. Its key idea is to decompose the SMT problem into two subproblems, translation and reordering, and to solve them independently. However, most pre-ordering approaches employ syntactic ...
Proceedings of the International Workshop on Spoken Language Translation (IWSLT)-Technical Papers Track, 2010
Numerous empirical results have shown that combining data from multiple domains often improves statistical machine translation (SMT) performance. For example, if we want to build an SMT system for the medical domain, it may be beneficial to augment the training data with bitext from another domain, such as parliamentary proceedings. Despite the positive results, it is not clear exactly how and where additional out-of-domain data helps in the SMT training pipeline. In this work, we analyze this problem in detail, considering the following ...
This paper describes the NTT-NII system in the NTCIR-10 PatentMT task. The system extends the NTT-UT system from NTCIR-9 with a new English dependency parser (for the EJ task), syntactic rule-based pre-ordering (for the JE task), and syntax-based post-ordering (for the JE task). Our system ranked first in the EJ subtask in both automatic and subjective evaluation, and was the best SMT system in the JE subtask.
This paper reports on the NTT statistical translation system that participated in the IWSLT 2005 evaluation campaign. The NTT system is based on a phrase translation model and uses a large number of features within a log-linear model. We studied various features recently developed in this research field and evaluated the system using the supplied data as well as publicly available Chinese, Japanese, and English data. Despite domain mismatch, the additional data helped improve translation accuracy.
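As an illustration of the log-linear model mentioned in this abstract, the following is a minimal sketch of how a weighted sum of feature values can rank translation candidates. The feature names (`phrase_tm`, `lm`, `word_penalty`), the weights, and the candidate values are hypothetical and are not taken from the NTT system.

```python
from typing import Dict

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of feature values (the log-linear model score)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature weights; in practice these are tuned on a development set.
weights = {"phrase_tm": 1.0, "lm": 0.5, "word_penalty": -0.2}

# Hypothetical log-domain feature values for two candidate translations.
candidates = {
    "hyp_a": {"phrase_tm": -2.0, "lm": -4.0, "word_penalty": 3.0},
    "hyp_b": {"phrase_tm": -1.5, "lm": -6.0, "word_penalty": 4.0},
}

# The decoder prefers the candidate with the highest model score.
best = max(candidates, key=lambda name: loglinear_score(candidates[name], weights))
```

Adding a new feature in this formulation only requires a new key in the feature and weight dictionaries, which is one reason log-linear models accommodate large feature sets.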
Annual Conference of the International Speech Communication Association, 2000
We describe the implementation of a cellular-phone-based speech translation system built without a telephone-quality speech database or special CT hardware. The purpose is to quickly build a prototype service system that can be used for data collection with real users. To train the acoustic model for the speech recognition system, available high-quality databases were made usable by (1) appropriate downsampling and filtering of the high-quality databases, and (2) piping, similar to the NTIMIT and CTIMIT paradigms. An evaluation of acoustic models with filtered, piped, and real cellular-phone data is given. Recognition rates are at the same level as for wideband speech.
We propose a new framework for N-best reranking on sparse feature sets. The idea is to reformulate the reranking problem as a multitask learning problem, where each N-best list corresponds to a distinct task. This is motivated by the observation that N-best lists often show significant differences in feature distributions. Training a single reranker directly on this heterogeneous data can be difficult. Our proposed meta-algorithm addresses this challenge by using multitask learning (such as ℓ1/ℓ2 regularization) to discover common feature representations across N-best lists. This meta-algorithm is simple to implement, and its modular approach allows one to plug in different learning algorithms from the existing literature. As a proof of concept, we show statistically significant improvements on a machine translation system involving millions of features.
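To make the ℓ1/ℓ2 regularizer named in this abstract concrete, here is a small sketch of the penalty itself, assuming a weight matrix with one row per task (N-best list) and one column per feature. The matrix values are invented for illustration; the paper's actual training procedure is not reproduced here.

```python
import math
from typing import List

def l1_l2_norm(weights: List[List[float]]) -> float:
    """Sum over features of the L2 norm of that feature's weights across tasks."""
    num_features = len(weights[0])
    return sum(
        math.sqrt(sum(task[f] ** 2 for task in weights))
        for f in range(num_features)
    )

# Hypothetical weights: 2 tasks (rows) x 3 features (columns).
W = [
    [3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0],
]

# Feature 0 contributes 5.0, feature 1 contributes 0.0, feature 2 contributes 1.0.
penalty = l1_l2_norm(W)
```

Because the outer sum is an ℓ1 norm over per-feature groups, minimizing this penalty tends to drive entire columns to zero, so the surviving features are shared across tasks.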
This paper describes NTT’s Japanese-English Cross-Language Question Answering System SAIQA-J/E. The system performed best among eight systems that participated in the Japanese-English subtask of the NTCIR Cross-Language Question Answering task. For cross-language document retrieval, we used a dictionary-based approach without word sense disambiguation. We used a synonym operator to represent translation alternatives. Our experiments show that the synonym operator improved the retrieval precision of our proximity-based document retrieval module. We also developed a web-based back-transliteration submodule for unknown katakana words.
Hierarchical phrase-based machine translation can capture global reordering with a synchronous context-free grammar, but has little ability to evaluate the correctness of word orderings during decoding. We propose a method for integrating a word-based reordering model into hierarchical phrase-based machine translation to overcome this weakness. Our approach extends the synchronous context-free grammar rules of the hierarchical phrase-based model to include reordered source strings, allowing efficient calculation of reordering model scores during decoding. Our experimental results on a Japanese-to-English basic travel expression corpus showed that the BLEU scores obtained by our proposed system were better than those obtained by a standard hierarchical phrase-based machine translation system.
We introduce an approach to optimize a machine translation (MT) system on multiple metrics simultaneously. Different metrics (e.g., BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality. Our approach is based on the theory of Pareto optimality. It is simple to implement on top of existing single-objective optimization methods (e.g., MERT, PRO) and outperforms ad hoc alternatives based on linear combinations of metrics. We also discuss the issue of metric tunability and show that our Pareto approach is more effective at incorporating new metrics from MT evaluation into MT optimization.
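The notion of Pareto optimality that this abstract builds on can be sketched as follows: a candidate is kept only if no other candidate is at least as good on every metric and strictly better on one. The score pairs below are invented, and TER is negated so that higher is better on both axes; this is only the dominance test, not the paper's optimization algorithm.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (BLEU, -TER) pairs for four candidate systems.
scores = [(0.30, -0.55), (0.32, -0.60), (0.28, -0.50), (0.29, -0.58)]

# The last pair is dominated by the first, so it is dropped.
frontier = pareto_frontier(scores)
```

Unlike a linear combination of metrics, this test needs no weighting between BLEU and TER, which is the property the abstract contrasts with ad hoc alternatives.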
We achieved state-of-the-art performance in statistical machine translation by using a large number of features with an online large-margin training algorithm. The millions of parameters were tuned only on a small development set consisting of fewer than 1K sentences. Experiments on Arabic-to-English translation indicated that a model trained with sparse binary features outperformed a conventional SMT system with a small number of features.
In statistical machine translation, reordering rules have proved useful in extracting bilingual phrases and in decoding during translation between languages that are structurally different. Linguistically motivated rules have been incorporated into Chinese-to-English
Minimum Bayes Risk (MBR) has been used as a decision rule for both single-system decoding and system combination in machine translation. For system combination, we argue that common MBR implementations are actually not correct, since probabilities in the hypothesis space cannot be reliably estimated. These implementations achieve the effect of consensus decoding (which may be beneficial in its own right), but do not reduce Bayes risk in the true Bayesian sense. We introduce Generalized MBR, which parameterizes the loss function in MBR and allows it to be optimized in the given hypothesis space of multiple systems. This extension better approximates the true Bayes risk decision rule and empirically improves over MBR, even in cases where the combined systems are of mixed quality.
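For readers unfamiliar with the standard MBR decision rule that this abstract starts from, the sketch below selects the hypothesis with the lowest expected loss under the given posterior. The loss function (one minus word-set overlap), the hypotheses, and the probabilities are toy assumptions; the Generalized MBR parameterization proposed in the paper is not shown.

```python
from typing import Callable, List

def toy_loss(hyp: str, ref: str) -> float:
    """Toy loss: 1 minus the Jaccard overlap of the two word sets (an assumption)."""
    h, r = set(hyp.split()), set(ref.split())
    if not h and not r:
        return 0.0
    return 1.0 - len(h & r) / len(h | r)

def mbr_decode(hyps: List[str], probs: List[float],
               loss: Callable[[str, str], float] = toy_loss) -> str:
    """Return the hypothesis minimizing expected loss over the hypothesis space."""
    def risk(h: str) -> float:
        return sum(p * loss(h, e) for e, p in zip(hyps, probs))
    return min(hyps, key=risk)

# Hypothetical hypotheses with hypothetical posterior probabilities.
hyps = ["the cat sat", "the cat sat down", "a dog ran"]
probs = [0.35, 0.25, 0.40]

map_choice = hyps[probs.index(max(probs))]  # highest-probability hypothesis
mbr_choice = mbr_decode(hyps, probs)        # lowest expected-loss hypothesis
```

In this toy setup the two rules disagree: the most probable hypothesis is an outlier, while MBR prefers the one closest to the other likely hypotheses. The result depends directly on the probabilities, which is why unreliable probability estimates in system combination matter.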
This paper discusses an integrated spoken language understanding method using a statistical translation model from words to semantic concepts. The translation model is an N-gram-based model that can easily be integrated with speech recognition. It can be trained on annotated corpora where only sentence-level alignments between word sequences and concept sets are available, using automatic alignment based on co-occurrence between words and concepts. This reduces the effort of explicitly aligning words to their corresponding concepts. The method determines the confidence of understanding hypotheses for rejection in a manner similar to word-posterior-based confidence scoring in speech recognition. Experimental results show the advantages of integration over a cascaded method of speech recognition and word-to-concept translation in spoken language understanding with confidence-based rejection.
We propose a flexible and efficient domain adaptation method that yields consistent improvements in machine translation (for 11 language pairs). The idea is to decompose the word alignment process into two steps, model training and alignment inference, and to perform Bayesian adaptation on the latter. This modularity allows one to incorporate out-of-domain data without modifying existing training algorithms. We show how ideas from sequential Bayesian methods can be naturally applied to the word alignment problem and demonstrate various positive results on the EMEA and NIST datasets.
In statistical machine translation (SMT), reordering rules have proved effective in extracting bilingual phrases and in decoding when translating between languages whose word orders are structurally different. Researchers have tackled the reordering problem in multiple ways. One basic idea is pre-ordering (Xia and McCord, 2004; Collins et al., 2005), that is, reordering the source sentences to follow the word order of the target sentences before decoding. For example, using a source dependency parser, Xu et al. (2009) manually created dependency-to-string pre-ordering rules for translating English into five SOV (Subject-Object-Verb) languages. Later, Genzel (2010) automatically extracted dependency-tree-based pre-ordering rules from word-aligned parallel sentences. In this work, we focus on Chinese-to-Japanese translation, motivated by the need to construct a direct machine translation system without using a pivot language. Chinese and Japanese involve s...
This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these non-isomorphic dependency structures when they are used for string-to-dependency translation. Besides traditional dependency parsers, we also use dependency structures transformed from PCFG trees and from predicate-argument structures (PASs) generated by an HPSG parser and a CCG parser. Experiments on Chinese-to-English translation show that the HPSG parser's PASs achieved the best dependency and translation accuracies.