The Machine Translation course at Dublin City University is taught to undergraduate students in Applied Computational Linguistics, while Computer-Assisted Translation is taught on two translator-training programmes, one undergraduate and one postgraduate. Given the differing backgrounds of these sets of students, the course material, methods of teaching and assessment all differ. We report here on our experiences of teaching these courses over a number of years, which we hope will be of interest to lecturers of similar existing courses, as well as providing a reference point for others who may be considering the introduction of such material.
(Way & Gough, 2005) demonstrate that their Marker-based EBMT system is capable of outperforming a word-based SMT system trained on reasonably large data sets. (Groves & Way, 2005) take this a stage further and demonstrate that while the EBMT system also outperforms a phrase-based SMT (PBSMT) system, a hybrid 'example-based SMT' system incorporating marker chunks and SMT sub-sentential alignments is capable of outperforming both baseline translation models for French–English translation. In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical EBMT' system capable of outperforming the baseline system of (Way & Gough, 2005). Using the Europarl (Koehn, 2005) training and test sets, we show that this time around, although all 'hybrid' variants of the EBMT system fall short of the quality achieved by the baseline PBSMT system, merging elements of the marker-based and SMT data, as in (Groves & Way, 2005), to create a hybrid ...
Proceedings of the ACL Workshop on Building and Using Parallel Texts - ParaText '05, 2005
(Way and Gough, 2005) provide an in-depth comparison of their Example-Based Machine Translation (EBMT) system with a Statistical Machine Translation (SMT) system constructed from freely available tools. According to a wide variety of automatic evaluation metrics, they demonstrated that their EBMT system outperformed the SMT system by a factor of two to one. Nevertheless, they did not test their EBMT system against a phrase-based SMT system. Obtaining their training and test data for English–French, we carry out a number of experiments using the Pharaoh SMT Decoder. While better results are seen when Pharaoh is seeded with Giza++ word- and phrase-based data than with EBMT sub-sentential alignments, in general better results are obtained when combinations of this 'hybrid' data are used to construct the translation and probability models. While for the most part the EBMT system of (Gough & Way, 2004b) outperforms any flavour of the phrase-based SMT systems constructed in our experiments, combining the data sets automatically induced by both Giza++ and their EBMT system leads to a hybrid system which improves on the EBMT system per se for French–English.
One key to the success of EBMT is the removal of the boundaries limiting the potential of Translation Memories (TMs). We discuss a linguistically enhanced TM system, a Phrasal Lexicon (PL), which takes advantage of the huge, underused resources available in existing translation aids. We claim that PL and EBMT systems can provide valuable translation solutions for restricted domains, especially where controlled language restrictions are imposed. When integrated into a hybrid and/or multi-engine MT environment, the PL will yield significant improvements in translation quality. We establish a future model of translation usage and anticipate that EBMT and the PL will have a central place in future hybrid integrated translation platforms.