2024
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng | Bei Li | Huiwen Bao | Jiale Wang | Weiqiao Shan | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2024
The design choices in Transformer feed-forward neural networks (FFNs) have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of hidden dimensions in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture that uses multiple smaller FFNs to reduce parameters and computation while maintaining essential hidden dimensions. These smaller FFNs are integrated into a multi-head attention mechanism for effective collaboration. We also propose a tailored head scaling strategy to enhance PartialFormer's capabilities, as well as a residual-like attention calculation to improve depth scaling within PartialFormer. Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of PartialFormer. Our code is available at: https://github.com/zhengkid/PartialFormer.
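One reading of the abstract's core idea can be sketched in a few lines: split the model dimension across attention heads and give each head its own small FFN whose hidden size stays comparatively large, so the per-head hidden ratio is preserved while total parameters drop. This is a minimal sketch of that reading, not the authors' implementation (see the linked repository); all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class PartialFFN(nn.Module):
    """Sketch of the PartialFormer idea as described in the abstract:
    replace one large FFN with several smaller per-head FFNs that each
    keep a large hidden dimension. All sizes are illustrative."""

    def __init__(self, d_model=512, n_heads=8, d_head_hidden=512):
        super().__init__()
        assert d_model % n_heads == 0
        d_head = d_model // n_heads
        # Each head's FFN maps 64 -> 512 -> 64 (an 8x expansion per head),
        # totalling ~0.52M parameters versus ~2.1M for a vanilla
        # 512 -> 2048 -> 512 FFN, i.e. ~4x fewer parameters while the
        # per-head hidden ratio doubles.
        self.head_ffns = nn.ModuleList([
            nn.Sequential(nn.Linear(d_head, d_head_hidden),
                          nn.ReLU(),
                          nn.Linear(d_head_hidden, d_head))
            for _ in range(n_heads)
        ])
        self.n_heads = n_heads

    def forward(self, x):                      # x: (batch, seq, d_model)
        heads = x.chunk(self.n_heads, dim=-1)  # split channels per head
        outs = [ffn(h) for ffn, h in zip(self.head_ffns, heads)]
        return torch.cat(outs, dim=-1)         # merge heads back
```

The paper's head scaling strategy and residual-like attention are separate components not shown in this sketch.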
HW-TSC 2024 Submission for the Quality Estimation Shared Task
Weiqiao Shan | Ming Zhu | Yuang Li | Mengyao Piao | Xiaofeng Zhao | Chang Su | Min Zhang | Hao Yang | Yanfei Jiang
Proceedings of the Ninth Conference on Machine Translation
Quality estimation (QE) is a crucial technique for evaluating the quality of machine translations without the need for reference translations. This paper presents Huawei Translation Services Center's (HW-TSC's) submission to the sentence-level QE shared task, named LLMs-enhanced-CrossQE. Our system builds upon the CrossQE architecture from last year's submission, which consists of a multilingual base model and a task-specific downstream layer; the model input is the concatenation of the source and translated sentences. To enhance performance, we fine-tuned and ensembled multiple base models, including XLM-R, InfoXLM, RemBERT, and CometKiwi. We also employed two pseudo-data generation methods: 1) a diverse pseudo-data generation method based on the corruption-based data augmentation technique introduced last year, and 2) a method that simulates machine translation errors using large language models (LLMs). Our results demonstrate that the system achieves outstanding performance on the sentence-level QE test sets.
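As a rough illustration of the CrossQE-style setup described above (a multilingual encoder over the concatenated source and hypothesis, plus a task-specific downstream layer), here is a minimal sketch using Hugging Face Transformers. The checkpoint name, pooling choice, and head architecture are assumptions for illustration, not the system's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SentenceQE(nn.Module):
    """CrossQE-style scorer sketch: multilingual encoder over the
    concatenated (source, hypothesis) pair plus a regression head."""

    def __init__(self, backbone="xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]    # sentence-pair representation
        return self.head(cls).squeeze(-1)    # predicted quality score

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = SentenceQE()
# Source sentence and its machine translation go in as one pair.
batch = tok(["Das ist ein Test."], ["This is a test."],
            return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    score = model(batch["input_ids"], batch["attention_mask"])
```

The submitted system additionally fine-tunes and ensembles several backbones (XLM-R, InfoXLM, RemBERT, CometKiwi), e.g., by combining their predicted scores; the exact ensembling scheme is not specified in the abstract.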
2022
The NiuTrans Machine Translation Systems for WMT22
Weiqiao Shan | Zhiquan Cao | Yuchen Han | Siming Wu | Yimin Hu | Jie Wang | Yi Zhang | Hou Baoyu | Hang Cao | Chenghao Gao | Xiaowen Liu | Tong Xiao | Anxiang Ma | Jingbo Zhu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper describes the NiuTrans neural machine translation systems for the WMT22 General MT constrained task. We participated in four directions: Chinese→English, English→Croatian, and Livonian↔English. Our models are based on several advanced Transformer variants, e.g., Transformer-ODE and the Universal Multiscale Transformer (UMST). The main workflow consists of data filtering, large-scale data augmentation (i.e., iterative back-translation and iterative knowledge distillation), and domain-specific fine-tuning. Moreover, we tried several multi-domain methods, such as a multi-domain model structure and a multi-domain data clustering method, to address this year's newly introduced multi-domain test set. For the low-resource scenarios, we built a multilingual translation model to enhance performance, and used the pre-trained language model mBERT to initialize the translation model.
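The iterative back-translation step of this workflow can be sketched as follows. The `train()` function and `Model.translate()` method are hypothetical stubs standing in for a real NMT training and decoding pipeline, and the loop structure is a generic reading of the abstract rather than the exact NiuTrans recipe.

```python
class Model:
    """Placeholder NMT model; a real translate() would run beam search."""
    def __init__(self, bitext):
        self.bitext = bitext
    def translate(self, sent):
        return sent  # stub: identity "translation" for illustration only

def train(bitext):
    return Model(bitext)  # stub: a real system trains a Transformer here

def iterative_back_translation(parallel, mono_src, mono_tgt, rounds=3):
    fwd = train(parallel)                           # source -> target
    bwd = train([(t, s) for s, t in parallel])      # target -> source
    for _ in range(rounds):
        # Back-translate target-side monolingual text into synthetic
        # source sentences, then retrain the forward model on the union.
        synthetic = [(bwd.translate(t), t) for t in mono_tgt]
        fwd = train(parallel + synthetic)
        # Symmetrically refresh the backward model with forward outputs.
        synthetic_rev = [(fwd.translate(s), s) for s in mono_src]
        bwd = train([(t, s) for s, t in parallel] + synthetic_rev)
    return fwd, bwd
```

Each round improves the synthetic data with the latest models, which is what makes the procedure iterative rather than a single back-translation pass.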
2020
The NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang | Ziyang Wang | Runzhe Cao | Binghao Wei | Weiqiao Shan | Shuhan Zhou | Abudurexiti Reheman | Tao Zhou | Xin Zeng | Laohu Wang | Yongyu Mu | Jingnan Zhang | Xiaoqian Liu | Xuanjun Zhou | Yinqiao Li | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation
This paper describes the NiuTrans neural machine translation systems for the WMT20 news translation tasks. We participated in five tasks in total, Japanese↔English, English→Chinese, Inuktitut→English, and Tamil→English, and ranked first in both Japanese↔English directions. We mainly utilized iterative back-translation, model architectures of different depths and widths, iterative knowledge distillation, and iterative fine-tuning. We find that adequately widening and deepening the model simultaneously yields significant performance improvements, and that our iterative fine-tuning strategy is effective for domain adaptation. For the Inuktitut→English and Tamil→English tasks, we built separate multilingual models and employed pre-trained word embeddings to obtain better performance.
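The "widen and deepen simultaneously" observation can be illustrated with a minimal PyTorch sketch; all sizes below are illustrative, not the exact WMT20 configurations.

```python
import torch.nn as nn

def build_encoder(depth=6, d_model=512, n_heads=8, d_ffn=2048):
    """Build a Transformer encoder stack of a given depth and width."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                       dim_feedforward=d_ffn)
    return nn.TransformerEncoder(layer, num_layers=depth)

baseline  = build_encoder()                        # 6 layers x 512 dims
deep_wide = build_encoder(depth=24, d_model=1024,  # scale depth and width
                          n_heads=16, d_ffn=4096)  # together, not separately
```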