Abstract
There are various ways to incorporate syntactic knowledge into neural machine translation (NMT). However, quantifying the dependency syntactic intimacy (DSI) between word pairs in a dependency tree has not yet been exploited in attentional or transformer-based NMT. In this paper, we propose a novel variant of Tree-LSTM to capture the syntactic dependency degree (SDD) between word pairs in dependency trees. We further propose two syntax-aware distances: a tuned syntax distance and a \(\rho\)-dependent distance. For attentional NMT, we derive two syntax-aware attentions from these distances and design a dual attention that simultaneously generates global context and dependency syntactic context. For transformer-based NMT, we explicitly incorporate dependency syntax into the self-attention network (SAN) to obtain a syntax-aware SAN. Experiments on IWSLT'17 English–German, IWSLT Chinese–English, and WMT'15 English–Finnish translation tasks show that our syntax-aware NMT significantly improves translation quality over baseline methods, including the state-of-the-art transformer-based NMT.
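The precise definitions of the distances and attention variants are given in the full text; as a rough illustration only, the following NumPy sketch shows the general idea of biasing attention weights with a pairwise dependency-distance matrix. The function name, the additive-penalty form, and the tensor shapes are assumptions made for this sketch, not the authors' formulation.

```python
import numpy as np

def syntax_aware_attention(queries, keys, values, dep_dist):
    """Scaled dot-product attention whose logits are penalized by a
    pairwise dependency-distance matrix (small distance = high intimacy).

    queries: (t_q, d); keys, values: (t_k, d); dep_dist: (t_q, t_k).
    """
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)        # standard attention logits
    scores = scores - dep_dist                      # favor syntactically close pairs
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values                         # syntax-aware context vectors
```

In a dual-attention setting of the kind described above, such a syntax-aware context would be computed alongside the ordinary global-attention context, and both would be fed to the decoder.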








Notes
Since no training set of suitable size was available, we combined the Chinese–English bilingual data of IWSLT 2012, 2013, 2014, 2015, and 2017, and removed duplicate sentences.
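A minimal sketch, assuming plain-text, line-aligned parallel files, of the merging-and-deduplication step described in this note; the helper and all file names are illustrative, not the authors' actual pipeline.

```python
def merge_and_deduplicate(pair_files, out_src="train.zh", out_tgt="train.en"):
    """Concatenate several line-aligned bitext files, keeping each
    source-target sentence pair only once."""
    seen = set()
    with open(out_src, "w", encoding="utf-8") as fs, \
         open(out_tgt, "w", encoding="utf-8") as ft:
        for src_path, tgt_path in pair_files:
            with open(src_path, encoding="utf-8") as s, \
                 open(tgt_path, encoding="utf-8") as t:
                for src, tgt in zip(s, t):
                    pair = (src.strip(), tgt.strip())
                    if not pair[0] or not pair[1] or pair in seen:
                        continue  # skip empty lines and exact duplicate pairs
                    seen.add(pair)
                    fs.write(pair[0] + "\n")
                    ft.write(pair[1] + "\n")

# e.g. merge_and_deduplicate([("iwslt12.zh", "iwslt12.en"), ("iwslt13.zh", "iwslt13.en")])
```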
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant 61772146. The authors would like to thank Biao Zhang from the University of Edinburgh for his assistance.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Peng, R., Hao, T. & Fang, Y. Syntax-aware neural machine translation directed by syntactic dependency degree. Neural Comput & Applic 33, 16609–16625 (2021). https://doi.org/10.1007/s00521-021-06256-4