On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

  • Xuebo Liu
  • Longyue Wang
  • Derek F. Wong
  • Liang Ding
  • Lidia S. Chao
  • Shuming Shi
  • Zhaopeng Tu
  • University of Macau
  • Tencent
  • The University of Sydney

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT). This paper takes the first step to investigate the complementarity between PT and BT. We introduce two probing tasks for PT and BT respectively and find that PT mainly contributes to the encoder module while BT brings more benefits to the decoder. Experimental results show that PT and BT are nicely complementary to each other, establishing state-of-the-art performances on the WMT16 English-Romanian and English-Russian benchmarks. Through extensive analyses on sentence originality and word frequency, we also demonstrate that combining Tagged BT with PT is more helpful to their complementarity, leading to better translation quality. Source code is freely available at https://github.com/SunbowLiu/PTvsBT.

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publication: EMNLP 2021
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
Publisher: Association for Computational Linguistics (ACL)
Pages: 2900-2907
Number of pages: 8
ISBN (Electronic): 9781955917100
State: Published - 2021
Externally published: Yes
Event: 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: 7 Nov 2021 - 11 Nov 2021

Publication series

Name: Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Conference

Conference: 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 7/11/21 - 11/11/21
