Layer-wise de-training and re-training for ConvS2S machine translation

  • Hongfei Yu
  • Xiaoqing Zhou
  • Xiangyu Duan
  • Min Zhang
  • Soochow University

Research output: Contribution to journal › Article › Peer-review

Abstract

The convolutional sequence-to-sequence (ConvS2S) machine translation system is one of the typical neural machine translation (NMT) systems. In our preliminary studies, training the ConvS2S model tended to get stuck in a local optimum. To overcome this, we propose to de-train a trained ConvS2S model in a mild way and then retrain it to search for a better global solution. Specifically, the trained parameters of one layer of the NMT network are discarded by re-initialization while the parameters of all other layers are kept. This kicks off re-optimization from a new starting point while ensuring that the new starting point does not lie too far from the previous optimum. The procedure is applied layer by layer until all layers of the ConvS2S model have been explored. Experiments show that, compared with other measures for escaping a local optimum (initialization with different random seeds, adding perturbations to the baseline parameters, and continuing to train the baseline models, i.e., con-training), our method consistently improves ConvS2S translation quality across various language pairs and outperforms these alternatives.
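The layer-wise procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Everything here is hypothetical: the function names (`layerwise_detrain_retrain`, `toy_loss`, `toy_train`, `toy_init_layer`), the rule that each step starts from the best model so far and keeps a retrained model only if its dev loss improves, the front-to-back layer order, and the toy double-well loss that stands in for a real ConvS2S model and its validation set.

```python
import copy
import random
from typing import Callable, List

# Hypothetical representation: one list of float weights per network layer.
Params = List[List[float]]


def layerwise_detrain_retrain(
    params: Params,
    train: Callable[[Params], Params],
    dev_loss: Callable[[Params], float],
    init_layer: Callable[[int, random.Random], List[float]],
    seed: int = 0,
) -> Params:
    """De-train one layer at a time by re-initializing it, then re-train.

    Assumption (not stated in the abstract): each step starts from the best
    model found so far, and a retrained model is kept only if dev loss improves.
    """
    rng = random.Random(seed)
    best = copy.deepcopy(params)
    best_loss = dev_loss(best)
    for i in range(len(best)):
        candidate = copy.deepcopy(best)    # all other layers keep trained values
        candidate[i] = init_layer(i, rng)  # discard layer i's trained weights
        candidate = train(candidate)       # re-optimize the full model from here
        loss = dev_loss(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best


# Toy stand-in for a translation model: each weight sits in a double well
# whose minimum near w = -1 is lower than the one near w = +1.
def toy_loss(params: Params) -> float:
    return sum((w * w - 1.0) ** 2 + 0.3 * w for layer in params for w in layer)


def toy_train(params: Params, steps: int = 500, lr: float = 0.01) -> Params:
    """Plain gradient descent on toy_loss; returns a new parameter list."""
    params = copy.deepcopy(params)
    for _ in range(steps):
        for layer in params:
            for j, w in enumerate(layer):
                grad = 4.0 * w * (w * w - 1.0) + 0.3  # derivative of the double well
                layer[j] = w - lr * grad
    return params


def toy_init_layer(index: int, rng: random.Random) -> List[float]:
    # index is unused here; a real model would use the layer's own shape and initializer.
    return [rng.uniform(-1.5, 1.5) for _ in range(2)]


# Baseline: starting every weight at 0.8 converges to the worse well near +1.
baseline = toy_train([[0.8, 0.8] for _ in range(3)])
improved = layerwise_detrain_retrain(baseline, toy_train, toy_loss, toy_init_layer)
print(f"baseline loss: {toy_loss(baseline):.3f}")
print(f"after layer-wise de-training/re-training: {toy_loss(improved):.3f}")
```

Because untouched layers keep their trained values, each restart begins near the previous optimum, which mirrors the "mild" de-training the abstract describes; in this toy the loss is also used as the dev metric, a simplification a real system would not make.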

Original language: English
Article number: 3358414
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 19
Issue number: 2
State: Published - Nov 2019
Externally published: Yes

Keywords

  • ConvS2S
  • Local optimum
  • Neural machine translation
