Abstract
The convolutional sequence-to-sequence (ConvS2S) machine translation system is a typical neural machine translation (NMT) system. In our preliminary studies, training the ConvS2S model tended to get stuck in a local optimum. To overcome this behavior, we propose to de-train a trained ConvS2S model in a mild way and then retrain it to find a globally better solution. In particular, the trained parameters of one layer of the NMT network are discarded by re-initialization while the parameters of the other layers are kept. This starts re-optimization from a new point while ensuring that the new starting point is not too far from the previous optimum. The procedure is executed layer by layer until all layers of the ConvS2S model have been explored. Experiments show that, compared with other measures for escaping the local optimum, including initialization with different random seeds, adding perturbations to the baseline parameters, and continuing training (con-training) from the baseline models, our method consistently improves ConvS2S translation quality across various language pairs and achieves better performance than these alternatives.
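The layer-by-layer procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the model is a plain dictionary of toy weights rather than a ConvS2S network, and the names `reinit_layer`, `layerwise_detrain_retrain`, `toy_train`, `toy_score`, `layer_1`, and `layer_2` are hypothetical. The abstract does not state how candidates are selected, so the sketch assumes that a retrained candidate is kept only when its score improves and that each step starts from the best model found so far.

```python
import copy
import random


def reinit_layer(params, layer_name, rng, scale=0.1):
    """Return a copy of params in which only `layer_name` is re-initialized."""
    new_params = copy.deepcopy(params)
    new_params[layer_name] = [rng.uniform(-scale, scale) for _ in params[layer_name]]
    return new_params


def layerwise_detrain_retrain(params, train_fn, score_fn, rng):
    """De-train one layer at a time, retrain, and keep improvements.

    Assumption (not stated in the abstract): a candidate replaces the
    current best model only if its score is strictly higher.
    """
    best_params = copy.deepcopy(params)
    best_score = score_fn(best_params)
    for layer_name in list(params):
        # De-train: discard one layer, keep every other layer as trained.
        candidate = reinit_layer(best_params, layer_name, rng)
        # Re-train the whole model from this nearby starting point.
        candidate = train_fn(candidate)
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best_params, best_score = candidate, candidate_score
    return best_params, best_score


# Toy stand-ins for training and evaluation (purely illustrative).
# Each weight has a tilted double-well loss with one local and one global minimum.
def toy_loss(w):
    return (w * w - 1.0) ** 2 + 0.3 * w


def toy_train(params, lr=0.05, steps=200):
    """Plain gradient descent on every weight; returns a new dictionary."""
    trained = {}
    for name, weights in params.items():
        new_weights = []
        for w in weights:
            for _ in range(steps):
                grad = 4.0 * w * (w * w - 1.0) + 0.3
                w -= lr * grad
            new_weights.append(w)
        trained[name] = new_weights
    return trained


def toy_score(params):
    """Higher is better: negative total loss over all weights."""
    return -sum(toy_loss(w) for ws in params.values() for w in ws)


# Baseline: training from w = 1.0 settles in the local minimum near w = 1.
baseline = toy_train({"layer_1": [1.0, 1.0], "layer_2": [1.0, 1.0]})
best_params, best_score = layerwise_detrain_retrain(
    baseline, toy_train, toy_score, random.Random(0)
)
print(toy_score(baseline), best_score)
```

Because a candidate is accepted only on improvement, the final score in this sketch can never be lower than the baseline score. Whether it is strictly higher depends on where the re-initialized weights happen to land.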
| Original language | English |
|---|---|
| Article number | 3358414 |
| Journal | ACM Transactions on Asian and Low-Resource Language Information Processing |
| Volume | 19 |
| Issue number | 2 |
| State | Published - Nov 2019 |
| Externally published | Yes |
Keywords
- ConvS2S
- Local optimum
- Neural machine translation
Title
Layer-wise de-Training and re-Training for ConvS2S machine translation