Abstract
Most deep neural machine translation (NMT) models encode in a bottom-up feedforward fashion, in which representations in lower layers construct or modulate representations in higher layers. We conjecture that this unidirectional encoding fashion could be an obstacle to building a deep NMT model. In this paper, we propose to build a deeper Transformer encoder by organizing the encoder layers into multiple groups connected via a grouping skip connection mechanism, in which the output of each group is further fed into subsequent groups. In this way, we successfully build a deep Transformer encoder with up to 48 layers. Moreover, by sharing parameters among groups, we can extend the (virtual) depth of the encoder without introducing additional parameters. Detailed experiments on the large-scale WMT (Workshop on Machine Translation) 2014 English-to-German and English-to-French, WMT 2016 English-to-German, and WMT 2017 Chinese-to-English translation tasks demonstrate that our proposed deep Transformer model significantly outperforms the strong Transformer baseline. Furthermore, we carry out linguistic probing tasks to analyze the problems in the original Transformer model and to explain how our deep Transformer encoder improves translation quality. A particularly attractive property of our approach is that it is very easy to implement. We make our code available on GitHub: https://github.com/liyc7711/deep-nmt.
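The grouping mechanism described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the group size, the additive combination rule for the skip connections, and the parameter-sharing scheme are all assumptions, and the "layers" are stand-in functions rather than real Transformer encoder layers.

```python
# Minimal sketch of a grouped deep encoder with grouping skip connections.
# Each "layer" is a stand-in affine map; a real implementation would use
# Transformer encoder layers (self-attention + feedforward).

def make_layer(w):
    # Stand-in for one encoder layer: elementwise x -> w * x + 1.
    return lambda x: [w * v + 1.0 for v in x]

def run_group(layers, x):
    # A group is a stack of layers applied bottom-up (feedforward).
    for layer in layers:
        x = layer(x)
    return x

def grouped_encoder(groups, x):
    # Grouping skip connection (assumed additive): each group receives the
    # original input plus the outputs of all previous groups.
    group_outputs = []
    for layers in groups:
        inp = x
        for out in group_outputs:
            inp = [a + b for a, b in zip(inp, out)]
        group_outputs.append(run_group(layers, inp))
    return group_outputs[-1]

# A 48-layer encoder organized as 4 groups of 12 layers each.
groups = [[make_layer(0.5) for _ in range(12)] for _ in range(4)]
out = grouped_encoder(groups, [1.0, 2.0])

# Virtual depth via parameter sharing: reuse one group of 12 layers 4 times,
# reaching the same effective depth with a quarter of the parameters.
shared = [make_layer(0.5) for _ in range(12)]
out_shared = grouped_encoder([shared] * 4, [1.0, 2.0])
```

With identical stand-in weights, the shared and unshared encoders compute the same function here; in the paper's setting, sharing trades parameter count for (virtual) depth.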
| Original language | English |
|---|---|
| Article number | 107556 |
| Journal | Knowledge-Based Systems |
| Volume | 234 |
| DOIs | |
| State | Published - 25 Dec 2021 |
| Externally published | Yes |
Keywords
- Deep NMT
- Grouping skip connection
- Neural machine translation
- Transformer
Title: Deep Transformer modeling via grouping skip connection for neural machine translation