
A Closer Look at Transformer Attention for Multilingual Translation

  • Jingyi Zhang
  • Hongfei Xu
  • Kehai Chen
  • Gerard de Melo
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Transformers are the predominant model for machine translation. Recent studies have also shown that a single Transformer model can be trained to translate multiple language pairs, achieving promising results. In this work, we investigate how multilingual Transformer models pay attention when translating different language pairs. To this end, we first apply automatic pruning to eliminate a large number of noisy heads, and then assess the functions and behaviors of the remaining heads in both self-attention and cross-attention. We find that different language pairs, despite differing in syntax and word order, tend to share the same heads for the same functions, such as syntax heads and reordering heads. However, the differing characteristics of the language pairs can clearly interfere with these function heads and affect head accuracy. Additionally, we reveal an interesting behavior of Transformer cross-attention: the deep-layer cross-attention heads work cooperatively to learn different options for word reordering, which may arise because a translation task admits multiple valid gold translations in the target language for the same source sentence.
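
The abstract refers to automatically pruning noisy attention heads before analyzing the remaining ones. As a rough illustration only, the sketch below shows one common way such pruning is set up: a learnable gate per attention head, with low-gate heads zeroed out (in the spirit of prior head-pruning work, e.g. Voita et al., 2019). The class name, gate mechanism, and threshold are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of per-head gating for attention-head pruning (illustrative,
# not the paper's exact method).
import torch
import torch.nn as nn


class GatedMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # One gate per head; heads whose gate falls below a threshold get pruned.
        self.head_gates = nn.Parameter(torch.ones(n_heads))

    def forward(self, query, key, value):
        B, Tq, _ = query.shape
        Tk = key.shape[1]

        def split(x, T):
            return x.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        q = split(self.q_proj(query), Tq)
        k = split(self.k_proj(key), Tk)
        v = split(self.v_proj(value), Tk)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = attn @ v                                  # (B, heads, Tq, d_head)
        ctx = ctx * self.head_gates.view(1, -1, 1, 1)   # scale each head by its gate
        ctx = ctx.transpose(1, 2).reshape(B, Tq, -1)
        return self.out_proj(ctx)

    def prune_heads(self, threshold: float = 0.1):
        # Hard-prune by zeroing gates below the threshold (illustrative criterion);
        # returns the indices of the pruned heads.
        with torch.no_grad():
            keep = (self.head_gates.abs() >= threshold).float()
            self.head_gates.copy_(self.head_gates * keep)
        return (self.head_gates == 0).nonzero(as_tuple=True)[0].tolist()


# Example: run self-attention (pass encoder states as key/value for cross-attention),
# then prune and inspect which heads were removed.
mha = GatedMultiHeadAttention(d_model=512, n_heads=8)
x = torch.randn(2, 10, 512)
_ = mha(x, x, x)
print("pruned heads:", mha.prune_heads(threshold=0.1))
```

In a full pipeline the gates would be trained with a sparsity-encouraging regularizer before pruning; the surviving heads are then the ones whose functions (syntax, reordering, etc.) can be examined per language pair.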

Original language: English
Title of host publication: Proceedings of the 8th Conference on Machine Translation, WMT 2023
Publisher: Association for Computational Linguistics
Pages: 494-504
Number of pages: 11
ISBN (Electronic): 9798891760417
State: Published - 2023
Externally published: Yes
Event: 8th Conference on Machine Translation, WMT 2023 - Singapore, Singapore
Duration: 6 Dec 2023 - 7 Dec 2023

Publication series

Name: Conference on Machine Translation - Proceedings
ISSN (Electronic): 2768-0983

Conference

Conference: 8th Conference on Machine Translation, WMT 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 - 7/12/23
