
Third-Party Aligner for Neural Word Alignments

  • Jinpeng Zhang
  • Chuanqi Dong
  • Xiangyu Duan*
  • Yuqi Zhang
  • Min Zhang
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Word alignment is the task of finding translationally equivalent words between source and target sentences. Previous work has demonstrated that self-training can achieve competitive word alignment results. In this paper, we propose to use word alignments generated by a third-party word aligner to supervise neural word alignment training. Specifically, the source word and target word of each word pair aligned by the third-party aligner are trained to be close neighbors of each other in the contextualized embedding space when fine-tuning a pre-trained cross-lingual language model. Experiments on benchmarks covering various language pairs show that our approach can, surprisingly, self-correct the third-party supervision by finding more accurate word alignments and deleting wrong ones, leading to better performance than various third-party word aligners, including the currently best one. When we integrate the supervision from all of the third-party aligners, we achieve state-of-the-art word alignment performance, with alignment error rates that are on average more than two points lower than those of the best third-party aligner. Our code is released at https://github.com/sdongchuanqi/Third-Party-Supervised-Aligner.
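The "close neighbors" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `neighbor_loss`, the dot-product similarity, and the bidirectional softmax objective are all assumptions chosen for illustration, and the toy vectors stand in for contextualized embeddings that would come from a cross-lingual language model. The sketch only shows that a loss of this kind is lower when the supervised pairs are already near each other in the embedding space.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_loss(src_vecs, tgt_vecs, aligned_pairs):
    """Hypothetical objective: average negative log-probability of the
    word pairs proposed by a third-party aligner, in both directions."""
    if not aligned_pairs:
        return 0.0
    # Similarity of every source token to every target token.
    sim = [[dot(s, t) for t in tgt_vecs] for s in src_vecs]
    loss = 0.0
    for i, j in aligned_pairs:
        # Source-to-target: probability that source i picks target j.
        p_st = softmax(sim[i])[j]
        # Target-to-source: probability that target j picks source i.
        p_ts = softmax([row[j] for row in sim])[i]
        loss -= 0.5 * (math.log(p_st) + math.log(p_ts))
    return loss / len(aligned_pairs)

# Toy 2-d vectors (assumed) for a 2-word source and 2-word target sentence.
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9]]

good = neighbor_loss(src, tgt, [(0, 0), (1, 1)])  # pairs that are already close
bad = neighbor_loss(src, tgt, [(0, 1), (1, 0)])   # pairs that are far apart
print(good < bad)
```

Minimizing such a loss during fine-tuning would pull each supervised pair together while pushing apart competing tokens in the same sentence pair; the abstract does not say which loss the paper actually uses.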

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2022
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Publisher: Association for Computational Linguistics (ACL)
Pages: 3134-3145
Number of pages: 12
ISBN (Electronic): 9781959429432
State: Published - 2022
Externally published: Yes
Event: 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 to 11 Dec 2022

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2022

Conference

Conference: 2022 Findings of the Association for Computational Linguistics: EMNLP 2022
Country/Territory: United Arab Emirates
City: Hybrid, Abu Dhabi
Period: 7/12/22 to 11/12/22
