
Error detection for statistical machine translation using linguistic features

  • Deyi Xiong*
  • Min Zhang
  • Haizhou Li
  • *Corresponding author for this work
  • Agency for Science, Technology and Research, Singapore

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic error detection is desirable in post-processing to improve machine translation quality. Previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose to incorporate two groups of linguistic features, lexical and syntactic, which convey information from outside machine translation systems, into error detection. We use a maximum entropy classifier to predict translation errors by integrating a word posterior probability feature with linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
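
As a rough illustration of the system-based feature mentioned in the abstract, the sketch below computes a word posterior probability for each word of the top hypothesis from an N-best list. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `word_posteriors`, the toy N-best list, and the choice to count a word as supported whenever it occurs anywhere in a hypothesis are all illustrative assumptions.

```python
import math


def word_posteriors(nbest):
    """Estimate word posteriors for the first (best) hypothesis.

    nbest: list of (tokens, log_score) pairs, best hypothesis first.
    Returns a list of (word, posterior) pairs for the best hypothesis.
    Assumption: a hypothesis supports a word if it contains that word
    at any position (a simplified, position-independent variant).
    """
    # Normalize hypothesis scores into a distribution (stable softmax).
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)
    probs = [w / total for w in weights]

    best_tokens = nbest[0][0]
    result = []
    for word in best_tokens:
        # Sum the probability mass of every hypothesis containing the word.
        mass = sum(p for (tokens, _), p in zip(nbest, probs) if word in tokens)
        result.append((word, mass))
    return result


# Hypothetical toy N-best list with made-up scores.
nbest = [
    (["the", "cat", "sat"], math.log(0.5)),
    (["the", "cat", "sits"], math.log(0.3)),
    (["a", "cat", "sat"], math.log(0.2)),
]
posteriors = word_posteriors(nbest)
for word, p in posteriors:
    print(f"{word}\t{p:.2f}")
```

A word with a low posterior is a candidate translation error. In the setup the abstract describes, such a score would be one feature among several passed to the classifier, alongside the lexical and syntactic features.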

Original language: English
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 604-611
Number of pages: 8
State: Published - 2010
Externally published: Yes
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: 11 Jul 2010 to 16 Jul 2010

Publication series

Name: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country/Territory: Sweden
City: Uppsala
Period: 11/07/10 to 16/07/10
