
SMT helps bitext dependency parsing

  • Wenliang Chen*
  • Jun'ichi Kazama
  • Min Zhang
  • Yoshimasa Tsuruoka
  • Yujie Zhang
  • Yiou Wang
  • Kentaro Torisawa
  • Haizhou Li
  • *Corresponding author for this work
  • Agency for Science, Technology and Research, Singapore
  • Japan National Institute of Information and Communications Technology
  • Japan Advanced Institute of Science and Technology
  • Beijing Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach is still able to provide improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.

Original language: English
Title of host publication: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 73-83
Number of pages: 11
State: Published - 2011
Externally published: Yes
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: 27 Jul 2011 to 31 Jul 2011

Publication series

Name: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country/Territory: United Kingdom
City: Edinburgh
Period: 27/07/11 to 31/07/11
