
Two-stage hypotheses generation for spoken language translation

  • Boxing Chen*
  • Min Zhang
  • Ai Ti Aw
  • *Corresponding author for this work
  • Agency for Science, Technology and Research, Singapore

Research output: Contribution to journal › Article › peer-review

Abstract

Spoken Language Translation (SLT) is the research area that focuses on the translation of speech or text between two spoken languages. Phrase-based and syntax-based methods represent the state of the art in statistical machine translation (SMT). The phrase-based method specializes in modeling local reorderings and translations of multiword expressions. The syntax-based method draws on syntactic knowledge, which lets it better model long-distance word reorderings, discontinuous phrases, and syntactic structure. In this article, we leverage the strengths of these two methods and propose a strategy based on multiple hypotheses generation in a two-stage framework for spoken language translation. The hypotheses are generated in two stages, namely, decoding and regeneration. In the decoding stage, we apply state-of-the-art phrase-based and syntax-based methods to generate basic translation hypotheses. Then, in the regeneration stage, many more hypotheses that the decoding algorithms cannot capture are produced from the basic hypotheses. We study three regeneration methods in this second stage: redecoding, n-gram expansion, and confusion network. Finally, an additional reranking pass selects the translation outputs using a linear combination of rescoring models. Experimental results on the Chinese-to-English IWSLT-2006 challenge task of translating transcriptions of spontaneous speech show that the proposed mechanism achieves a significant improvement of about 2.80 BLEU points over the baseline.
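
The final reranking pass described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names (`lm`, `tm`, `length`), the weights, and the example hypotheses are all hypothetical, chosen only to show how a pooled list of hypotheses might be ordered by a weighted sum of model scores.

```python
from typing import Dict, List, Tuple

# A hypothesis is a translation string plus its per-model scores.
# The feature names used below are illustrative assumptions.
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(hypotheses: List[Hypothesis],
           weights: Dict[str, float]) -> List[Tuple[float, str]]:
    """Score each hypothesis by a weighted sum of its features, best first."""
    scored = []
    for text, features in hypotheses:
        # Linear combination: sum over models of weight * model score.
        score = sum(w * features.get(name, 0.0) for name, w in weights.items())
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical pool, standing in for the output of both generation stages.
pool = [
    ("i would like a room", {"lm": -4.0, "tm": -3.0, "length": 5.0}),
    ("i want room", {"lm": -6.0, "tm": -2.5, "length": 3.0}),
    ("i would like room", {"lm": -5.0, "tm": -2.8, "length": 4.0}),
]
weights = {"lm": 1.0, "tm": 1.0, "length": 0.5}  # assumed, not tuned

ranked = rerank(pool, weights)
best_score, best_text = ranked[0]
print(best_text)
```

In a real system the weights would typically be tuned on held-out data rather than set by hand; the abstract does not say how the rescoring models were weighted.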

Original language: English
Article number: 4
Journal: ACM Transactions on Asian Language Information Processing
Volume: 8
Issue number: 1
State: Published - 1 Mar 2009
Externally published: Yes

Keywords

  • Hypotheses generation
  • Spoken language translation
  • Statistical machine translation
