Improving Chinese to English SMT with multiple CWS results

  • Yongliang Ma*
  • Tiejun Zhao
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In Chinese-to-English statistical machine translation (SMT), Chinese text must first be segmented into words, a standard pre-processing step known as Chinese word segmentation (CWS). However, CWS was not developed for SMT, so its results are not necessarily optimal for translation. In recent years, many studies have tried to make CWS more suitable for SMT; we approach the problem from another direction. Our basic idea is to use multiple CWS results as additional sources of linguistic knowledge, and we present a simple and effective approach for using them in SMT. We also report experimental results over a range of strategy settings and obtain substantial improvements in Chinese-to-English translation performance. The best result is a gain of 1.89 BLEU percentage points over a state-of-the-art HPBT baseline system that does not use multiple CWS results.

Original language: English
Title of host publication: 2009 International Conference on Asian Language Processing
Subtitle of host publication: Recent Advances in Asian Language Processing, IALP 2009
Pages: 135-140
Number of pages: 6
State: Published - 2009
Event: 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009 - Singapore, Singapore
Duration: 7 Dec 2009 to 9 Dec 2009

Publication series

Name: 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009

Conference

Conference: 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009
Country/Territory: Singapore
City: Singapore
Period: 7/12/09 to 9/12/09

Keywords

  • Chinese word segmentation
  • Feature blending
  • Feature interpolation
  • SMT
