TY - GEN
T1 - Improving Chinese to English SMT with multiple CWS results
AU - Ma, Yongliang
AU - Zhao, Tiejun
PY - 2009
Y1 - 2009
N2 - In Chinese-to-English statistical machine translation (SMT), Chinese text always requires a pre-processing step that segments sentences into words; the standard approach is Chinese word segmentation (CWS). However, CWS was not developed for SMT, so its results are not necessarily optimal for SMT. In recent years, many studies have investigated making CWS suitable for SMT, but we explore the problem from another direction. In this paper, our basic idea is to use multiple CWS results as additional sources of linguistic knowledge, and we present a simple and effective approach to using multiple CWS results for SMT. We also report experimental results over a range of strategy settings and obtain substantial improvements in Chinese-to-English translation performance. The best result shows a gain of 1.89 BLEU percentage points over a state-of-the-art HPBT baseline system that does not use multiple CWS results.
AB - In Chinese-to-English statistical machine translation (SMT), Chinese text always requires a pre-processing step that segments sentences into words; the standard approach is Chinese word segmentation (CWS). However, CWS was not developed for SMT, so its results are not necessarily optimal for SMT. In recent years, many studies have investigated making CWS suitable for SMT, but we explore the problem from another direction. In this paper, our basic idea is to use multiple CWS results as additional sources of linguistic knowledge, and we present a simple and effective approach to using multiple CWS results for SMT. We also report experimental results over a range of strategy settings and obtain substantial improvements in Chinese-to-English translation performance. The best result shows a gain of 1.89 BLEU percentage points over a state-of-the-art HPBT baseline system that does not use multiple CWS results.
KW - Chinese word segmentation
KW - Feature blending
KW - Feature interpolation
KW - SMT
UR - https://www.scopus.com/pages/publications/77950908270
U2 - 10.1109/IALP.2009.36
DO - 10.1109/IALP.2009.36
M3 - Conference contribution
AN - SCOPUS:77950908270
SN - 9780769539041
T3 - 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009
SP - 135
EP - 140
BT - 2009 International Conference on Asian Language Processing
T2 - 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009
Y2 - 7 December 2009 through 9 December 2009
ER -