TY - GEN
T1 - SMT helps bitext dependency parsing
AU - Chen, Wenliang
AU - Kazama, Jun'ichi
AU - Zhang, Min
AU - Tsuruoka, Yoshimasa
AU - Zhang, Yujie
AU - Wang, Yiou
AU - Torisawa, Kentaro
AU - Li, Haizhou
PY - 2011
Y1 - 2011
N2 - We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach is still able to provide improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.
AB - We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach is still able to provide improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.
UR - https://www.scopus.com/pages/publications/80053283959
M3 - 会议稿件
AN - SCOPUS:80053283959
SN - 1937284115
SN - 9781937284114
T3 - EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 73
EP - 83
BT - EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
T2 - Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Y2 - 27 July 2011 through 31 July 2011
ER -