TY - GEN
T1 - Automatic acquisition of large-scale academic bilingual parallel corpus from the web
AU - Han, Yong
AU - Li, Yu
AU - He, Xiaoning
AU - Yang, Muyun
AU - Lei, Guohua
PY - 2009
Y1 - 2009
N2 - In this paper, we describe a system which automatically acquires a large-scale Chinese-English bilingual parallel corpus from the China Journals Full-text Database (CJFD), a component of the China National Knowledge Infrastructure (CNKI). The system obtains a large amount of parallel text with domain information from the existing structured bilingual texts in CJFD, such as the Chinese and English abstracts and titles of academic articles. The acquired Chinese-English parallel corpus is several orders of magnitude larger than any similar corpus known to us. In addition, the system collects a large number of bilingual terms which can be applied directly to lexical acquisition.
AB - In this paper, we describe a system which automatically acquires a large-scale Chinese-English bilingual parallel corpus from the China Journals Full-text Database (CJFD), a component of the China National Knowledge Infrastructure (CNKI). The system obtains a large amount of parallel text with domain information from the existing structured bilingual texts in CJFD, such as the Chinese and English abstracts and titles of academic articles. The acquired Chinese-English parallel corpus is several orders of magnitude larger than any similar corpus known to us. In addition, the system collects a large number of bilingual terms which can be applied directly to lexical acquisition.
KW - Bilingual parallel corpora acquisition
KW - Bilingual term acquisition
KW - Data mining
UR - https://www.scopus.com/pages/publications/77950895466
U2 - 10.1109/IALP.2009.75
DO - 10.1109/IALP.2009.75
M3 - Conference contribution
AN - SCOPUS:77950895466
SN - 9780769539041
T3 - 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009
SP - 318
EP - 321
BT - 2009 International Conference on Asian Language Processing
T2 - 2009 International Conference on Asian Language Processing: Recent Advances in Asian Language Processing, IALP 2009
Y2 - 7 December 2009 through 9 December 2009
ER -