TY - GEN
T1 - Classification-based Chinese collocation extraction
AU - Xu, Ruifeng
AU - Lu, Qin
AU - Wong, Kam Fai
AU - Li, Wenjie
PY - 2007
Y1 - 2007
AB - Most collocation extraction algorithms use a single set of criteria and a single threshold, which is not quite appropriate because different types of collocations behave differently. This paper presents a window-based Chinese collocation extraction system that identifies different types of collocations separately. By taking into consideration compositional, non-substitutable, and non-modifiable properties as well as statistical significance, Chinese collocations are classified into four types. A multi-stage extraction system is then designed to identify the different types of collocations separately by using different combinations of features. Furthermore, heuristic rules based on dependency knowledge are applied to filter out some pseudo collocations. Experiments show that the proposed system achieves better F1 performance than most existing algorithms for Chinese collocation extraction.
UR - https://www.scopus.com/pages/publications/47749119147
U2 - 10.1109/NLPKE.2007.4368048
DO - 10.1109/NLPKE.2007.4368048
M3 - Conference contribution
AN - SCOPUS:47749119147
SN - 9781424416103
T3 - IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
SP - 308
EP - 315
BT - IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
T2 - International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007
Y2 - 30 August 2007 through 1 September 2007
ER -