The construction of a Chinese shallow treebank

  • Hong Kong Polytechnic University

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper presents the construction of a manually annotated Chinese shallow treebank, named the PolyU Treebank. Unlike traditional Chinese treebanks based on full parsing, the PolyU Treebank is based on shallow parsing, in which only partial syntactic structures are annotated. The treebank can be used to support shallow parser training and testing, as well as other natural language applications. Phrase-based Grammar, proposed by Peking University, is used to guide the design and implementation of the PolyU Treebank. The design principles include good resource sharing, low structural complexity, sufficient syntactic information, and large data scale. The design issues presented in this paper include corpus material preparation, the standard for word segmentation and POS tagging, and the guideline for phrase bracketing and annotation. A well-designed workflow and effective semi-automatic and automatic annotation checking are used to ensure annotation accuracy and consistency. Annotation of a 1-million-word corpus has been completed so far. Evaluation shows that annotation accuracy is higher than 98%.

Original language: English
Pages: 94-101
Number of pages: 8
State: Published - 2004
Externally published: Yes
Event: 3rd SIGHAN Workshop on Chinese Language Processing, SIGHAN@ACL 2004 - Barcelona, Spain
Duration: 25 Jul 2004 → …

