A corpus-based approach to English subordinate clause identification

  • J. Zhang*
  • T. Zhao
  • S. Li
  • J. Yao
  • *Corresponding author for this work
  • Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

The complex sentence structure of English is a bottleneck for our practical machine translation (MT) system. Simplifying English subordinate clauses greatly relieves the burden of parsing and of other grammatical or semantic analysis of a complex sentence, and thus improves the output quality of the MT system. To our knowledge, however, no satisfactory results have been reported in this field so far. This paper reports the authors' work on a corpus-based approach to English subordinate clause identification. The approach integrates rule-based and statistical methods to find the left and right boundaries of subordinate clauses. The Penn Treebank corpus is used as the training standard. Precision and recall of subordinate clause identification are tested on both closed and open corpora. The closed test yields 92.9% precision and 91.26% recall; the open test yields 80.34% precision and 83.93% recall. The algorithm has been integrated into our machine translation system. The method can also be applied to the processing of any other language.

Original language: English
Pages (from-to): 10-12
Number of pages: 3
Journal: High Technology Letters
Volume: 7
Issue number: 1
State: Published - Mar 2001

Keywords

  • Corpus
  • Knowledge acquisition
  • Subordinate clauses
