An improved unknown word recognition model based on multi-knowledge source method

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Unknown word recognition (UWR) is a difficult and foundational task in lexical processing and content-based understanding, and it can improve many text-based processing applications, such as Information Extraction, Question Answering systems and Electronic Meeting Systems. However, a unified approach has difficulty exploiting additional domain-knowledge features, so its performance cannot easily be improved further, since UWR has been proved to be an NP-hard problem. This paper presents a novel method for the UWR task that divides it into several hard sub-tasks, each of which usually poses different difficulties. Accordingly, a separate language model is adopted for each sub-task, so that each model is applied to the problems it handles best. Firstly, a class-based trigram model is used for basic word segmentation, aided by an absolute smoothing algorithm to overcome data sparseness; a Maximum Entropy (ME) model is used to recognize Named Entities; and new word detection uses variance together with the Conditional Random Fields (CRF) algorithm. Secondly, multi-knowledge features are extracted and utilized effectively throughout the processing. Our system participated in the Second International Chinese Word Segmentation Bakeoff (SIGHAN 2005) and achieved an overall F-measure of 97.2% in the MSRA open test.
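The abstract names absolute smoothing as the remedy for data sparseness in the class-based trigram. As a rough illustration only, the sketch below implements interpolated absolute discounting for a plain trigram model in Python; it is not the authors' system. The class name `AbsoluteDiscountTrigram`, the discount `D = 0.5`, and the choice to pool all unseen tokens into a single unknown slot are assumptions made for this example. In a class-based model the tokens would be word-class labels, and the class trigram probability would be multiplied by P(word | class), a step omitted here.

```python
from collections import Counter


class AbsoluteDiscountTrigram:
    """Interpolated absolute-discounting trigram model (hypothetical sketch).

    Tokens may be words or, for a class-based model, word-class labels.
    """

    def __init__(self, sentences, discount=0.5):
        # D must lie in (0, 1] so that every seen count c >= 1 can give up D.
        if not 0 < discount <= 1:
            raise ValueError("discount must be in (0, 1]")
        self.d = discount
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        for sent in sentences:
            toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
            for i in range(2, len(toks)):
                u, v, w = toks[i - 2], toks[i - 1], toks[i]
                self.uni[w] += 1
                self.bi[(v, w)] += 1
                self.tri[(u, v, w)] += 1
        # History totals c(h, .) and distinct continuations N1+(h, .).
        self.bi_ctx, self.bi_types = Counter(), Counter()
        for (v, _), c in self.bi.items():
            self.bi_ctx[v] += c
            self.bi_types[v] += 1
        self.tri_ctx, self.tri_types = Counter(), Counter()
        for (u, v, _), c in self.tri.items():
            self.tri_ctx[(u, v)] += c
            self.tri_types[(u, v)] += 1
        self.total = sum(self.uni.values())
        # Seen types plus one pooled slot for all unseen tokens (an assumption).
        self.vocab_size = len(self.uni) + 1

    def p_uni(self, w):
        # Discounted unigram mass plus the reserved mass spread uniformly.
        seen = max(self.uni[w] - self.d, 0) / self.total
        reserved = self.d * len(self.uni) / self.total
        return seen + reserved / self.vocab_size

    def p_bi(self, w, v):
        ctx = self.bi_ctx[v]
        if ctx == 0:  # unseen history: back off completely
            return self.p_uni(w)
        seen = max(self.bi[(v, w)] - self.d, 0) / ctx
        lam = self.d * self.bi_types[v] / ctx  # mass freed by discounting
        return seen + lam * self.p_uni(w)

    def p_tri(self, w, u, v):
        ctx = self.tri_ctx[(u, v)]
        if ctx == 0:  # unseen history: back off to the bigram
            return self.p_bi(w, v)
        seen = max(self.tri[(u, v, w)] - self.d, 0) / ctx
        lam = self.d * self.tri_types[(u, v)] / ctx
        return seen + lam * self.p_bi(w, v)
```

For example, after `m = AbsoluteDiscountTrigram([["a", "b"], ["a", "c"]])`, the call `m.p_tri("b", "<s>", "a")` returns the smoothed probability of `b` following `<s> a`. Because D is at most 1, the mass removed from seen counts at each order equals the weight handed to the next lower order, so every conditional distribution still sums to 1 over the vocabulary.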

Original language: English
Title of host publication: Proceedings - ISDA 2006
Subtitle of host publication: Sixth International Conference on Intelligent Systems Design and Applications
Pages: 825-832
Number of pages: 8
State: Published - 2006
Externally published: Yes
Event: ISDA 2006: Sixth International Conference on Intelligent Systems Design and Applications - Jinan, China
Duration: 16 Oct 2006 to 18 Oct 2006

Publication series

Name: Proceedings - ISDA 2006: Sixth International Conference on Intelligent Systems Design and Applications
Volume: 2

Conference

Conference: ISDA 2006: Sixth International Conference on Intelligent Systems Design and Applications
Country/Territory: China
City: Jinan
Period: 16/10/06 to 18/10/06

Keywords

  • Conditional random fields
  • Maximum entropy model
  • Out-of-vocabulary word recognition
  • Question answer system
  • Unknown word recognition
