
Hierarchical LSTM with char-subword-word tree-structure representation for Chinese named entity recognition

  • Chen Gong
  • Zhenghua Li*
  • Qingrong Xia
  • Wenliang Chen
  • Min Zhang
  • *Corresponding author for this work
  • Soochow University

Research output: Contribution to journal › Article › peer-review

Abstract

Chinese named entity recognition (CNER) aims to identify entity names, such as person and organization names, in raw Chinese text, and can therefore quickly extract the entity information that people care about from large-scale texts. Recent studies attempt to improve performance by integrating lexicon words into char-based CNER models. These studies, however, usually leverage context-free words from the lexicon without considering the contextual information of words and subwords in the sentence. To address this issue, in addition to using lexicon words, we propose to represent each sentence for CNER with a hierarchical tree structure composed of characters, subwords, and context-aware words predicted by a segmentor. Based on this tree-structure representation, we propose a hierarchical long short-term memory (HiLSTM) framework, which consists of a hierarchical encoding layer, a fusion layer, and a CRF layer, to capture linguistic knowledge at different levels. On the one hand, the interactions within each level help to obtain contextual information. On the other hand, propagation from the lower levels to the upper levels provides additional semantic knowledge for CNER. Experimental results on three widely used CNER datasets show that our proposed HiLSTM model achieves significant improvement over several strong benchmark methods.
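As a rough illustration of the char-subword-word tree described in the abstract, the following minimal sketch builds one tree per segmented word. It is not the authors' implementation: the names `Node` and `build_tree`, the toy lexicon, and the rule that a subword is any lexicon entry of length two or more strictly inside a word are all assumptions made for this example, and the segmentor output is taken as given input rather than predicted.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    level: str   # "char", "subword", or "word"
    text: str
    start: int   # character offset in the sentence (inclusive)
    end: int     # character offset in the sentence (exclusive)
    children: List["Node"] = field(default_factory=list)


def build_tree(words: List[str], lexicon: Set[str]) -> List[Node]:
    """Build one word-rooted tree per word of an already segmented sentence.

    Assumption: a subword is a lexicon entry of length >= 2 that is a
    proper substring of the word. The paper may define subwords differently.
    """
    trees = []
    offset = 0
    for word in words:
        # Leaf level: one node per character.
        chars = [Node("char", c, offset + i, offset + i + 1)
                 for i, c in enumerate(word)]
        # Middle level: lexicon matches that lie strictly inside the word.
        subwords = []
        for i in range(len(word)):
            for j in range(i + 2, len(word) + 1):
                piece = word[i:j]
                if piece != word and piece in lexicon:
                    subwords.append(
                        Node("subword", piece, offset + i, offset + j, chars[i:j]))
        # Top level: the word node links to its subwords and its characters.
        trees.append(
            Node("word", word, offset, offset + len(word), subwords + chars))
        offset += len(word)
    return trees


# Hypothetical segmentor output and lexicon, chosen only for illustration.
words = ["南京市", "长江大桥"]
lexicon = {"南京", "长江", "大桥", "市长"}
for tree in build_tree(words, lexicon):
    subs = [n.text for n in tree.children if n.level == "subword"]
    print(tree.text, (tree.start, tree.end), subs)
```

In this toy sentence the lexicon entry 市长 spans the boundary between the two segmented words, so it never enters the tree. That is one way to picture how context-aware segmentation can filter context-free lexicon matches.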

Original language: English
Article number: 202102
Journal: Science China Information Sciences
Volume: 63
Issue number: 10
State: Published - 1 Oct 2020
Externally published: Yes

Keywords

  • named entity recognition
  • natural language processing
  • neural networks
  • representation learning
