M-CNER: A corpus for Chinese named entity recognition in multi-domains

  • Qi Lu
  • Yao Sheng Yang
  • Zhenghua Li
  • Wenliang Chen
  • Min Zhang
  • Soochow University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we present a new corpus for Chinese Named Entity Recognition (NER) covering three domains: human-computer interaction, social media, and e-commerce. The annotation procedure is conducted in two rounds. In the first round, each sentence is annotated independently by more than one person. In the second round, experts discuss the sentences on which the annotators disagree. The resulting corpus has five data sets in three domains. We further evaluate three popular models on the newly created data sets. The experimental results show that the system based on Bi-LSTM-CRF performs best among the compared systems on all the data sets. The corpus can be used for further studies in the research community.

Original language: English
Title of host publication: LREC 2018 - 11th International Conference on Language Resources and Evaluation
Editors: Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Publisher: European Language Resources Association (ELRA)
Pages: 4457-4461
Number of pages: 5
ISBN (Electronic): 9791095546009
State: Published - 2018
Externally published: Yes
Event: 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan
Duration: 7 May 2018 → 12 May 2018

Publication series

Name: LREC 2018 - 11th International Conference on Language Resources and Evaluation

Conference

Conference: 11th International Conference on Language Resources and Evaluation, LREC 2018
Country/Territory: Japan
City: Miyazaki
Period: 7/05/18 → 12/05/18

Keywords

  • Chinese Data Set
  • Information Extraction
  • Named Entity Recognition
