Discovering relations between named entities from a large raw corpus using tree similarity-based clustering

  • Min Zhang*
  • Jian Su
  • Danmei Wang
  • Guodong Zhou
  • Chew Lim Tan
  • *Corresponding author for this work
  • Agency for Science, Technology and Research, Singapore
  • National University of Singapore

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

We propose an unsupervised, tree-similarity-based learning method for extracting relations between named entities from a large raw corpus. The method treats relation extraction as a clustering problem over shallow parse trees. First, we modify tree kernels previously used for relation extraction so that they estimate the similarity between parse trees more efficiently. Then, a hierarchical clustering algorithm uses this similarity to group entity pairs into clusters. Finally, each cluster is labeled with an indicative word, and unreliable clusters are pruned. Evaluation on the New York Times (1995) corpus shows that our method outperforms the only previous work on this task by 5 points in F-measure, and that it performs well on both high-frequency and low-frequency entity pairs. To the best of our knowledge, this is the first work to use a tree similarity metric for relation clustering.
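The abstract describes a three-stage pipeline: score pairs of shallow parse trees with a tree similarity, group entity pairs by hierarchical clustering, then label each cluster with an indicative word and drop unreliable clusters. The Python sketch below illustrates that flow only under stated assumptions. The similarity is a deliberately crude top-down node-matching score standing in for a tree kernel (it is not the authors' modified kernel). The abstract does not name a linkage criterion, so average linkage with a stopping threshold is an assumption. The labeling rule (most frequent non-stopword context word) and the pruning rule (drop clusters below a minimum size) are also assumptions. All names (`Node`, `T`, `match_count`, `tree_sim`, `average_link_cluster`, `label_and_prune`), the threshold of 0.6, the stopword set, and the toy trees are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Hypothetical shallow-parse node: a phrase/POS tag, or a word at a leaf."""
    label: str
    children: list[Node] = field(default_factory=list)


def T(label: str, *kids: Node) -> Node:
    """Compact constructor for toy trees."""
    return Node(label, list(kids))


def size(t: Node) -> int:
    return 1 + sum(size(c) for c in t.children)


def match_count(a: Node, b: Node) -> int:
    """Count nodes that agree when two trees are aligned top-down.

    Children are paired by position (zip) and a label mismatch stops the
    descent. This is a crude stand-in for a tree kernel, not the paper's kernel.
    """
    if a.label != b.label:
        return 0
    return 1 + sum(match_count(x, y) for x, y in zip(a.children, b.children))


def tree_sim(a: Node, b: Node) -> float:
    """Normalized similarity in [0, 1]; tree_sim(t, t) == 1.0."""
    return match_count(a, b) / math.sqrt(size(a) * size(b))


def average_link_cluster(trees: list[Node], threshold: float) -> list[list[int]]:
    """Agglomerative clustering (assumed average linkage); stop when the
    best average similarity between two clusters falls below threshold."""
    sim = {(i, j): tree_sim(trees[i], trees[j])
           for i, j in combinations(range(len(trees)), 2)}

    def link(c1: list[int], c2: list[int]) -> float:
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(trees))]
    while len(clusters) > 1:
        p, q = max(combinations(range(len(clusters)), 2),
                   key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        if link(clusters[p], clusters[q]) < threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid
    return clusters


def label_and_prune(clusters, context_words, min_size=2,
                    stopwords=frozenset({"of", "is"})):
    """Label each cluster with its most frequent non-stopword context word and
    drop clusters smaller than min_size (a hypothetical reliability test)."""
    labeled = []
    for members in clusters:
        if len(members) < min_size:
            continue
        counts = Counter(w for i in members for w in context_words[i]
                         if w not in stopwords)
        if counts:
            labeled.append((counts.most_common(1)[0][0], members))
    return labeled


def verb_tree(verb: str) -> Node:  # "E1 <verb> E2"
    return T("S", T("NP", T("E1")), T("VP", T("VBD", T(verb)), T("NP", T("E2"))))


def title_tree(title: str) -> Node:  # "E1 is <title> of E2"
    return T("S", T("NP", T("E1")),
             T("VP", T("VBZ", T("is")),
               T("NP", T("NN", T(title)),
                 T("PP", T("IN", T("of")), T("NP", T("E2"))))))


trees = [verb_tree("acquired"), verb_tree("bought"),
         title_tree("president"), title_tree("chairman"),
         T("S", T("NP", T("E1")), T("PP", T("IN", T("near")), T("NP", T("E2"))))]
contexts = [["acquired", "stake"], ["bought", "acquired"],
            ["president", "of"], ["chairman", "president", "of"], ["near"]]

clusters = average_link_cluster(trees, threshold=0.6)
print(clusters)                             # [[0, 1], [2, 3], [4]]
print(label_and_prune(clusters, contexts))  # [('acquired', [0, 1]), ('president', [2, 3])]
```

In this toy run the two "acquire/buy" trees and the two "title of" trees merge, while the singleton "near" pair is pruned. A real system would replace `tree_sim` with a proper tree kernel and tune the stopping threshold and pruning criterion on data.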

Original language: English
Title of host publication: Natural Language Processing - IJCNLP 2005 - Second International Joint Conference, Proceedings
Publisher: Springer Verlag
Pages: 378-389
Number of pages: 12
ISBN (Print): 3540291725, 9783540291725
State: Published - 2005
Externally published: Yes
Event: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005 - Jeju Island, Korea, Republic of
Duration: 11 Oct 2005 to 13 Oct 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3651 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Joint Conference on Natural Language Processing, IJCNLP 2005
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 11/10/05 to 13/10/05
