
Efficient entity resolution based on sequence rules

  • Yakun Li*
  • Hongzhi Wang
  • Hong Gao
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Entity resolution (ER) is the task of finding the data objects that refer to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which judges whether two tuples refer to the same real-world entity. Record matching is a long-standing problem, but current methods cannot satisfy the requirements of applications with massive and complex data. We present sequence-rule-based record matching (SeReMatching), which considers both the values of the attributes and their importance in record matching. With the help of a modified Bloom filter, the algorithm greatly increases the checking speed and makes the complexity of entity resolution almost O(n). Extensive experiments are performed to evaluate our methods.
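
The abstract does not describe the sequence rules or the modified Bloom filter in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: a standard (unmodified) Bloom filter, a hypothetical `match_key` helper that concatenates normalized attribute values in an assumed order of importance, and a hypothetical `find_candidate_duplicates` function that makes a single pass over the records. It illustrates why a constant-time membership check can bring the overall cost close to O(n); it is not the SeReMatching algorithm itself.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter (assumption: NOT the modified variant from the paper)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive the bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def match_key(record: dict, attribute_order: list) -> str:
    # Hypothetical key: normalized attribute values, most important attribute first.
    return "|".join(str(record.get(a, "")).strip().lower() for a in attribute_order)


def find_candidate_duplicates(records: list, attribute_order: list) -> list:
    """Single pass over the records, with one filter lookup per record."""
    seen = BloomFilter()
    candidates = []
    for index, record in enumerate(records):
        key = match_key(record, attribute_order)
        if key in seen:
            # Possible duplicate only: a Bloom filter can report false positives,
            # so a real pipeline would verify the match afterwards.
            candidates.append(index)
        else:
            seen.add(key)
    return candidates


records = [
    {"name": "Ann Lee", "city": "Harbin"},
    {"name": " ann lee", "city": "HARBIN"},
    {"name": "Bob", "city": "Harbin"},
]
duplicates = find_candidate_duplicates(records, ["name", "city"])
```

Each record costs a fixed number of hash computations and bit lookups, so the pass is linear in the number of records. The trade-off is that flagged records are only candidates and still need an exact comparison.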

Original language: English
Title of host publication: Advanced Research on Computer Science and Information Engineering - International Conference, CSIE 2011, Proceedings
Pages: 381-388
Number of pages: 8
Edition: PART 1
State: Published - 2011
Event: International Conference on Advanced Research on Computer Science and Information Engineering, CSIE 2011 - Zhengzhou, China
Duration: 21 May 2011 to 22 May 2011

Publication series

Name: Communications in Computer and Information Science
Number: PART 1
Volume: 152 CCIS
ISSN (Print): 1865-0929

Conference

Conference: International Conference on Advanced Research on Computer Science and Information Engineering, CSIE 2011
Country/Territory: China
City: Zhengzhou
Period: 21/05/11 to 22/05/11

Keywords

  • Bloom Filter
  • Entity resolution
  • Record matching
