Abstract
Data with complex structure is increasingly common in the real world. To use such data effectively in practice, techniques for improving its quality must be in place. Entity resolution is a central issue in data quality management for complex objects: it aims to find the data objects that refer to the same real-world entity and to cluster such objects together. It has proven extremely useful in data fusion, inconsistency detection, and data repairing. Nevertheless, the complex structure of the data introduces new challenges and makes object identification much harder than record matching on relational data. In response to these challenges, there has been a lot of work on this topic. This paper provides an overview of recent advances in the study of object identification on complex objects, including XML, graph data and complex networks. For XML data, we survey techniques for pairwise and group-wise entity resolution. For graph data, we focus on how to determine whether two graphs refer to the same real-world entity. We also present the metrics and methods for identifying vertices that pertain to the same real-world entity in a complex network. Finally, we discuss directions for future research.
| Original language | English |
|---|---|
| Pages (from-to) | 1843-1852 |
| Number of pages | 10 |
| Journal | Jisuanji Xuebao/Chinese Journal of Computers |
| Volume | 34 |
| Issue number | 10 |
| State | Published - Oct 2011 |
| Externally published | Yes |
Keywords
- Complex data
- Complex network
- Data quality
- Object identification
- XML graph
Title
Object identification on complex data: A survey