Efficient histogram-based range query estimation for dirty data

  • Yan Zhang
  • Hongzhi Wang*
  • Long Yang
  • Jianzhong Li

*Corresponding author for this work

Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

In recent years, data quality issues have attracted wide attention. Data quality problems are mainly caused by dirty data. Many methods for dirty data management have been proposed; one of them is the entity-based relational database, in which one tuple represents an entity. Traditional query optimization techniques are not suited to this entity-based model, so new ones need to be developed. In this paper, we propose a new histogram-based strategy for query selectivity estimation, focusing on the overestimation that traditional methods produce. We prove that our approaches are unbiased. Experimental results on both real and synthetic data sets show that our approaches give good estimates with low error.

Original language: English
Pages (from-to): 984-999
Number of pages: 16
Journal: Frontiers of Computer Science
Volume: 12
Issue number: 5
State: Published - 1 Oct 2018

Keywords

  • data quality
  • dirty data management
  • histogram
  • query estimation
