Range query estimation for dirty data management system

  • Yan Zhang
  • Long Yang
  • Hongzhi Wang*
  • *Corresponding author for this work
  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

In recent years, data quality issues have attracted wide attention. Poor data quality is mainly caused by dirty data. Many methods for dirty data management have been proposed, one of which is the entity-based relational database, in which one tuple represents an entity. Traditional query optimization techniques, which estimate the execution cost of a query plan, are not suitable for this new entity-based model, so new query optimization techniques need to be developed. In this paper, we propose new histogram-based approaches to query selectivity estimation, focusing on correcting the overestimation that traditional methods produce. We prove that our approaches are unbiased. Experimental results on both real and synthetic data sets show that our approaches give good estimates with low error.

Original language: English
Title of host publication: Web-Age Information Management - 13th International Conference, WAIM 2012, Proceedings
Pages: 152-164
Number of pages: 13
State: Published - 2012
Event: 13th International Conference on Web-Age Information Management, WAIM 2012 - Harbin, China
Duration: 18 Aug 2012 to 20 Aug 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7418 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Conference on Web-Age Information Management, WAIM 2012
Country/Territory: China
City: Harbin
Period: 18/08/12 to 20/08/12

Keywords

  • data quality
  • dirty data
  • histogram
  • query estimation
