
A data cleaning framework based on user feedback

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In this paper, we present the design of a data cleaning framework that combines the interaction of data quality rules (CFDs, CINDs, and MDs) with user feedback through an interactive process. First, to generate candidate repairs for each potentially dirty attribute, we propose an optimization model based on a genetic algorithm. We then build a Bayesian machine learning model with several committees to predict the correctness of each repair, and rank the repairs by uncertainty score so that feedback on them improves the learned model. User feedback, given while inspecting the suggestions, is used to decide whether the model is accurate. Finally, our experiments on real-world datasets show significant improvement in data quality.
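
The ranking step described above can be illustrated with a minimal sketch. The abstract does not say how the uncertainty score is computed, so this example assumes vote entropy over committee predictions, a common choice in committee-based active learning; it is not the authors' method. The names `vote_entropy` and `rank_by_uncertainty` and the sample repairs are hypothetical.

```python
import math


def vote_entropy(votes):
    """Binary entropy (in bits) of a committee's votes on one repair.

    `votes` is a non-empty list of booleans; True means a committee member
    predicts the candidate repair is correct. Assumed scoring rule, not
    taken from the paper.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    p = sum(votes) / len(votes)  # fraction of members voting "correct"
    if p in (0.0, 1.0):
        return 0.0  # unanimous committee: no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_by_uncertainty(candidate_votes):
    """Return repair ids ordered from most to least uncertain.

    `candidate_votes` maps a repair id to its list of committee votes.
    """
    return sorted(
        candidate_votes,
        key=lambda repair: vote_entropy(candidate_votes[repair]),
        reverse=True,
    )


# Hypothetical candidate repairs with votes from a four-member committee.
sample_votes = {
    "zip=10001": [True, True, True, True],
    "city=NYC": [True, False, True, False],
    "state=NY": [True, True, True, False],
}
ranking = rank_by_uncertainty(sample_votes)
```

Under this assumption, the repair on which the committee splits evenly would be shown to the user first, since a label for it tells the model the most.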

Original language: English
Title of host publication: Web-Age Information Management - 14th International Conference, WAIM 2013, Proceedings
Publisher: Springer Verlag
Pages: 514-520
Number of pages: 7
ISBN (Print): 9783642385612
DOIs
State: Published - 2013
Event: 14th International Conference on Web-Age Information Management, WAIM 2013 - Beidaihe, China
Duration: 14 Jun 2013 to 16 Jun 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7923 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Web-Age Information Management, WAIM 2013
Country/Territory: China
City: Beidaihe
Period: 14/06/13 to 16/06/13

Keywords

  • Bayesian decision
  • Data clean
  • Data quality rules
  • User feedback
