CBAClean: A Comprehensive System for Recommending Data Cleaning Solutions Through Cost-Benefit Analysis in Data Quality Management

  • Xiaoou Ding
  • Hongbin Su
  • Zekai Qian
  • Wenxuan Cui
  • Siying Chen
  • Zheng Liang
  • Chen Wang
  • Hongzhi Wang*
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The scale of data analysis tasks has increased, highlighting the critical importance of data quality. Data quality assessment and repair have become pivotal in data preparation. Although numerous data cleaning algorithms are available, they often focus on optimizing efficiency and minimizing labor costs, neglecting the explicit relationship between the costs and the benefits of data quality management. This omission can lead to the failure of promising data analysis solutions. To address this, we propose CBAClean, a comprehensive system that integrates cost-benefit analysis into data cleaning. CBAClean aims to assist users in quantifying the costs of data quality management and to provide optimal data cleaning solutions tailored to their needs. Key features include task-centered, multi-perspective data quality assessment; a comprehensive library of data quality repair operators; fine-grained division of human roles for effective cost control; and recommendation of optimal data cleaning solutions based on cost-benefit calculations. By incorporating cost-benefit analysis, CBAClean enhances the practical application of data quality management on real-world data governance platforms.
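The abstract does not specify how the cost-benefit calculation is carried out. As a purely illustrative sketch of the general idea, and not CBAClean's actual method, the Python below picks, among candidate cleaning plans that fit a budget, the one with the highest net benefit (benefit minus cost). The `CleaningPlan` class, the `recommend` function, the plan names, and all numeric values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CleaningPlan:
    """Hypothetical candidate cleaning solution (not from the paper)."""
    name: str
    cost: float      # estimated total cost, in arbitrary units
    benefit: float   # estimated benefit, in the same units

    @property
    def net_benefit(self) -> float:
        return self.benefit - self.cost


def recommend(plans: Sequence[CleaningPlan], budget: float) -> Optional[CleaningPlan]:
    """Return the affordable plan with the highest net benefit, or None."""
    affordable = [p for p in plans if p.cost <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda p: p.net_benefit)


# Toy values chosen only to illustrate the selection rule.
plans = [
    CleaningPlan("automatic repair only", cost=10.0, benefit=25.0),
    CleaningPlan("automatic repair + expert review", cost=40.0, benefit=70.0),
    CleaningPlan("full manual cleaning", cost=90.0, benefit=100.0),
]
best = recommend(plans, budget=50.0)
```

A real system would presumably estimate cost and benefit from the data quality assessment and the chosen repair operators; here they are simply given as inputs.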

Original language: English
Title of host publication: Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
Publisher: IEEE Computer Society
Pages: 4636-4639
Number of pages: 4
ISBN (Electronic): 9798331536039
State: Published - 2025
Externally published: Yes
Event: 41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: 19 May 2025 to 23 May 2025

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 41st IEEE International Conference on Data Engineering, ICDE 2025
Country/Territory: China
City: Hong Kong
Period: 19/05/25 to 23/05/25

Keywords

  • data cleaning
  • data quality
