TY - GEN
T1 - CBAClean: A Comprehensive System for Recommending Data Cleaning Solutions Through Cost-Benefit Analysis in Data Quality Management
AU - Ding, Xiaoou
AU - Su, Hongbin
AU - Qian, Zekai
AU - Cui, Wenxuan
AU - Chen, Siying
AU - Liang, Zheng
AU - Wang, Chen
AU - Wang, Hongzhi
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - The scale of data analysis tasks has increased, highlighting the critical importance of data quality. Data quality assessment and repair have become pivotal in data preparation. Although numerous data cleaning algorithms are available, they often focus on optimizing efficiency and minimizing labor costs, neglecting the explicit relationship between the costs and benefits of data quality management. This omission can lead to the failure of promising data analysis solutions. To address this, we propose CBAClean, a comprehensive system that integrates cost-benefit analysis into data cleaning. CBAClean aims to help users quantify the costs of data quality management and to provide optimal data cleaning solutions tailored to their needs. Key features include task-centered multi-perspective data quality assessment, a comprehensive data quality repair operator library, fine-grained human role division for effective cost control, and recommendation of optimal data cleaning solutions based on cost-benefit calculations. By incorporating cost-benefit analysis, CBAClean enhances the practical application of data quality management on real-world data governance platforms.
KW - data cleaning
KW - data quality
UR - https://www.scopus.com/pages/publications/105015575698
U2 - 10.1109/ICDE65448.2025.00371
DO - 10.1109/ICDE65448.2025.00371
M3 - Conference contribution
AN - SCOPUS:105015575698
T3 - Proceedings - International Conference on Data Engineering
SP - 4636
EP - 4639
BT - Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PB - IEEE Computer Society
T2 - 41st IEEE International Conference on Data Engineering, ICDE 2025
Y2 - 19 May 2025 through 23 May 2025
ER -