Skip to main navigation Skip to search Skip to main content

CleanCloud: Cleaning big data on cloud

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We describe CleanCloud, a system for cleaning big data based on Map-Reduce paradigm in cloud. Using Map-Reduce paradigm, the system detects and repairs various data quality problems in big data. We demonstrate the following features of CleanCloud: (a) the support for cleaning multiple data quality problems in big data; (b) a visual tool for watching the status of big data cleaning process and tuning the parameters for data cleaning; (c) the friendly interface for data input and setting as well as cleaned data collection for big data. CleanCloud is a promising system that provides scalable and effect data cleaning mechanism for big data in either files or databases.

Original languageEnglish
Title of host publicationCIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages2543-2546
Number of pages4
ISBN (Electronic)9781450349185
DOIs
StatePublished - 6 Nov 2017
Event26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: 6 Nov 201710 Nov 2017

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
VolumePart F131841

Conference

Conference26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country/TerritorySingapore
CitySingapore
Period6/11/1710/11/17

Keywords

  • Data cleaning
  • Entity resolution
  • Parallel computing

Fingerprint

Dive into the research topics of 'CleanCloud: Cleaning big data on cloud'. Together they form a unique fingerprint.

Cite this