
A distributed load balance algorithm of mapreduce for data quality detection

  • Yitong Gao*
  • Yan Zhang
  • Hongzhi Wang
  • Jianzhong Li
  • Hong Gao
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Big data quality detection is an important problem in the data quality field. MapReduce is an important distributed data processing model, used mainly for big data processing. Load balance is a key factor that influences the performance of MapReduce. In this paper, we propose a distributed greedy approximation algorithm for the load balance problem in MapReduce for data quality detection. There are three key challenges: (a) reduce the problem to an NP-complete problem and prove a considerable approximation ratio for the proposed algorithm, (b) impose just one more round of MapReduce than conventional processing and occupy minimal time in the total process, and (c) be simple and convenient to implement. Experimental results on real-life and synthetic data demonstrate that the proposed algorithm is effective for load balance.
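
As a rough illustration of what a greedy load-balancing step can look like, the sketch below uses the classic longest-processing-time (LPT) heuristic to assign key groups to reducers. It is not the authors' algorithm, which the abstract does not spell out. The function name `greedy_assign`, the input format (a mapping from key to estimated record count), and the single-machine setting are all assumptions made for illustration.

```python
import heapq


def greedy_assign(key_sizes, num_reducers):
    """Assign each key to the currently least-loaded reducer (LPT heuristic).

    key_sizes: dict mapping key -> estimated record count (hypothetical input).
    Returns (assignment, loads): key -> reducer index, and per-reducer load.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")

    # Min-heap of (current load, reducer index); all reducers start empty.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)

    assignment = {}
    loads = [0] * num_reducers

    # Place the largest keys first; ties are broken by key for determinism.
    ordered = sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, size in ordered:
        load, r = heapq.heappop(heap)  # least-loaded reducer so far
        assignment[key] = r
        loads[r] = load + size
        heapq.heappush(heap, (loads[r], r))

    return assignment, loads


# Illustrative key sizes (made up, not taken from the paper).
sizes = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}
assignment, loads = greedy_assign(sizes, 2)
print(assignment, loads)
```

In a MapReduce setting, the key sizes would presumably come from an extra counting round before the main job, which is one way to read challenge (b); that reading is an assumption, not something the abstract confirms.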

Original language: English
Title of host publication: Database Systems for Advanced Applications - DASFAA 2016 International Workshops
Subtitle of host publication: BDMS, BDQM, MoI, and SeCoP, Proceedings
Editors: Jinho Kim, Hong Gao, Yasushi Sakurai
Publisher: Springer Verlag
Pages: 294-306
Number of pages: 13
ISBN (Print): 9783319320540
State: Published - 2016
Externally published: Yes
Event: International Workshop on Database Systems for Advanced Applications, DASFAA 2016 - Dallas, United States
Duration: 16 Apr 2016 to 19 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9645
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Workshop on Database Systems for Advanced Applications, DASFAA 2016
Country/Territory: United States
City: Dallas
Period: 16/04/16 to 19/04/16

Keywords

  • Data quality detection
  • Distributed approximation greedy algorithm
  • Load balance
  • MapReduce
