Effective Bayesian-network-based missing value imputation enhanced by crowdsourcing

  • Chen Ye
  • Hongzhi Wang*
  • Wenbo Lu
  • Jianzhong Li

*Corresponding author for this work

  • Harbin Institute of Technology
  • Hangzhou Dianzi University

Research output: Contribution to journal, Article, peer-review

Abstract

Incompleteness is one of the most serious data quality problems that arise during data collection. Traditional imputation methods rely mostly on statistics and machine learning techniques. However, both types of methods are limited in accuracy because they lack sufficient information about the missing data. To obtain more information, recent methods resort to external sources such as knowledge bases or the World Wide Web. Unfortunately, such methods may still be of limited help, since the knowledge bases may contain little information about the missing values and the web may contain too much noise. To tackle these issues, this paper adopts crowdsourcing as the external source, where hundreds of thousands of ordinary workers on the platform can provide high-quality information based on contextual knowledge and human cognitive ability. To reduce the cost, a joint model is proposed for imputation that integrates crowdsourcing into the process of Bayesian inference. We first construct a Bayesian network for the attributes in the dataset, and then infer the missing attribute values by Bayesian inference. To improve the accuracy of the Bayesian inference, we outsource a small number of informative tasks to the crowd workers, where the informative tasks are selected based on uncertainty and influence. The proposed approach is evaluated with extensive experiments using real-world datasets with a simulated crowd and two real crowdsourcing platforms. The experimental results show that our approach achieves better performance than other imputation approaches.
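The pipeline outlined in the abstract (infer missing values from a Bayesian network, then send only the most informative cells to the crowd) can be illustrated with a toy sketch. This is not the authors' model: it assumes a two-attribute network with a single edge `city -> zip`, uses Shannon entropy of the inferred distribution as a stand-in for "uncertainty", ignores "influence" entirely, and simulates the crowd with a hypothetical `ask_crowd` callback. All function names, attribute names, and data below are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_cpt(rows, parent, child):
    """Estimate P(child | parent) by counting rows where both values are present."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[parent] is not None and row[child] is not None:
            counts[row[parent]][row[child]] += 1
    return {
        p: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for p, cnt in counts.items()
    }


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution given as {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def impute(rows, parent, child, budget, ask_crowd):
    """Fill missing `child` values; send the `budget` most uncertain cells to the crowd.

    Assumption: uncertainty is the entropy of P(child | parent) for that row.
    The paper also uses an "influence" criterion, which is not modeled here.
    """
    cpt = fit_cpt(rows, parent, child)
    missing = [
        i for i, r in enumerate(rows)
        if r[child] is None and r[parent] in cpt
    ]
    # Rank missing cells from most to least uncertain.
    ranked = sorted(missing, key=lambda i: entropy(cpt[rows[i][parent]]), reverse=True)
    to_crowd = set(ranked[:budget])

    result = [dict(r) for r in rows]  # copy so the input is left untouched
    for i in missing:
        if i in to_crowd:
            result[i][child] = ask_crowd(rows[i])  # hypothetical crowd answer
        else:
            dist = cpt[rows[i][parent]]
            result[i][child] = max(dist, key=dist.get)  # most probable value
    return result, to_crowd


# Toy data: city "A" always maps to zip "1"; city "B" is split between "2" and "3".
ROWS = [
    {"city": "A", "zip": "1"},
    {"city": "A", "zip": "1"},
    {"city": "A", "zip": "1"},
    {"city": "B", "zip": "2"},
    {"city": "B", "zip": "3"},
    {"city": "A", "zip": None},
    {"city": "B", "zip": None},
]
```

With a budget of one task, the row for city "B" is the one sent to the crowd, because its inferred distribution is evenly split, while the row for city "A" is filled directly from the network at no crowdsourcing cost.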

Original language: English
Article number: 105199
Journal: Knowledge-Based Systems
Volume: 190
DOIs
State: Published - 29 Feb 2020

Keywords

  • Bayesian network
  • Crowdsourcing
  • Missing values
