
Imputation for categorical attributes with probabilistic reasoning

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Because incomplete data limits how a dataset can be used, missing values in a database should be estimated to make data mining and analysis more accurate. Beyond ignoring missing values or setting them to defaults, many imputation methods have been proposed, but each has its limitations. This paper proposes a probabilistic method for estimating missing values. We construct a Bayesian network in a novel way to identify the dependencies in a dataset, then use Bayesian reasoning to find the most probable substitution for each missing value. The benefits of this method are that (1) irrelevant attributes can be ignored during estimation; (2) the network is built with no target attribute, so all attributes are handled in one model; and (3) probability information is available to measure the accuracy of the imputation. Experimental results show that our construction algorithm is effective and that the filled values are of higher quality than those produced by mode imputation and the kNN method. We also verify experimentally the effectiveness of the probabilities given by our method.
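
The idea of picking the most probable substitution and reporting its probability can be illustrated with a minimal sketch. This is not the paper's algorithm: the paper learns the network structure from the data, whereas here the dependency (which attributes act as parents of the one being filled) is supplied by hand, and the function name `impute_most_probable`, the toy attributes, and the fallback to the overall distribution are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def impute_most_probable(rows, target, parents):
    """Fill None in `target` with its most probable value given `parents`.

    Hypothetical helper: the parent set is an assumption passed in by the
    caller, not a structure learned from the data.
    Returns a list of (row, probability) pairs; probability is None for
    rows that were already complete.
    """
    # Count target values per observed parent configuration.
    counts = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        if row[target] is None or any(row[p] is None for p in parents):
            continue
        key = tuple(row[p] for p in parents)
        counts[key][row[target]] += 1
        overall[row[target]] += 1

    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append((dict(row), None))
            continue
        key = tuple(row[p] for p in parents)
        # Assumed fallback: use the overall distribution when this parent
        # configuration was never observed with a known target value.
        dist = counts.get(key) or overall
        value, n = dist.most_common(1)[0]
        new_row = dict(row)
        new_row[target] = value
        filled.append((new_row, n / sum(dist.values())))
    return filled

# Toy data (invented for this sketch).
rows = [
    {"weather": "rain", "umbrella": "yes"},
    {"weather": "rain", "umbrella": "yes"},
    {"weather": "rain", "umbrella": "no"},
    {"weather": "sun", "umbrella": "no"},
    {"weather": "sun", "umbrella": "no"},
    {"weather": "rain", "umbrella": None},
]
result = impute_most_probable(rows, "umbrella", ["weather"])
print(result[-1])
```

In this toy data, mode imputation would fill the missing `umbrella` value with "no" (three of five known values), while conditioning on `weather` selects "yes" and attaches the conditional frequency 2/3 as a confidence measure.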

Original language: English
Title of host publication: Web-Age Information Management - 14th International Conference, WAIM 2013, Proceedings
Publisher: Springer Verlag
Pages: 87-98
Number of pages: 12
ISBN (Print): 9783642385612
State: Published - 2013
Event: 14th International Conference on Web-Age Information Management, WAIM 2013 - Beidaihe, China
Duration: 14 Jun 2013 → 16 Jun 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7923 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Web-Age Information Management, WAIM 2013
Country/Territory: China
City: Beidaihe
Period: 14/06/13 → 16/06/13

Keywords

  • Bayesian Network
  • Missing value imputation
  • Probabilistic reasoning
