
WENN for individualized cleaning in imbalanced data

  • Hongjiao Guan
  • Yingtao Zhang
  • Min Xian
  • H. D. Cheng
  • Xianglong Tang
  • Harbin Institute of Technology
  • Utah State University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper proposes individualized cleaning for diverse imbalanced data sets. Existing data cleaning techniques have difficulty with rare cases and outliers in the minority class, especially in highly imbalanced data. This drawback leads to incomplete and imprecise removal of examples. To make data cleaning more robust and thorough, we propose a weighted edited nearest neighbor (WENN) method, which detects and removes noisy examples from both classes. It considers the individual characteristics of each imbalanced data set, including global class imbalance and local distribution. The main idea of the proposed method is to put more focus on the majority class than on the minority class during data cleaning. Extensive experiments on synthetic and real data validate the superiority of our approach over other data cleaning methods.
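
The abstract does not spell out how WENN weights its neighbors, so the following is only a minimal sketch of the plain, unweighted edited nearest neighbor rule that the method's name refers to: an example is dropped when its label disagrees with the majority label of its k nearest neighbors. The function name `edited_nearest_neighbor`, the choice `k=3`, the Euclidean distance, and the toy data are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def edited_nearest_neighbor(X, y, k=3):
    """Return a boolean mask of examples kept by the plain (unweighted) ENN rule.

    Hypothetical helper for illustration; it is NOT the WENN method itself,
    whose weighting scheme is not described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all examples.
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    # An example must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of each example.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    keep = np.empty(len(y), dtype=bool)
    for i, idx in enumerate(neighbors):
        labels, counts = np.unique(y[idx], return_counts=True)
        # Keep the example only if the neighborhood majority agrees with its label.
        keep[i] = labels[np.argmax(counts)] == y[i]
    return keep


# Toy data (assumed): class 0 near the origin, class 1 near (5, 5),
# plus one class-0 example sitting inside the class-1 cluster.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [5, 5], [5, 6], [6, 5], [5.5, 5.5]])
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0])

keep = edited_nearest_neighbor(X_demo, y_demo, k=3)
X_clean, y_clean = X_demo[keep], y_demo[keep]
```

On this toy set only the last example, the class-0 point inside the class-1 cluster, is removed. The plain rule treats both classes identically, which is the behavior the abstract says causes trouble for rare minority cases; WENN is described as adjusting for global class imbalance and local distribution instead.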

Original language: English
Title of host publication: 2016 23rd International Conference on Pattern Recognition, ICPR 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 456-461
Number of pages: 6
ISBN (Electronic): 9781509048472
State: Published - 1 Jan 2016
Event: 23rd International Conference on Pattern Recognition, ICPR 2016 - Cancun, Mexico
Duration: 4 Dec 2016 to 8 Dec 2016

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 0
ISSN (Print): 1051-4651

Conference

Conference: 23rd International Conference on Pattern Recognition, ICPR 2016
Country/Territory: Mexico
City: Cancun
Period: 4/12/16 to 8/12/16

Keywords

  • Data cleaning
  • Imbalanced data
  • WENN
