
Research on classification method of high-dimensional class-imbalanced datasets based on SVM

  • Harbin Institute of Technology Shenzhen

Research output: Contribution to journal, Article, peer-review

Abstract

High-dimensional data lead to poor classification results because some combinations of features have an adverse effect on classification, while class imbalance makes a classifier focus on the majority class at the expense of the minority class, because the majority class has more samples than the minority class. Datasets that are both high-dimensional and class-imbalanced arise in many fields, such as bioinformatics and healthcare. Many researchers have studied either the high-dimensional problem or the class-imbalance problem and have proposed a series of algorithms, but they overlook the new problem that arises when the two occur together: high dimensionality affects the sampling process, while class imbalance interferes with feature selection. This paper first analyses the new problem arising from the mutual influence of the two problems, and then introduces the support vector machine (SVM) and analyses its advantages in dealing with high-dimensional and class-imbalanced problems. Next, it proposes a new algorithm, BRFE-PBKS-SVM, aimed at high-dimensional class-imbalanced datasets. The algorithm improves SVM-RFE (SVM recursive feature elimination) by accounting for class imbalance during feature selection, and it improves SMOTE (Synthetic Minority Over-sampling Technique) so that over-sampling is performed in the Hilbert space with an adaptive over-sampling rate set by particle swarm optimization (PSO). Finally, experimental results are presented to evaluate the performance of the algorithm.
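The over-sampling step that the proposed algorithm builds on is SMOTE. As a point of reference only, here is a minimal sketch of classic SMOTE in the original input space, in plain Python. It is not the paper's BRFE-PBKS-SVM procedure, which instead over-samples in the Hilbert space and sets the over-sampling rate adaptively with PSO; neither of those steps is reproduced. The function names `smote` and `squared_distance`, their parameters (`n_new`, `k`, `seed`), and the uniform random choice of base samples are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance in the original input space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Classic SMOTE (hypothetical helper): create n_new synthetic minority samples.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    This works in input space, NOT in a kernel-induced Hilbert space.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1 or n_new < 0:
        raise ValueError("k must be >= 1 and n_new must be >= 0")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: squared_distance(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nn = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # A fixed over-sampling rate of 100% doubles the minority class; the paper
    # instead tunes the rate adaptively with PSO, which is not reproduced here.
    rate = 1.0
    new_samples = smote(minority, n_new=int(rate * len(minority)), k=2)
    print(new_samples)
```

Because every synthetic point is a convex combination of two existing minority points, classic SMOTE only fills in the region already spanned by the minority class; the `rate` value is where an adaptive scheme, such as the PSO search described in the abstract, would plug in.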

Original language: English
Pages (from-to): 1765-1778
Number of pages: 14
Journal: International Journal of Machine Learning and Cybernetics
Volume: 10
Issue number: 7
DOIs
State: Published, 1 Jul 2019
Externally published: Yes

Keywords

  • Boundary samples
  • Class-imbalanced
  • Feature selection
  • High-dimensional
  • Over-sampling
