Three-Step Intelligent Pruning for Data Classification in Just-in-Time Software Defect Prediction

  • Xiamen University of Technology

Research output: Contribution to journal › Conference article › peer-review

Abstract

Just-in-time software defect prediction (JIT-SDP) is a defect prediction method that operates at the level of individual software changes. The difficulty of learning classifiers from imbalanced data has been demonstrated in a variety of real-world applications, especially in the era of big data, which has generated more classification tasks. Many existing JIT-SDP approaches assume that the features of software releases remain constant over time. However, these approaches do not consider that JIT-SDP may be affected by the gradual evolution of class imbalance. Specifically, class imbalance (that is, the underrepresentation of defect-inducing changes) varies over time, and the numbers of clean and defective changes may each increase or decrease; in this case, existing JIT-SDP methods become inapplicable. Taking these factors into consideration, we propose a new imbalanced classification framework, which aims to balance the classes by applying a new three-step smart pruning strategy: first, the majority class is undersampled; second, the minority class is oversampled; third, because the minority class becomes the majority class after oversampling, the former minority class, now the dominant class, is intelligently undersampled. Through these three steps, data balance is achieved before classification. Experiments show that this new framework is computationally efficient and leads to better performance even under highly imbalanced distributions of clean and defective data. In addition, the proposed framework can be easily adapted to most existing learning methods to improve their performance on imbalanced data.
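The three resampling steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how samples are selected at each step, so this example substitutes plain random sampling throughout, including for the "intelligent" third step. The function name `three_step_balance` and the parameters `keep_major`, `over_factor`, and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np


def three_step_balance(X, y, minority_label=1, keep_major=0.5,
                       over_factor=1.5, seed=0):
    """Balance a binary dataset with three resampling steps.

    Illustrative only: every step uses random sampling, which is an
    assumption and not the selection rule used in the paper.
    """
    if not 0.0 < keep_major <= 1.0:
        raise ValueError("keep_major must be in (0, 1]")
    if over_factor < 1.0:
        raise ValueError("over_factor must be >= 1")

    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)

    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    if len(min_idx) == 0 or len(maj_idx) == 0:
        raise ValueError("both classes must be present")

    # Step 1: undersample the majority class (keep a random fraction).
    n_major = max(1, int(round(keep_major * len(maj_idx))))
    maj_keep = rng.choice(maj_idx, size=n_major, replace=False)

    # Step 2: oversample the minority class by drawing with replacement
    # until it outnumbers the reduced majority class.
    n_over = max(len(min_idx), int(np.ceil(over_factor * n_major)))
    min_over = rng.choice(min_idx, size=n_over, replace=True)

    # Step 3: undersample the now-dominant minority class back down to
    # the size of the majority class, giving a 1:1 class ratio.
    min_keep = rng.choice(min_over, size=n_major, replace=False)

    idx = np.concatenate([maj_keep, min_keep])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

The returned arrays can be passed to any standard classifier, which matches the abstract's claim that the balancing happens before classification and is independent of the learner. A faithful implementation would replace the random draws, particularly in the third step, with the selection criteria described in the paper itself.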

Original language: English
Pages (from-to): 128-134
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 3206
State: Published - 2022
Externally published: Yes
Event: 7th International Conference on Computer and Information Processing Technology, ISCIPT 2022 - Virtual, Shenyang, China
Duration: 5 Aug 2022 to 7 Aug 2022

Keywords

  • Artificial Intelligence
  • Class Imbalance
  • JIT-SDP
  • Machine Learning
