Abstract
Just-in-time software defect prediction (JIT-SDP) predicts defects at the level of individual software changes. Learning classifiers from imbalanced data is difficult in a variety of real-world applications, especially in the era of big data, which has produced ever more classification tasks. Many existing JIT-SDP studies assume that the features of software releases remain constant over time. However, they do not consider that JIT-SDP may be affected by the gradual evolution of class imbalance. Specifically, class imbalance (that is, the under-representation of defect-inducing changes) changes over time, and the numbers of clean changes and defect-inducing changes may each increase or decrease; in this case, existing JIT-SDP methods become inapplicable. Taking these factors into consideration, we propose a new imbalanced classification framework that balances the data classes with a new three-step intelligent pruning strategy: first, the majority class is undersampled; second, the minority class is oversampled; and third, because the minority class becomes the majority class after oversampling, that class is intelligently undersampled. Through these three steps, the data are balanced before classification. Experiments show that the framework is computationally efficient and achieves better performance even when the distribution of clean and defective data is highly imbalanced. Our proposed framework can also be easily adapted to most existing learning methods to improve their performance on imbalanced data.
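The three-step strategy described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how each step selects samples, so the function `three_step_balance` and its parameters `majority_keep` and `overshoot` are hypothetical. Steps 1 and 2 use plain random undersampling and random duplication, and step 3 uses a simple stand-in rule (discard duplicated copies before any original minority sample) in place of the paper's unspecified "intelligent" criterion. Labels are assumed to be binary (clean vs. defect-inducing).

```python
import math
import random
from collections import Counter


def three_step_balance(labels, majority_keep=0.5, overshoot=1.5, seed=0):
    """Return indices into `labels` forming a class-balanced training set.

    Hypothetical sketch: random selection stands in for the paper's
    unspecified "intelligent" pruning criteria. Assumes two classes and
    overshoot >= 1.
    """
    rng = random.Random(seed)
    (maj, _), (mino, _) = Counter(labels).most_common(2)
    maj_idx = [i for i, y in enumerate(labels) if y == maj]
    min_idx = [i for i, y in enumerate(labels) if y == mino]

    # Step 1: randomly undersample the majority (clean) class.
    maj_idx = rng.sample(maj_idx, max(1, int(len(maj_idx) * majority_keep)))

    # Step 2: oversample the minority (defect-inducing) class by duplication
    # until it outnumbers the reduced majority, i.e. it becomes the majority.
    target = math.ceil(len(maj_idx) * overshoot)
    copies = [rng.choice(min_idx) for _ in range(max(0, target - len(min_idx)))]

    # Step 3: undersample the new majority down to len(maj_idx).
    # Stand-in rule: drop synthetic copies first, original samples only if needed.
    excess = len(min_idx) + len(copies) - len(maj_idx)
    if excess > 0:
        drop_copies = min(excess, len(copies))
        copies = rng.sample(copies, len(copies) - drop_copies)
        excess -= drop_copies
        if excess > 0:
            min_idx = rng.sample(min_idx, len(min_idx) - excess)

    return maj_idx + min_idx + copies


if __name__ == "__main__":
    # Hypothetical data: 0 = clean change, 1 = defect-inducing change.
    labels = [0] * 90 + [1] * 10
    idx = three_step_balance(labels)
    print(Counter(labels[i] for i in idx))  # → Counter({0: 45, 1: 45})
```

Because every selection here is random, steps 2 and 3 together reduce to random oversampling to parity in this sketch; whatever criterion the paper applies in the third step is what would distinguish the framework from that baseline.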
| Original language | English |
|---|---|
| Pages (from-to) | 128-134 |
| Number of pages | 7 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3206 |
| State | Published - 2022 |
| Externally published | Yes |
| Event | 7th International Conference on Computer and Information Processing Technology, ISCIPT 2022 - Virtual, Shenyang, China (5 Aug 2022 to 7 Aug 2022) |
Keywords
- Artificial Intelligence
- Class Imbalance
- JIT-SDP
- Machine Learning
Fingerprint
Dive into the research topics of 'Three-Step Intelligent Pruning for Data Classification in Just-in-Time Software Defect Prediction'. Together they form a unique fingerprint.