Abstract
Multi-label feature selection plays a critical role in data management and analysis by reducing feature dimensionality while preserving discriminative capability. However, real-world multi-label datasets commonly exhibit label coverage imbalance, causing feature evaluation to be dominated by labels with high coverage. Moreover, feature redundancy is typically estimated using averaged dependency measures, which underestimate dominant redundant relationships under heterogeneous information scales. To address these challenges, we propose a multi-label feature selection method, termed Complementary and Redundancy-Aware Feature Selection for Imbalanced Coverage (CIRFS). CIRFS introduces a coverage-aware label weighting strategy that explicitly models label coverage and normalized label frequency to dynamically mitigate well-covered label dominance. In addition, it adopts a maximum redundancy ratio criterion to characterize feature redundancy from a worst-case information perspective, enabling accurate identification of dominant redundant relationships. Furthermore, mutual information (MI) and stabilized conditional mutual information (CMI) are jointly integrated to capture complementary aspects of feature-label information that cannot be fully characterized by either measure alone. Experiments on 14 real-world multi-label datasets demonstrate that CIRFS outperforms nine representative feature selection methods across four evaluation metrics.
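The contrast between averaged and worst-case redundancy described in the abstract can be illustrated with a small sketch. This is not the CIRFS criterion itself, whose exact formula is not given here. It assumes discrete features and a hypothetical normalization, `redundancy_ratio`, defined as I(f; s) / H(f). All function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def redundancy_ratio(candidate, selected_feature):
    """Assumed normalization: share of the candidate's entropy
    that is already explained by one selected feature."""
    h = entropy(candidate)
    if h == 0:
        return 0.0
    return mutual_information(candidate, selected_feature) / h


def mean_redundancy(candidate, selected):
    """Averaged redundancy over all selected features."""
    return sum(redundancy_ratio(candidate, s) for s in selected) / len(selected)


def max_redundancy(candidate, selected):
    """Worst-case redundancy: the single most redundant selected feature."""
    return max(redundancy_ratio(candidate, s) for s in selected)


# Toy data: the candidate duplicates one selected feature and is
# independent of the other two.
candidate = [0, 1, 0, 1, 0, 1, 0, 1]
selected = [
    [0, 1, 0, 1, 0, 1, 0, 1],  # exact copy of the candidate
    [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the candidate
    [0, 0, 0, 0, 1, 1, 1, 1],  # independent of the candidate
]

mean_r = mean_redundancy(candidate, selected)
max_r = max_redundancy(candidate, selected)
print(round(mean_r, 3), round(max_r, 3))
```

In this toy case the candidate is fully redundant with one selected feature, yet averaging dilutes that to one third, while the maximum reports it as fully redundant.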
| Field | Value |
|---|---|
| Original language | English |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| DOIs | |
| State | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Multi-label learning
- Feature selection
- Information theory
- Label imbalance
Paper title: Multi-Label Feature Selection under Coverage Imbalance and Feature Redundancy