TY - GEN
T1 - A Novel Data Stream Learning Approach to Tackle One-Sided Label Noise From Verification Latency
AU - Song, Liyan
AU - Li, Shuxian
AU - Minku, Leandro L.
AU - Yao, Xin
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Many real-world data stream applications suffer from verification latency, where the labels of the training examples arrive with a delay. In binary classification problems, the labeling process frequently involves waiting for a pre-determined period of time to observe an event that assigns the example to a given class. Once this time passes, if no such labeling event has occurred, the example is labeled as belonging to the other class. For example, in software defect prediction, one may wait to see if a defect is associated with a software change implemented by a developer, producing a defect-inducing training example. If no defect is found during the waiting time, the training example is labeled as clean. Such verification latency inherently causes label noise associated with insufficient waiting time. For example, a defect may be observed only after the pre-defined waiting time has passed, resulting in a noisy example of the clean class. Due to the nature of the waiting time, such noise is frequently one-sided, meaning that it only affects examples of one of the classes. However, no existing work tackles label noise associated with verification latency. This paper proposes a novel data stream learning approach that estimates the confidence in the labels assigned to the training examples and uses this to improve predictive performance in problems with one-sided label noise. Our experiments with 14 real-world datasets from the domain of software defect prediction demonstrate the effectiveness of the proposed approach compared to existing ones.
KW - Data stream learning
KW - clustering
KW - concept drift
KW - confidence level
KW - just-in-time software defect prediction
KW - one-sided label noise
KW - verification latency
UR - https://www.scopus.com/pages/publications/85140801389
U2 - 10.1109/IJCNN55064.2022.9891911
DO - 10.1109/IJCNN55064.2022.9891911
M3 - Conference contribution
AN - SCOPUS:85140801389
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Joint Conference on Neural Networks, IJCNN 2022
Y2 - 18 July 2022 through 23 July 2022
ER -