TY - GEN
T1 - A refinement approach to handling model misfit in semi-supervised learning
AU - Su, Hanjing
AU - Chen, Ling
AU - Ye, Yunming
AU - Sun, Zhaocai
AU - Wu, Qingyao
PY - 2010
Y1 - 2010
N2 - Semi-supervised learning has been the focus of machine learning and data mining research in the past few years. Various algorithms and techniques have been proposed, from generative models to graph-based algorithms. In this work, we focus on the Cluster-and-Label approaches for semi-supervised classification. Existing cluster-and-label algorithms are based on some underlying models and/or assumptions. When the data fits the model well, the classification accuracy will be high. Otherwise, the accuracy will be low. In this paper, we propose a refinement approach to address the model misfit problem in semi-supervised classification. We show that we do not need to change the cluster-and-label technique itself to make it more flexible. Instead, we propose to use successive refinement clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmarking data sets have shown that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques including Self-training and Tri-training.
AB - Semi-supervised learning has been the focus of machine learning and data mining research in the past few years. Various algorithms and techniques have been proposed, from generative models to graph-based algorithms. In this work, we focus on the Cluster-and-Label approaches for semi-supervised classification. Existing cluster-and-label algorithms are based on some underlying models and/or assumptions. When the data fits the model well, the classification accuracy will be high. Otherwise, the accuracy will be low. In this paper, we propose a refinement approach to address the model misfit problem in semi-supervised classification. We show that we do not need to change the cluster-and-label technique itself to make it more flexible. Instead, we propose to use successive refinement clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmarking data sets have shown that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques including Self-training and Tri-training.
KW - Semi-supervised learning
KW - classification
KW - model misfit
UR - https://www.scopus.com/pages/publications/78650183218
U2 - 10.1007/978-3-642-17313-4_8
DO - 10.1007/978-3-642-17313-4_8
M3 - Conference contribution
AN - SCOPUS:78650183218
SN - 3642173128
SN - 9783642173127
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 75
EP - 86
BT - Advanced Data Mining and Applications - 6th International Conference, ADMA 2010, Proceedings
T2 - 6th International Conference on Advanced Data Mining and Applications, ADMA 2010
Y2 - 19 November 2010 through 21 November 2010
ER -