TY - GEN
T1 - Learning deep neural network based kernel functions for small sample size classification
AU - Zheng, Tieran
AU - Han, Jiqing
AU - Zheng, Guibin
N1 - Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Kernel learning aims to learn a kernel function from the set of all sample pairs in the training data. Even for small-sample-size classification tasks, this set is usually large enough for a complex kernel with many parameters to be well optimized. Hence, a complex kernel can help improve classification performance by providing more meaningful feature representations in the kernel-induced feature space. In this paper, we propose to embed a deep neural network (DNN) into kernel functions, taking its output as the kernel parameter to adjust the feature representations adaptively. Two kinds of DNN-based kernels are defined, and both are proved to satisfy Mercer's theorem. Considering the connection between the kernel and the classifier, we optimize the proposed DNN-based kernels by exploiting the GMKL alternating optimization framework. A stochastic gradient descent (SGD)-based algorithm is also proposed, which still performs alternating optimization in each iteration. Furthermore, an incremental batch-size method is given to gradually reduce gradient noise during optimization. Experimental results show that our method performed better than typical methods.
AB - Kernel learning aims to learn a kernel function from the set of all sample pairs in the training data. Even for small-sample-size classification tasks, this set is usually large enough for a complex kernel with many parameters to be well optimized. Hence, a complex kernel can help improve classification performance by providing more meaningful feature representations in the kernel-induced feature space. In this paper, we propose to embed a deep neural network (DNN) into kernel functions, taking its output as the kernel parameter to adjust the feature representations adaptively. Two kinds of DNN-based kernels are defined, and both are proved to satisfy Mercer's theorem. Considering the connection between the kernel and the classifier, we optimize the proposed DNN-based kernels by exploiting the GMKL alternating optimization framework. A stochastic gradient descent (SGD)-based algorithm is also proposed, which still performs alternating optimization in each iteration. Furthermore, an incremental batch-size method is given to gradually reduce gradient noise during optimization. Experimental results show that our method performed better than typical methods.
KW - Deep neural network
KW - Kernel learning
KW - Small sample size classification
KW - Stochastic optimization algorithm
UR - https://www.scopus.com/pages/publications/85035115804
U2 - 10.1007/978-3-319-70087-8_15
DO - 10.1007/978-3-319-70087-8_15
M3 - Conference contribution
AN - SCOPUS:85035115804
SN - 9783319700861
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 135
EP - 143
BT - Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings
A2 - Li, Yuanqing
A2 - Liu, Derong
A2 - Xie, Shengli
A2 - El-Alfy, El-Sayed M.
A2 - Zhao, Dongbin
PB - Springer Verlag
T2 - 24th International Conference on Neural Information Processing, ICONIP 2017
Y2 - 14 November 2017 through 18 November 2017
ER -