TY - GEN
T1 - Tree-based metric learning for distance computation in data mining
AU - Yan, Ming
AU - Zhang, Yan
AU - Wang, Hongzhi
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Distance is an essential measurement in data mining, and a good metric often leads to good performance, so obtaining a proper metric systematically is critical. Distance metric learning is a classic approach to learning distances between instances in data sets with complex distributions. However, most research on distance metric learning is based on the Mahalanobis metric, which is equivalent to a linear transformation of the distance space and is therefore limited on complex data. To solve this problem, we propose a metric learning method based on a non-linear transformation that is suitable for complex data. A tree model can handle non-linearly separable data by rearranging the input data and representing it in another form, and it can implicitly map the data to a new distance space through a non-linear activation function. A single tree model, however, tends to overfit, which leads to higher generalization error. We therefore design a randomized algorithm that combines different tree models, which reduces the generalization error in theory and in practice. We prove the correctness and effectiveness of our algorithm theoretically, and extensive experiments demonstrate that the algorithm is stable and suitable for data mining.
AB - Distance is an essential measurement in data mining, and a good metric often leads to good performance, so obtaining a proper metric systematically is critical. Distance metric learning is a classic approach to learning distances between instances in data sets with complex distributions. However, most research on distance metric learning is based on the Mahalanobis metric, which is equivalent to a linear transformation of the distance space and is therefore limited on complex data. To solve this problem, we propose a metric learning method based on a non-linear transformation that is suitable for complex data. A tree model can handle non-linearly separable data by rearranging the input data and representing it in another form, and it can implicitly map the data to a new distance space through a non-linear activation function. A single tree model, however, tends to overfit, which leads to higher generalization error. We therefore design a randomized algorithm that combines different tree models, which reduces the generalization error in theory and in practice. We prove the correctness and effectiveness of our algorithm theoretically, and extensive experiments demonstrate that the algorithm is stable and suitable for data mining.
UR - https://www.scopus.com/pages/publications/84950236256
U2 - 10.1007/978-3-319-25255-1_31
DO - 10.1007/978-3-319-25255-1_31
M3 - Conference contribution
AN - SCOPUS:84950236256
SN - 9783319252544
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 377
EP - 388
BT - Web Technologies and Applications - 17th Asia-Pacific Web Conference, APWeb 2015, Proceedings
A2 - Cheng, Reynold
A2 - Cui, Bin
A2 - Zhang, Zhenjie
A2 - Cai, Ruichu
A2 - Xu, Jia
PB - Springer Verlag
T2 - 17th Asia-Pacific Web Conference, APWeb 2015
Y2 - 18 September 2015 through 20 September 2015
ER -