TY - GEN
T1 - Lipschitz Continuity Guided Knowledge Distillation
AU - Shang, Yuzhang
AU - Duan, Bin
AU - Zong, Ziliang
AU - Nie, Liqiang
AU - Yan, Yan
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones. Although great success has been achieved by prior distillation methods via delicately designing various types of knowledge, they overlook the functional properties of neural networks, which makes the process of applying those techniques to new tasks unreliable and non-trivial. To alleviate such problem, in this paper, we initially leverage Lipschitz continuity to better represent the functional characteristic of neural networks and guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework to faithfully distill knowledge by minimizing the distance between two neural networks' Lipschitz constants, which enables teacher networks to better regularize student networks and improve the corresponding performance. We derive an explainable approximation algorithm with an explicit theoretical derivation to address the NP-hard problem of calculating the Lipschitz constant. Experimental results have shown that our method outperforms other benchmarks over several knowledge distillation tasks (e.g., classification, segmentation and object detection) on CIFAR-100, ImageNet, and PASCAL VOC datasets. Our code is available at https://github.com/42Shawn/LONDON/tree/master.
AB - Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones. Although great success has been achieved by prior distillation methods via delicately designing various types of knowledge, they overlook the functional properties of neural networks, which makes the process of applying those techniques to new tasks unreliable and non-trivial. To alleviate such problem, in this paper, we initially leverage Lipschitz continuity to better represent the functional characteristic of neural networks and guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework to faithfully distill knowledge by minimizing the distance between two neural networks' Lipschitz constants, which enables teacher networks to better regularize student networks and improve the corresponding performance. We derive an explainable approximation algorithm with an explicit theoretical derivation to address the NP-hard problem of calculating the Lipschitz constant. Experimental results have shown that our method outperforms other benchmarks over several knowledge distillation tasks (e.g., classification, segmentation and object detection) on CIFAR-100, ImageNet, and PASCAL VOC datasets. Our code is available at https://github.com/42Shawn/LONDON/tree/master.
UR - https://www.scopus.com/pages/publications/85121774789
U2 - 10.1109/ICCV48922.2021.01050
DO - 10.1109/ICCV48922.2021.01050
M3 - 会议稿件
AN - SCOPUS:85121774789
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 10655
EP - 10664
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Y2 - 11 October 2021 through 17 October 2021
ER -