TY - GEN
T1 - A Survey of Deep Model Compression and Acceleration
AU - Zhang, Chong
AU - Liu, Hongwei
AU - Wang, Hongzhi
AU - Wang, Jiaying
AU - Zheng, Sijia
AU - Meng, Xiaoqian
AU - Zhu, Siyan
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Deep neural networks (DNNs) have recently achieved remarkable results across numerous visual recognition tasks. Nevertheless, existing DNN models have high computational costs and substantial memory usage, which pose significant barriers to their deployment on devices with limited memory or in applications with strict latency requirements. Consequently, model compression and acceleration for deep networks without notable degradation in model performance is urgently needed. This paper provides a comprehensive review of recent techniques for compacting and accelerating DNN models. From the perspective of model architecture, the main approaches are compact structure design and neural architecture search. From an algorithmic perspective, methods are broadly categorized into static compression methods and dynamic acceleration methods, covering implementation strategies such as model pruning, parameter quantization, low-rank factorization, and knowledge distillation. For each category, we describe the development of mainstream methods as well as the characteristics and advantages of each method. We also analyze the integration of multiple methods, along with their advantages and drawbacks.
KW - Deep Learning
KW - Inference Acceleration
KW - Model Compression
UR - https://www.scopus.com/pages/publications/105021219059
U2 - 10.1007/978-981-95-1346-8_6
DO - 10.1007/978-981-95-1346-8_6
M3 - Conference contribution
AN - SCOPUS:105021219059
SN - 9789819513451
T3 - Lecture Notes in Computer Science
SP - 86
EP - 106
BT - Green, Pervasive, and Cloud Computing - 19th International Conference, GPC 2024, Proceedings
A2 - Zhou, Xiaobo
A2 - Yu, Chen
A2 - Guo, Song
A2 - Wang, Jianping
A2 - Song, Xianhua
A2 - Lu, Zeguang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th International Conference on Green, Pervasive, and Cloud Computing, GPC 2024
Y2 - 27 September 2024 through 30 September 2024
ER -