
A Survey of Deep Model Compression and Acceleration

  • Chong Zhang
  • Hongwei Liu
  • Hongzhi Wang*
  • Jiaying Wang
  • Sijia Zheng
  • Xiaoqian Meng
  • Siyan Zhu
  • *Corresponding author for this work
  • Faculty of Computing, Harbin Institute of Technology
  • Ltd.
  • North Automatic Control Technology Research Institute

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Deep neural networks (DNNs) have recently achieved remarkable results across numerous visual recognition tasks. However, existing DNN models have high computational costs and substantial memory footprints, which pose significant barriers to deploying them on memory-constrained devices or in applications with strict latency requirements. Consequently, methods that compress and accelerate deep networks without notably degrading model performance are urgently needed. This paper provides a comprehensive review of recent techniques for compacting and accelerating DNN models. From the perspective of model architecture, the main approaches are compact structure design and neural architecture search. From an algorithmic perspective, methods are broadly categorized into static compression methods and dynamic acceleration methods, covering implementation strategies such as model pruning, parameter quantization, low-rank factorization, and knowledge distillation. For each category, we trace the development of mainstream methods and describe the characteristics and advantages of each. We also analyze how multiple methods can be integrated, together with the advantages and drawbacks of doing so.
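The abstract names several static compression techniques without showing what they do to a model's weights. As a rough illustration only (not code from the survey or its authors), the NumPy sketch below applies three of them to a single weight matrix: unstructured magnitude pruning, symmetric per-tensor int8 quantization, and low-rank factorization by truncated SVD. The function names (`magnitude_prune`, `quantize_int8`, `dequantize`, `low_rank`), the 50% sparsity, the int8 format and the rank of 8 are hypothetical choices made for this example.

```python
import numpy as np


def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning (illustrative): zero out roughly the
    `sparsity` fraction of weights with the smallest absolute value."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest magnitude acts as the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization (illustrative): map floats to
    integers in [-127, 127] using a single scale factor."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale


def low_rank(w: np.ndarray, rank: int):
    """Low-rank factorization via truncated SVD (illustrative):
    approximate w (m x n) by a (m x rank) @ b (rank x n)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 32)).astype(np.float32)

    pruned = magnitude_prune(w, 0.5)
    print("fraction of zeros:", np.mean(pruned == 0))

    q, s = quantize_int8(w)
    print("max quantization error:", np.max(np.abs(dequantize(q, s) - w)))

    a, b = low_rank(w, 8)
    print(a.shape, b.shape)  # (64, 8) (8, 32)
```

With rank 8, the two factors store 64 × 8 + 8 × 32 = 768 values instead of 64 × 32 = 2048, which illustrates the trade-off low-rank factorization makes between parameter count and approximation error. Pruning and quantization trade accuracy for size in a similar way: by zeroing or coarsening individual weights rather than by restructuring the matrix.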

Original language: English
Title of host publication: Green, Pervasive, and Cloud Computing - 19th International Conference, GPC 2024, Proceedings
Editors: Xiaobo Zhou, Chen Yu, Song Guo, Jianping Wang, Xianhua Song, Zeguang Lu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 86-106
Number of pages: 21
ISBN (Print): 9789819513451
State: Published - 2026
Externally published: Yes
Event: 19th International Conference on Green, Pervasive, and Cloud Computing, GPC 2024 - Macao, China
Duration: 27 Sep 2024 to 30 Sep 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15225 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Green, Pervasive, and Cloud Computing, GPC 2024
Country/Territory: China
City: Macao
Period: 27/09/24 to 30/09/24

Keywords

  • Deep Learning
  • Inference Acceleration
  • Model Compression
