
DNN Real-Time Collaborative Inference Acceleration with Mobile Edge Computing

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Collaborative inference splits a Deep Neural Network (DNN) model into two parts that run jointly on the end device and the cloud server, minimizing inference latency and protecting data privacy, especially in the 5G era. The best partitioning scheme depends on the available network bandwidth. However, in dynamic mobile networks, resource-constrained devices cannot execute complex model partitioning algorithms efficiently enough to obtain the optimal partition in real time. To overcome this challenge, we first formulate model partitioning as a min-cut problem to seek the optimal partition. Second, we propose CIC, a Collaborative Inference method based on model Compression. CIC reduces the complexity of the partitioning algorithm so that it runs efficiently on resource-constrained end devices. CIC generates a splitting model based on the inherent characteristics of the DNN model and the platform resources. The splitting models are independent of the network environment, generated offline, and used continuously in the current environment. CIC compresses models effectively, so even DNN models with hundreds of layers can be partitioned rapidly on resource-constrained devices. Experimental results show that our method is significantly more effective than existing solutions: in the best case it speeds up model partitioning decisions by up to 100x, reduces inference latency by up to 2.6x, and increases throughput by up to 3.3x.
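To make the min-cut formulation mentioned in the abstract concrete, the following is a minimal sketch and not the paper's CIC method. It assumes a common graph construction: each layer is a node, a source "S" stands for the device and a sink "T" for the cloud, and cutting an edge costs either a layer's execution time or the time to transmit a layer's output. The layer names, timings, output sizes, and the function name min_cut_partition are all hypothetical. The sketch uses a simple chain model; for layers with several successors this construction would count the same output once per cut edge, which it does not correct.

```python
from collections import deque

INF = float("inf")

def min_cut_partition(layers, edges, bandwidth):
    """Return (latency_seconds, device_side_layers) for one bandwidth value.

    layers:    {name: (device_time_s, cloud_time_s)}   (hypothetical profile data)
    edges:     [(producer, consumer, output_bytes)]
    bandwidth: uplink bandwidth in bytes per second
    """
    cap = {}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # residual (reverse) edge

    for name, (t_dev, t_cloud) in layers.items():
        add("S", name, t_cloud)  # cut when the layer runs on the cloud
        add(name, "T", t_dev)    # cut when the layer runs on the device
    for u, v, size in edges:
        add(u, v, size / bandwidth)  # cut when u is on the device and v on the cloud
        add(v, u, INF)               # forbid sending data back from cloud to device

    # Edmonds-Karp: repeatedly augment along shortest residual paths.
    flow = 0.0
    while True:
        parent = {"S": None}
        queue = deque(["S"])
        while queue and "T" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T" not in parent:
            break  # no augmenting path left: flow equals the min cut
        bottleneck, v = INF, "T"
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = "T"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from S in the residual graph stay on the device.
    return flow, set(parent) - {"S"}

# Hypothetical three-layer chain; "input" is pinned to the device (cloud time = INF).
layers = {
    "input": (0.0, INF),
    "conv1": (0.030, 0.003),
    "conv2": (0.040, 0.004),
    "fc":    (0.020, 0.002),
}
edges = [("input", "conv1", 600_000), ("conv1", "conv2", 100_000), ("conv2", "fc", 20_000)]

for bw in (1e6, 1e7, 1e8):
    latency, on_device = min_cut_partition(layers, edges, bw)
    print(f"{bw:.0e} B/s: latency {latency:.3f} s, on device: {sorted(on_device)}")
```

With these made-up numbers the chosen split moves as bandwidth grows, which illustrates why the partition has to be recomputed when network conditions change and why the cost of that recomputation matters on a constrained device.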

Original language: English
Title of host publication: 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728186719
State: Published - 2022
Externally published: Yes
Event: 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Padua, Italy
Duration: 18 Jul 2022 to 23 Jul 2022

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
ISSN (Print): 2161-4393
ISSN (Electronic): 2161-4407

Conference

Conference: 2022 International Joint Conference on Neural Networks, IJCNN 2022
Country/Territory: Italy
City: Padua
Period: 18/07/22 to 23/07/22

Keywords

  • Collaborative inference
  • DNN model partitioning
  • data privacy
  • edge computing
  • inference acceleration
