
A QoS-Aware Training Framework for ViT Compression, Partition, and Distillation

  • Changyao Lin*
  • Chengxiang Li
  • Jie Liu

*Corresponding author for this work

Affiliations:
  • Harbin Institute of Technology
  • Huawei Technologies Co., Ltd.
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we jointly optimize compression, partition, and distillation for the Vision Transformer (ViT). We analyze the relationships among these three modules and integrate them into a QoS-aware training framework. By coordinating model compression, edge-cloud partitioning, and knowledge distillation during training, the framework optimizes the model architecture and accuracy simultaneously. It accounts for the differences in computing power and memory between the edge and the cloud, and can trade off QoS metrics such as memory overhead, end-to-end latency, and accuracy at multiple granularities.

Original language: English
Title of host publication: 2024 IEEE/ACM 32nd International Symposium on Quality of Service, IWQoS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350350128
State: Published - 2024
Externally published: Yes
Event: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024 - Guangzhou, China
Duration: 19 Jun 2024 – 21 Jun 2024

Publication series

Name: IEEE International Workshop on Quality of Service, IWQoS
ISSN (Print): 1548-615X

Conference

Conference: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024
Country/Territory: China
City: Guangzhou
Period: 19/06/24 – 21/06/24

Keywords

  • Vision Transformer
  • knowledge distillation
  • model compression
  • model partition

