TY - GEN
T1 - A QoS-Aware Training Framework for ViT Compression, Partition, and Distillation
AU - Lin, Changyao
AU - Li, Chengxiang
AU - Liu, Jie
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In this paper, we jointly optimize compression, partition, and distillation for Vision Transformers (ViTs). We analyze the relationships among the three modules and integrate them into a QoS-aware training framework. By coordinating model compression, edge-cloud partition, and knowledge distillation during training, the framework optimizes model architecture and accuracy simultaneously. It accounts for the differences in computing power and memory between edge and cloud, and can trade off QoS metrics such as memory overhead, end-to-end latency, and accuracy at multiple granularities.
AB - In this paper, we jointly optimize compression, partition, and distillation for Vision Transformers (ViTs). We analyze the relationships among the three modules and integrate them into a QoS-aware training framework. By coordinating model compression, edge-cloud partition, and knowledge distillation during training, the framework optimizes model architecture and accuracy simultaneously. It accounts for the differences in computing power and memory between edge and cloud, and can trade off QoS metrics such as memory overhead, end-to-end latency, and accuracy at multiple granularities.
KW - Vision Transformer
KW - knowledge distillation
KW - model compression
KW - model partition
UR - https://www.scopus.com/pages/publications/85206375871
U2 - 10.1109/IWQoS61813.2024.10682862
DO - 10.1109/IWQoS61813.2024.10682862
M3 - Conference contribution
AN - SCOPUS:85206375871
T3 - IEEE International Workshop on Quality of Service, IWQoS
BT - 2024 IEEE/ACM 32nd International Symposium on Quality of Service, IWQoS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024
Y2 - 19 June 2024 through 21 June 2024
ER -