QPO: Accelerating Memory-Efficient DNN Training with Quantization and Pipelining

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution - peer-review

Abstract

Deep neural network (DNN) training demands significant computational power and costly device memory to store parameters, gradients, optimizer states, and activations. Activations generally take up the largest portion of device memory, and their footprint grows linearly with the mini-batch size and the sequence length, two main hyper-parameters in training language models. Offloading is a widely used memory-saving technique that temporarily transfers data from device memory to CPU memory. However, the offloading and uploading operations can easily incur significant data transfer overhead. In this paper, we propose QPO (Quantized and Pipelined Offloading), which combines 1) compressing the activation data with low-bit quantization to reduce the transfer overhead, and 2) pipelining the communication tasks of offloading/uploading with the computation tasks of feed-forward/backpropagation to reduce the iteration time. Experiments on GPT-2 and LLaMA-2 models using A100 and RTX 3090 GPUs show a speedup of up to 15% over existing offloading approaches while approaching minimal memory requirements.
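The two ideas in the abstract can be illustrated with a small, self-contained sketch. It is not the QPO implementation: the abstract does not specify the quantizer or the scheduler, so the min-max uniform quantizer, the bit-packing helpers, and the two-resource timing model below are all assumptions made for illustration, and every function name is hypothetical. Times are in arbitrary units, transfer time is assumed proportional to the number of bytes moved, and the cost of quantizing itself is ignored.

```python
def quantize(values, bits=4):
    """Min-max uniform quantization: floats -> integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, round((v - lo) / scale)) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction; per-element error is at most scale / 2."""
    return [lo + c * scale for c in codes]


def pack(codes, bits):
    """Pack the codes tightly, `bits` bits each, into a bytes object."""
    acc = 0
    for c in codes:
        acc = (acc << bits) | c
    return acc.to_bytes((len(codes) * bits + 7) // 8, "big")


def unpack(blob, bits, count):
    """Inverse of pack: recover `count` codes of `bits` bits each."""
    acc = int.from_bytes(blob, "big")
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


def forward_time(compute, transfer, pipelined):
    """Toy model of one forward pass that offloads each layer's activations.

    compute[i]  : time to compute layer i on the device
    transfer[i] : time to offload layer i's activations to CPU memory
    Without pipelining the device waits for every transfer. With pipelining,
    the transfer of layer i overlaps the computation of later layers, but the
    link still handles one transfer at a time.
    """
    if not pipelined:
        return sum(compute) + sum(transfer)
    gpu_done = link_done = 0.0
    for c, x in zip(compute, transfer):
        gpu_done += c                             # layer i finishes computing
        link_done = max(link_done, gpu_done) + x  # then its offload can run
    return link_done


# Part 1: compress a handful of activation values to 4 bits each.
activations = [0.0, 0.1, -0.5, 1.2, 0.7, -1.0, 0.3, 0.9]
codes, lo, scale = quantize(activations, bits=4)
blob = pack(codes, bits=4)  # 8 values * 4 bits = 4 bytes (16 bytes as fp16)
restored = dequantize(unpack(blob, 4, len(codes)), lo, scale)

# Part 2: four layers, each taking 1 unit to compute and 2 units to offload
# at 16 bits per value; 4-bit codes cut the transfer time to a quarter.
compute = [1.0] * 4
fp16_transfer = [2.0] * 4
int4_transfer = [t * 4 / 16 for t in fp16_transfer]

serial = forward_time(compute, fp16_transfer, pipelined=False)               # 12.0
overlapped = forward_time(compute, fp16_transfer, pipelined=True)            # 9.0
quantized_overlapped = forward_time(compute, int4_transfer, pipelined=True)  # 4.5
```

In this toy setting, overlapping alone leaves the link as the bottleneck (9.0 units), and shrinking the payload is what lets the transfers hide behind computation (4.5 units). The figures follow only from the assumed costs above and say nothing about the measured results reported in the paper.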

Original language: English
Title of host publication: Proceedings of 2025 IEEE 31st International Conference on Parallel and Distributed Systems, ICPADS 2025
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331549015
DOIs
State: Published - 2025
Externally published: Yes
Event: 31st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2025 - Hefei, China
Duration: 14 Dec 2025 - 17 Dec 2025

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISSN (Print): 1521-9097

Conference

Conference: 31st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2025
Country/Territory: China
City: Hefei
Period: 14/12/25 - 17/12/25

Keywords

  • activation quantization
  • memory compression
  • memory efficiency
  • memory swapping
  • offloading
  • pipelining