Dynamic GPU Energy Optimization for Machine Learning Training Workloads

  • Harbin Institute of Technology
  • University of Leeds

Research output: Contribution to journal › Article › peer-review

Abstract

GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models grow larger, they take longer to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the behavior of the target workload, GPOEO uses GPU performance counters. To reduce the overhead of performance counter profiling, it uses an analytical model to detect changes in the training iteration and collects performance counter data only when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with a modest average execution time increase of 5.1%.
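
As an illustration of the trade-off search the abstract describes, the following is a minimal Python sketch of a local search over discrete GPU clock levels. It is not the GPOEO implementation: the abstract says the framework uses gradient-boosted prediction models, whereas the two predictor functions here are hypothetical analytic stand-ins, and the clock levels, slowdown budget, and penalty weight are all assumed values chosen only to make the example run.

```python
from typing import Callable, Sequence

MAX_CLOCK_MHZ = 1800  # assumed maximum clock, for illustration only
CLOCK_LEVELS_MHZ = [900, 1050, 1200, 1350, 1500, 1650, 1800]  # assumed levels
MAX_SLOWDOWN = 0.12   # assumed budget: at most 12% longer than at max clock


def predicted_time_ratio(clock_mhz: int) -> float:
    """Hypothetical stand-in for a learned model: time relative to max clock."""
    return 0.7 + 0.3 * (MAX_CLOCK_MHZ / clock_mhz)


def predicted_energy_ratio(clock_mhz: int) -> float:
    """Hypothetical stand-in for a learned model: energy relative to max clock."""
    power = 0.3 + 0.7 * (clock_mhz / MAX_CLOCK_MHZ) ** 3  # static + dynamic part
    return power * predicted_time_ratio(clock_mhz)


def objective(clock_mhz: int) -> float:
    """Predicted energy, plus a penalty once the slowdown budget is exceeded."""
    excess = max(0.0, predicted_time_ratio(clock_mhz) - (1.0 + MAX_SLOWDOWN))
    return predicted_energy_ratio(clock_mhz) + 10.0 * excess


def local_search(
    levels: Sequence[int], cost: Callable[[int], float], start: int
) -> int:
    """Hill-climb over sorted settings; return the index of a local minimum."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(levels)]
        best = min(neighbours, key=lambda i: cost(levels[i]))
        if cost(levels[best]) >= cost(levels[current]):
            return current  # neither neighbour improves: stop here
        current = best


# Start from the highest clock and walk downhill on the objective.
best_index = local_search(CLOCK_LEVELS_MHZ, objective, len(CLOCK_LEVELS_MHZ) - 1)
best_clock = CLOCK_LEVELS_MHZ[best_index]
print(best_clock, predicted_energy_ratio(best_clock), predicted_time_ratio(best_clock))
```

With these toy curves the search settles on 1350 MHz, the lowest level whose predicted slowdown stays within the assumed budget. In a real system the stand-in functions would be replaced by models fitted to measured performance counter data.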

Original language: English
Pages (from-to): 2943-2954
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 33
Issue number: 11
State: Published - 1 Nov 2022
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Dynamic energy optimization
  • GPU
  • multi-objective machine learning
  • online application iteration detection
