Accelerating Distributed K-FAC with smart parallelism of computing and communication tasks

  • Hong Kong University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Distributed training with synchronous stochastic gradient descent (SGD) on GPU clusters has been widely used to accelerate the training of deep models. However, SGD uses only first-order gradients to update model parameters, and training can still take days or weeks. Recent studies have successfully exploited approximate second-order information to speed up training, and Kronecker-Factored Approximate Curvature (KFAC) has emerged as one of the most efficient approximation algorithms for training deep models. Yet, when GPU clusters are used to train models with distributed KFAC (D-KFAC), each iteration incurs extensive computation and introduces extra communication. In this work, we propose SPD-KFAC, a D-KFAC scheme with smart parallelism of computing and communication tasks, to reduce the iteration time. Specifically, 1) we first characterize the performance bottlenecks of D-KFAC, 2) we design and implement a pipelining mechanism for Kronecker factor computation and communication with dynamic tensor fusion, and 3) we develop a load-balancing placement for inverting multiple matrices on GPU clusters. We conduct real-world experiments on a 64-GPU cluster with a 100Gb/s InfiniBand interconnect. Experimental results show that SPD-KFAC achieves a 10%-35% improvement over state-of-the-art algorithms.
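In K-FAC, each layer carries two Kronecker factors, A (built from the layer's input activations) and G (built from the gradients with respect to the layer's outputs), and both must be inverted periodically, so a single model yields many independent inversions of very different sizes. The abstract does not spell out SPD-KFAC's placement algorithm, so the following is only a generic sketch of the problem it targets: spreading those inversions across GPUs so that no GPU becomes a straggler. It assumes a `dim**3` cost model and swaps in the classic greedy longest-processing-time (LPT) heuristic; the helper names (`inversion_cost`, `place_inversions`), the factor names, and the layer sizes are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def inversion_cost(dim: int) -> int:
    """Assumed cost model: dense inversion of a dim x dim matrix scales as dim**3."""
    return dim ** 3


def place_inversions(
    factor_dims: Dict[str, int], num_gpus: int
) -> Tuple[Dict[int, List[str]], List[int]]:
    """Greedy LPT placement of Kronecker-factor inversions (illustrative, not the paper's method).

    Factors are visited from most to least expensive, and each one goes to the
    GPU whose accumulated estimated cost is currently the smallest.
    Returns (gpu id -> factor names, gpu id -> total estimated cost).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Min-heap keyed on (accumulated cost, gpu id); ties go to the lower id.
    heap: List[Tuple[int, int]] = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    placement: Dict[int, List[str]] = {gpu: [] for gpu in range(num_gpus)}
    loads = [0] * num_gpus
    # Most expensive first; sort by name on ties so the result is deterministic.
    order = sorted(factor_dims.items(), key=lambda kv: (-inversion_cost(kv[1]), kv[0]))
    for name, dim in order:
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(name)
        loads[gpu] = load + inversion_cost(dim)
        heapq.heappush(heap, (loads[gpu], gpu))
    return placement, loads


# Hypothetical 3-layer MLP (784 -> 512 -> 256 -> 10, with biases).
# A factors have size d_in + 1 (extra bias column); G factors have size d_out.
factors = {
    "fc1.A": 785, "fc1.G": 512,
    "fc2.A": 513, "fc2.G": 256,
    "fc3.A": 257, "fc3.G": 10,
}
placement, loads = place_inversions(factors, num_gpus=2)
# GPU 0 gets only "fc1.A"; GPU 1 gets the other five factors.
print(placement)
print(loads)
```

With these illustrative sizes, GPU 0 receives only `fc1.A`, because that single inversion outweighs all the others combined. No placement of whole inversions can then finish sooner than the largest single inversion, which is why per-GPU balance alone cannot remove every straggler.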

Original language: English
Title of host publication: Proceedings - 2021 IEEE 41st International Conference on Distributed Computing Systems, ICDCS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 550-560
Number of pages: 11
ISBN (Electronic): 9781665445139
State: Published - Jul 2021
Externally published: Yes
Event: 41st IEEE International Conference on Distributed Computing Systems, ICDCS 2021 - Virtual, Washington, United States
Duration: 7 Jul 2021 - 10 Jul 2021

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2021-July

Conference

Conference: 41st IEEE International Conference on Distributed Computing Systems, ICDCS 2021
Country/Territory: United States
City: Virtual, Washington
Period: 7/07/21 - 10/07/21

Keywords

  • Distributed Deep Learning
  • K-FAC
  • Load-Balancing
  • Second-Order
  • Smart Parallelism
