BirdMoE: Reducing Communication Costs for Mixture-of-Experts Training Using Load-Aware Bi-random Quantization

  • Donglei Wu
  • Weihao Yang
  • Xiangyu Zou
  • Jinda Jia
  • Dingwen Tao
  • Wen Xia*
  • Zhihong Tian*
  • *Corresponding author for this work
  • Guangzhou University
  • Guangdong Key Laboratory of Industrial Control System Security
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Indiana University Bloomington
  • CAS - Institute of Computing Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Mixture-of-Experts (MoE) model parallelism is prevalent in training Large Language Models (e.g., ChatGPT). However, the intensive all-to-all collective communication of the MoE layer's intermediate computing results substantially degrades MoE training efficiency. In this paper, we propose BirdMoE, a novel load-aware communication compression technique for MoE training that uses Bi-random quantization and comprises two core modules. First, BirdMoE employs a lightweight Random Quantization (RQ) with an expectation-invariance property to efficiently map the floating-point intermediate computing results into integers while maintaining MoE training quality. Second, BirdMoE uses a Mixed Precision (MP) strategy to dynamically balance the communication loads among expert nodes, significantly improving all-to-all communication efficiency for the MoE training system. Experiments on four typical MoE training tasks demonstrate that BirdMoE achieves 4.06× to 10.44× higher total communication compression ratios and 1.18× to 5.27× training speedup compared with state-of-the-art compression techniques while maintaining MoE training quality.
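
The expectation-invariance property mentioned in the abstract can be illustrated with generic stochastic rounding. The sketch below is not BirdMoE's actual RQ module, whose details the abstract does not give; the function names, the min/max scaling, and the `bits` parameter are all assumptions chosen for illustration. Each float is mapped to an integer code and rounded up with probability equal to its fractional part, so the dequantized value equals the original in expectation.

```python
import math
import random


def random_quantize(values, bits=8, rng=random):
    """Stochastically round floats to integer codes in [0, 2**bits - 1].

    Hypothetical helper, not the paper's implementation.
    Returns (codes, lo, scale) so a receiver can dequantize.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        x = (v - lo) / scale  # position on the integer grid
        floor = math.floor(x)
        # Round up with probability equal to the fractional part,
        # so the expected code equals x (unbiased rounding).
        code = floor + (1 if rng.random() < x - floor else 0)
        codes.append(min(code, levels))  # guard against float overshoot
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [0.1, -0.35, 0.72, 0.0]
    codes, lo, scale = random_quantize(activations, bits=4, rng=rng)
    print(codes, dequantize(codes, lo, scale))
```

With `bits=4`, each value would occupy 4 bits instead of the 32 bits of a single-precision float, ignoring the small overhead of sending `lo` and `scale`. A mixed-precision scheme could, in principle, vary `bits` per destination, but how BirdMoE chooses precisions is not described in the abstract.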

Original language: English
Title of host publication: 2025 62nd ACM/IEEE Design Automation Conference, DAC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331503048
State: Published - 2025
Externally published: Yes
Event: 62nd ACM/IEEE Design Automation Conference, DAC 2025 - San Francisco, United States
Duration: 22 Jun 2025 to 25 Jun 2025

Publication series

Name: Proceedings - Design Automation Conference
ISSN (Print): 0738-100X

Conference

Conference: 62nd ACM/IEEE Design Automation Conference, DAC 2025
Country/Territory: United States
City: San Francisco
Period: 22/06/25 to 25/06/25

Keywords

  • Mixture-of-Experts training
  • communication compression
  • load balance
