
The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities

  • Ning Li*
  • Song Guo
  • Tuo Zhang
  • Muqing Li
  • Zicong Hong
  • Qihua Zhou
  • Xin Yuan
  • Haijun Zhang
  • *Corresponding author for this work
  • Hong Kong University of Science and Technology
  • Harbin Institute of Technology
  • Shenzhen University
  • University of Science and Technology Beijing

Research output: Contribution to journal, Article, peer-review

Abstract

The power of large language models (LLMs) suggests that deploying LLMs of different scales and architectures across end devices, edge servers, and the cloud, to satisfy diverse requirements and adapt to heterogeneous hardware, is critical to achieving ubiquitous intelligence for 6G. However, the massive parameter counts of LLMs pose significant challenges for deployment on edge servers because of their high computational and storage demands. Because sparse activation in Mixture of Experts (MoE) models enables scalable and dynamic allocation of computational and communication resources at the edge, this article proposes a novel MoE-empowered collaborative deployment framework for edge LLMs, denoted CoEL. The framework fully leverages the properties of the MoE architecture and encompasses three key aspects: model quantization, intra-server and inter-server cooperation, and token pruning and fusion. CoEL begins by quantizing experts according to their importance and popularity, assigning different bit widths to different experts. Then, considering the heterogeneous resources of edge servers and the model deployment requirements, a multi-dimensional collaborative deployment strategy is proposed. This strategy employs intra-server cooperation if the compressed model can be deployed on a single edge server; otherwise, it triggers inter-server cooperation and distributes the experts across multiple edge servers. Additionally, to minimize data transmission delays between servers, a token compression approach is applied. Finally, given the dynamics of network topology, resource status, and user requirements, the deployment strategy is regularly updated to maintain its relevance and effectiveness. This article also delineates the challenges and potential research directions for the deployment of edge LLMs.
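
The first two steps described in the abstract (per-expert bit-width assignment, then the choice between intra-server and inter-server deployment) can be illustrated with a minimal Python sketch. This is not the CoEL algorithm itself, which the abstract does not specify. Everything here is an assumption made for illustration: the `Expert` fields, the blending weight `alpha`, the candidate bit widths `(2, 4, 8)`, the equal-width score bands, and the greedy first-fit placement. Token pruning and fusion are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    importance: float  # hypothetical score in [0, 1]
    popularity: float  # hypothetical activation frequency in [0, 1]
    num_params: int


def assign_bit_width(expert, widths=(2, 4, 8), alpha=0.5):
    """Map a blended importance/popularity score to a bit width.

    Assumption: higher-scoring experts keep more bits. The blend and the
    equal-width bands are illustrative, not taken from the article.
    """
    score = alpha * expert.importance + (1 - alpha) * expert.popularity
    index = min(int(score * len(widths)), len(widths) - 1)
    return widths[index]


def plan_deployment(experts, server_capacities):
    """Return (mode, bit_widths, placement).

    mode is "intra" if the quantized model fits on one server, else "inter".
    placement maps each expert name to a server index.
    Capacities and sizes are in bytes.
    """
    bits = {e.name: assign_bit_width(e) for e in experts}
    sizes = {e.name: e.num_params * bits[e.name] // 8 for e in experts}
    total = sum(sizes.values())

    # Intra-server case: the whole compressed model fits on a single server.
    for idx, capacity in enumerate(server_capacities):
        if total <= capacity:
            return "intra", bits, {e.name: idx for e in experts}

    # Inter-server case: greedy first-fit, largest experts first (assumed heuristic).
    remaining = list(server_capacities)
    placement = {}
    for e in sorted(experts, key=lambda x: sizes[x.name], reverse=True):
        for idx, free in enumerate(remaining):
            if sizes[e.name] <= free:
                placement[e.name] = idx
                remaining[idx] -= sizes[e.name]
                break
        else:
            raise ValueError(f"expert {e.name} does not fit on any server")
    return "inter", bits, placement


experts = [
    Expert("e0", importance=0.9, popularity=0.9, num_params=1000),
    Expert("e1", importance=0.5, popularity=0.5, num_params=1000),
    Expert("e2", importance=0.1, popularity=0.1, num_params=1000),
]
mode_single, bits_single, place_single = plan_deployment(experts, [2000])
mode_multi, bits_multi, place_multi = plan_deployment(experts, [1000, 800])
```

In this toy setup the three experts quantize to 8, 4, and 2 bits, so the compressed model fits on one 2000-byte server but must be split when the servers offer only 1000 and 800 bytes. A periodic re-run of `plan_deployment` with fresh scores and capacities would correspond loosely to the regular strategy updates the abstract mentions.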

Original language: English
Pages (from-to): 164-171
Number of pages: 8
Journal: IEEE Communications Magazine
Volume: 63
Issue number: 12
State: Published - 2025
