Abstract
The power of large language models (LLMs) suggests that deploying various LLMs of different scales and architectures across end devices, edge, and cloud, so as to satisfy different requirements and adapt to heterogeneous hardware, is a critical way to achieve ubiquitous intelligence in 6G. However, the massive number of parameters in LLMs poses significant challenges for deploying them on edge servers because of their high computational and storage demands. Considering that the sparse activation of Mixture of Experts (MoE) is effective for scalable and dynamic allocation of computation and communication resources at the edge, this article proposes a novel MoE-empowered collaborative deployment framework for edge LLMs, denoted as CoEL. This framework fully leverages the properties of the MoE architecture and encompasses three key aspects: model quantization, intra-server and inter-server cooperation, and token pruning and fusion. CoEL begins by quantizing experts according to their importance and popularity, assigning different bit widths to different experts. Then, considering the heterogeneous resources of edge servers and the model deployment requirements, a multi-dimensional collaborative deployment strategy is proposed. This strategy employs intra-server cooperation if the compressed model can be deployed on a single edge server; otherwise, it triggers inter-server cooperation and distributes experts across multiple edge servers. Additionally, to minimize data transmission delays between servers, a token compression approach is applied. Finally, given the dynamics of network topology, resource status, and user requirements, the deployment strategies are regularly updated to maintain their relevance and effectiveness. This article also delineates the challenges and potential research directions for the deployment of edge LLMs.
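The abstract does not specify CoEL's actual quantization rule or placement algorithm. As a rough illustration of the first two stages it describes, here is a minimal sketch. It assumes a hypothetical linear score combining expert importance and popularity, fixed score thresholds mapped to 8/4/2-bit widths, and a simple greedy heuristic for inter-server placement. All names (`Expert`, `assign_bit_widths`, `quantized_bytes`, `plan_deployment`) and parameter values are illustrative assumptions, not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    params: int        # number of parameters in this expert
    importance: float  # hypothetical importance score in [0, 1]
    popularity: float  # hypothetical routing frequency in [0, 1]


def assign_bit_widths(experts, alpha=0.5, thresholds=(0.66, 0.33), widths=(8, 4, 2)):
    """Map each expert to a bit width from a weighted importance/popularity score.

    Assumption (not from the article): a linear score weighted by `alpha`;
    experts scoring at or above thresholds[0] get widths[0] bits, at or above
    thresholds[1] get widths[1] bits, and all others get widths[2] bits.
    """
    bits = {}
    for e in experts:
        score = alpha * e.importance + (1 - alpha) * e.popularity
        if score >= thresholds[0]:
            bits[e.name] = widths[0]
        elif score >= thresholds[1]:
            bits[e.name] = widths[1]
        else:
            bits[e.name] = widths[2]
    return bits


def quantized_bytes(expert, bit_width):
    # Weight storage only; ignores scales, zero points, and non-expert layers.
    return expert.params * bit_width // 8


def plan_deployment(experts, bits, capacities):
    """Return ("intra", placement) or ("inter", placement).

    `capacities` maps a hypothetical server name to its free memory in bytes.
    """
    sizes = {e.name: quantized_bytes(e, bits[e.name]) for e in experts}
    total = sum(sizes.values())

    # Intra-server case: the whole compressed model fits on one server;
    # pick the smallest such server so larger ones stay available.
    fitting = [s for s, cap in capacities.items() if cap >= total]
    if fitting:
        host = min(fitting, key=capacities.get)
        return "intra", {name: host for name in sizes}

    # Inter-server case: place the largest experts first, each on the server
    # with the most remaining memory (a worst-fit-decreasing heuristic).
    free = dict(capacities)
    placement = {}
    for name in sorted(sizes, key=sizes.get, reverse=True):
        server = max(free, key=free.get)
        if free[server] < sizes[name]:
            raise ValueError(f"no server can hold expert {name}")
        placement[name] = server
        free[server] -= sizes[name]
    return "inter", placement


# Usage with three hypothetical 1M-parameter experts.
experts = [
    Expert("e0", 1_000_000, importance=0.9, popularity=0.8),
    Expert("e1", 1_000_000, importance=0.5, popularity=0.3),
    Expert("e2", 1_000_000, importance=0.1, popularity=0.1),
]
bits = assign_bit_widths(experts)
print(bits)  # {'e0': 8, 'e1': 4, 'e2': 2}
print(plan_deployment(experts, bits, {"s1": 1_200_000, "s2": 1_000_000}))
# ('inter', {'e0': 's1', 'e1': 's2', 'e2': 's2'})
```

Here the experts score 0.85, 0.4, and 0.1, so they compress to 1,000,000, 500,000, and 250,000 bytes. Neither server can hold the full 1,750,000-byte model, so the sketch falls back to inter-server placement. A real system would also weigh inter-server link latency and expert co-activation, which this heuristic ignores.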
| Original language | English |
|---|---|
| Pages (from-to) | 164-171 |
| Number of pages | 8 |
| Journal | IEEE Communications Magazine |
| Volume | 63 |
| Issue number | 12 |
| State | Published - 2025 |
Title: The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities