Joint Optimization of Model Inferencing and Task Offloading for MEC-Empowered Large Vision Model Services

  • School of Electronics and Information Engineering, Harbin Institute of Technology
  • The Education University of Hong Kong

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

With the rapid advancement of Large Vision Models (LVMs) such as Sora, the initial comprehension of physical laws by large AI models has garnered significant attention, which enables them to interpret and apply physical principles with increasing accuracy and sophistication. Nevertheless, due to resource limitations and delay constraints, traditional cloud-based LVM services often fail to meet the diverse needs of users, particularly in scenarios requiring real-time responsiveness. In this work, we explore the scenario of Mobile Edge Computing (MEC)-empowered LVM services in wireless networks, where heterogeneous LVMs are deployed on both cloud and edge servers, and LVM Users (LUs) can offload computation tasks to edge servers to reduce delay and energy consumption. In such a scenario, we focus on the joint optimization of model inferencing and task offloading for LUs, aiming to maximize the total service utility while minimizing delay and energy consumption. First, to characterize the utility of LVM services, we propose a multi-dimensional video quality metric based on real measurements, which incorporates both prompt-video alignment and classic video quality indicators. Then, to solve the problem in a decentralized manner, we propose a two-stage solution based on both learning and optimization techniques. In the first stage, we design a reinforcement learning-based Multi-Agent Proximal Policy Optimization (MAPPO) approach to make real-time model inferencing and task offloading decisions. In the second stage, we employ optimization-based Sequential Least Squares Programming (SLSQP) to make efficient resource allocation decisions. Simulation results show that our proposed solution outperforms other benchmarks, and can reduce delay and energy consumption by up to 17.2% and 21.7%, respectively, while increasing service utility by up to 3%.
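
The first stage named in the abstract builds on Proximal Policy Optimization. As a rough illustration of what each agent would optimize, the following is a minimal sketch of the standard PPO clipped surrogate objective, not the authors' implementation. The function name, the batch values, and the clipping threshold of 0.2 are assumptions chosen for illustration; the abstract does not give the paper's reward design, network architecture, or hyperparameters.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch (a quantity to maximize).

    ratios     -- new-policy / old-policy action probabilities per sample
    advantages -- advantage estimates for the same samples
    clip_eps   -- clipping threshold (0.2 is an assumed, commonly used value)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


# Hypothetical batch for one agent (for example, one LVM user's offloading policy).
ratios = [1.5, 0.5, 1.0]
advantages = [1.0, -1.0, 2.0]
objective = ppo_clipped_objective(ratios, advantages)
```

In a multi-agent setting such as MAPPO, each agent typically maximizes an objective of this form over its own actions, while the advantage estimates usually come from a critic that sees shared state. Whether the paper follows that exact arrangement is not stated in the abstract.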

Original language: English
Title of host publication: INFOCOM 2025 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331543051
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE Conference on Computer Communications, INFOCOM 2025 - London, United Kingdom
Duration: 19 May 2025 - 22 May 2025

Publication series

Name: Proceedings - IEEE INFOCOM
ISSN (Print): 0743-166X

Conference

Conference: 2025 IEEE Conference on Computer Communications, INFOCOM 2025
Country/Territory: United Kingdom
City: London
Period: 19/05/25 - 22/05/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Large Vision Model
  • Mobile Edge Computing
