ScalPipe: Scalable Collaborative Pipeline Inference for Distributed Heterogeneous Devices

  • Nianfu Wang
  • Wanyou Wang
  • Xiaoxiong Zhong
  • Jingyu Liu
  • Gaotao Shi
  • Zhijun Li*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Peng Cheng Laboratory
  • Faculty of Computing, Harbin Institute of Technology
  • Wuhan Institute of Technology
  • Tianjin University

Research output: Contribution to journal › Article › peer-review

Abstract

Edge devices generate vast amounts of data, which are transmitted to the remote cloud for inference, a process that can lead to excessive network load and privacy risks. Edge collaborative inference has emerged as a promising solution to these challenges. However, existing collaborative inference methods remain constrained, particularly by transmission costs, limited flexibility in model partitioning, and redundant computation. These issues hinder the efficient utilization of edge resources and limit overall system performance. To address these limitations, we adopt scalable pipelines in the context of resource and model orchestration, enabling dynamic adaptation to varying workloads and improving resource utilization efficiency. Specifically, we propose ScalPipe, a collaborative pipeline inference framework that enables scalable orchestration of resources and model partitions for efficient edge inference. To better accommodate edge devices, ScalPipe employs a lightweight customized heuristic algorithm for resource-adaptive model partitioning. A fine-tuning algorithm dynamically adjusts the scheduling strategy when monitored inference times deviate beyond a predefined threshold. We provide a theoretical analysis that establishes the performance bounds and computational complexity of the proposed algorithms. Comprehensive experiments in heterogeneous environments demonstrate that ScalPipe consistently surpasses state-of-the-art methods across diverse model architectures and evaluation metrics. ScalPipe reduces average inference latency by 20% to 40% while achieving over 90% resource utilization, delivering a significant boost in overall performance.
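
The abstract does not spell out the partitioning heuristic or the fine-tuning rule, so the following is only a generic illustration of the two ideas it names: splitting a model's layers into pipeline stages in proportion to device capability, and flagging a re-schedule when observed stage times drift past a threshold. It is not ScalPipe's algorithm. The function names, the per-layer cost and device speed units, and the 20% threshold are all assumptions made for this sketch.

```python
from typing import List, Sequence


def partition_layers(layer_costs: Sequence[float],
                     device_speeds: Sequence[float]) -> List[int]:
    """Greedy proportional split (illustrative, not the paper's heuristic).

    Consecutive layers are assigned to devices in order; each device keeps
    taking layers while the cumulative cost stays within its speed-weighted
    share of the total. Returns, per device, the index one past its last layer.
    """
    total_cost = sum(layer_costs)
    total_speed = sum(device_speeds)
    boundaries: List[int] = []
    assigned = 0.0   # cost assigned so far across all stages
    target = 0.0     # cumulative cost budget up to the current device
    i = 0
    for d, speed in enumerate(device_speeds):
        target += total_cost * speed / total_speed
        if d == len(device_speeds) - 1:
            i = len(layer_costs)  # last device takes whatever remains
        else:
            while i < len(layer_costs) and assigned + layer_costs[i] <= target + 1e-9:
                assigned += layer_costs[i]
                i += 1
        boundaries.append(i)
    return boundaries


def stage_times(layer_costs: Sequence[float],
                device_speeds: Sequence[float],
                boundaries: Sequence[int]) -> List[float]:
    """Estimated time per stage: summed layer cost divided by device speed."""
    times: List[float] = []
    start = 0
    for end, speed in zip(boundaries, device_speeds):
        times.append(sum(layer_costs[start:end]) / speed)
        start = end
    return times


def needs_rebalance(expected: Sequence[float],
                    observed: Sequence[float],
                    threshold: float = 0.2) -> bool:
    """True if any stage's relative deviation exceeds the (assumed) threshold."""
    return any(abs(o - e) / e > threshold for e, o in zip(expected, observed))


# Hypothetical workload: six layers, three devices, the middle one twice as fast.
costs = [4.0, 4.0, 4.0, 4.0, 8.0, 8.0]
speeds = [1.0, 2.0, 1.0]
bounds = partition_layers(costs, speeds)
expected = stage_times(costs, speeds, bounds)
```

In a pipeline, throughput is set by the slowest stage, which is why the sketch tries to equalize estimated stage times rather than layer counts. A real system would also need to account for the cost of transmitting intermediate activations between devices, which this example ignores.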

Original language: English
Journal: IEEE Transactions on Mobile Computing
State: Accepted/In press - 2026
Externally published: Yes

Keywords

  • Edge computing
  • collaborative inference
  • model split
  • resource scheduling
