
COS: Cross-Processor Operator Scheduling for Multi-Tenant Deep Learning Inference

  • Changyao Lin, Jie Liu*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-tenant inference, a prevalent inference paradigm, requires deploying multiple deep learning models on a hardware platform to process inference tasks concurrently. Modern platforms are typically equipped with heterogeneous processors, as in CPU-GPU platforms. To reduce resource contention and improve Quality of Service (QoS) in the multi-tenant scenario, existing work has studied cross-processor inference at the model and layer levels. However, such coarse-grained scheduling cannot flexibly account for subtle resource fluctuations, which may lead to task blockages and incur significant processor-switching overheads. Such work also usually requires extensive modification and retraining of the models. We therefore propose COS, a finer-grained, operator-level cross-processor scheduling framework that divides the computational workloads and switching overheads among tenants more precisely, without modifying or retraining the models. We introduce a novel intermediate representation to abstract and simplify the scheduling problem, and propose an efficient two-phase search algorithm. COS is automated and easy to scale. Through experiments on various heterogeneous hardware platforms and models, we demonstrate that COS is more flexible and effective than layer-level scheduling and achieves higher throughput than single-processor processing in the multi-tenant scenario. Furthermore, COS is an offline optimization method, and its overhead is acceptable.

Original language: English
Title of host publication: 2024 IEEE/ACM 32nd International Symposium on Quality of Service, IWQoS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350350128
State: Published - 2024
Externally published: Yes
Event: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024 - Guangzhou, China
Duration: 19 Jun 2024 → 21 Jun 2024

Publication series

Name: IEEE International Workshop on Quality of Service, IWQoS
ISSN (Print): 1548-615X

Conference

Conference: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024
Country/Territory: China
City: Guangzhou
Period: 19/06/24 → 21/06/24

Keywords

  • cross-processor parallelism
  • multi-tenant deep learning
  • operator scheduling
  • reinforcement learning
