BurstGPT: A Real-World Workload Dataset to Optimize LLM Serving Systems

  • Yuxin Wang
  • Yuhan Chen
  • Zeyu Li
  • Xueze Kang
  • Yuchu Fang
  • Yeju Zhou
  • Yang Zheng
  • Zhenheng Tang
  • Xin He
  • Rui Guo
  • Xin Wang
  • Qiang Wang
  • Amelie Chi Zhou
  • Xiaowen Chu*
  • *Corresponding author for this work
  • Huawei Hong Kong Research Center
  • The Hong Kong University of Science and Technology (Guangzhou)
  • Huawei Technologies Co., Ltd.
  • Hong Kong University of Science and Technology
  • Hong Kong Baptist University
  • Tsinghua University
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Despite efforts to improve the quality of service (QoS) and throughput in Large Language Model (LLM) serving systems, progress is often limited by the lack of publicly available real-world workloads. Consequently, evaluations usually depend on synthetic or oversimplified load patterns, and systems that appear promising in testing frequently underperform once deployed. This work presents BurstGPT, an LLM serving workload with 10.31 million traces from regional Azure OpenAI GPT services over 213 days. BurstGPT captures LLM serving characteristics from user, model and system perspectives: (1) User request concurrency: burstiness variations of requests in Azure OpenAI GPT services, revealing diversified concurrency patterns in different services and model types. (2) User conversation patterns: counts and intervals within conversations for service optimizations. (3) Model response lengths: auto-regressive serving processes of GPT models, showing statistical relations between requests and their responses. (4) System response failures: failures of conversation and API services, showing intensive resource needs and limited availability of LLM services in Azure. The details of the characteristics can serve multiple purposes in LLM serving optimizations, such as system evaluation and trace provisioning. In our demo evaluation with BurstGPT, frequent variations in BurstGPT reveal declines in efficiency, stability, or reliability in realistic LLM serving. We identify that the generalization of KV cache management, scheduling and disaggregation optimizations can be improved under realistic workload evaluations. BurstGPT is publicly available now at https://github.com/HPMLL/BurstGPT and is widely used to develop prototypes of LLM serving frameworks in the industry.
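As a rough illustration of what "burstiness" in request concurrency can mean, the sketch below computes two simple statistics from a list of request arrival times: the coefficient of variation (CV) of inter-arrival gaps and the number of requests per fixed time window. This is not the authors' analysis method. The function names `interarrival_cv` and `requests_per_window` are hypothetical, the timestamps are synthetic, and nothing here assumes the actual column layout of the BurstGPT trace files. A CV near 1 is what Poisson arrivals would give; larger values indicate burstier traffic.

```python
from statistics import mean, pstdev


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive arrivals.

    Hypothetical helper: `timestamps` is any iterable of arrival times in
    seconds. Evenly spaced arrivals give 0.0; bursty arrivals give > 1.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    avg_gap = mean(gaps)
    if avg_gap == 0:
        raise ValueError("all requests share one timestamp")
    return pstdev(gaps) / avg_gap


def requests_per_window(timestamps, window_seconds):
    """Count arrivals in each fixed-size window, keyed by window index."""
    counts = {}
    for t in timestamps:
        index = int(t // window_seconds)
        counts[index] = counts.get(index, 0) + 1
    return counts


# Synthetic arrival times (seconds), invented for illustration only.
steady = [0, 10, 20, 30, 40]
bursty = [0, 1, 2, 3, 100, 101, 102, 103]
```

Calling `interarrival_cv(steady)` returns 0.0 because every gap is identical, while `interarrival_cv(bursty)` is well above 1 because two tight clusters are separated by a long idle period. A serving-system evaluation that only replays the steady pattern would never exercise the second case.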

Original language: English
Title of host publication: KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 5831-5841
Number of pages: 11
ISBN (Electronic): 9798400714542
State: Published - 3 Aug 2025
Externally published: Yes
Event: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada
Duration: 3 Aug 2025 to 7 Aug 2025

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2
ISSN (Print): 2154-817X

Conference

Conference: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Country/Territory: Canada
City: Toronto
Period: 3/08/25 to 7/08/25

Keywords

  • llm serving
  • system scheduling
  • workload management
  • workload trace