
FlexSP: (1 + β)-Choice based Flexible Stream Partitioning for Stateful Operators

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Stream partitioning has a fundamental effect on the efficiency of data parallelism in distributed stream processing systems. The skewed and time-varying nature of streaming data makes it challenging to achieve load balancing while minimizing the cost incurred. The requirement of adaptivity further complicates the problem: the partitioning mechanism should not only capture changes in the workload and adjust itself, but also tolerate those changes, because the statistics it relies on lag behind the actual workload. Existing approaches use one-choice or multiple-choice schemes to trade off these factors, but they tend to treat them as opposites, and so either fail to achieve good load balancing or incur excessive cost. There is a lack of deeper insight into how partitioning behavior affects load balancing, cost, and adaptivity when keys have different numbers of candidate choices. A flexible partitioning scheme is also needed to allow different trade-offs among these three factors across scenarios. To address these issues, we propose a novel (1 + β)-choice based stream partitioning scheme, which selectively splits a fraction β ∈ (0, 1) of the keys so that they have multiple candidate choices. We demonstrate that splitting only this β fraction of the keys is sufficient to achieve optimal load balancing while minimizing cost and providing the required adaptivity to workload variance. From a new perspective, we analyze the relationship among load balancing, cost, and adaptivity, which serves as the theoretical foundation for choosing a proper β and the corresponding number of choices. Experiments on Apache Flink demonstrate that our approach outperforms state-of-the-art solutions, improving throughput by 7.3× and reducing latency by 85%.

Original language: English
Title of host publication: 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
Publisher: Association for Computing Machinery
Pages: 732-741
Number of pages: 10
ISBN (Electronic): 9798400708428
State: Published - 12 Aug 2024
Event: 53rd International Conference on Parallel Processing, ICPP 2024 - Gotland, Sweden
Duration: 12 Aug 2024 - 15 Aug 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 53rd International Conference on Parallel Processing, ICPP 2024
Country/Territory: Sweden
City: Gotland
Period: 12/08/24 - 15/08/24

Keywords

  • distributed stream processing
  • key splitting
  • stateful operation
  • stream partitioning
