Skip to main navigation Skip to search Skip to main content

Hypersistent Sketch: Enhanced Persistence Estimation via Fast Item Separation

  • Lu Cao
  • , Qilong Shi
  • , Weiqiang Xiao
  • , Nianfu Wang
  • , Wenjun Li*
  • , Zhijun Li
  • , Weizhe Zhang*
  • , Mingwei Xu
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Peng Cheng Laboratory
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Efficient data stream processing, particularly for persistence estimation, is crucial in handling high-velocity data streams characterized by skewed distributions of item frequencies. Unlike more straightforward frequency metrics, persistence captures items' recurrence across multiple time windows, requiring nuanced processing approaches. In response, we introduce the Hypersistent Sketch, an algorithm that significantly enhances persistence estimation through innovative filtering techniques. Our design incorporates a Cold Filter to address the skewed nature of data streams where a few high-frequency (hot) items dominate. This filter allows for differential treatment by using smaller counters for most low-frequency (cold) items, thus conservatively allocating memory resources that would otherwise be sized uniformly based on hot items. However, the Cold Filter can reduce throughput due to its segregative processing. To mitigate this, we implement a Burst Filter, which optimizes the processing of hot items. The Burst Filter significantly improves throughput by preventing repeated insertions within a single window - where persistence increases by at most one - and deferring the insertion until the window's end. Comparative evaluations demonstrate that the Hypersistent Sketch outperforms existing solutions like the On-Off Sketch, offering up to 3 times improved throughput while maintaining competitive accuracy and substantially reducing memory usage in handling large-scale data streams.

Original languageEnglish
Title of host publicationProceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PublisherIEEE Computer Society
Pages3030-3042
Number of pages13
ISBN (Electronic)9798331536039
DOIs
StatePublished - 2025
Event41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: 19 May 202523 May 2025

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627
ISSN (Electronic)2375-0286

Conference

Conference41st IEEE International Conference on Data Engineering, ICDE 2025
Country/TerritoryChina
CityHong Kong
Period19/05/2523/05/25

Keywords

  • Approximate algorithm
  • Data stream
  • Persistence estimation
  • Sketch

Fingerprint

Dive into the research topics of 'Hypersistent Sketch: Enhanced Persistence Estimation via Fast Item Separation'. Together they form a unique fingerprint.

Cite this