
SepJoin: A Distributed Stream Join System with Low Latency and High Throughput

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In real-time analytics, stream joins are the basis for complex queries and strongly affect system performance. To meet the real-time requirements of streaming applications, the stream join operator must deliver low latency and high throughput. In this paper, we model the latency and throughput of distributed stream join systems using queuing theory. Based on an analysis of this model, we demonstrate the impact of index-related overhead on the latency and throughput of stream join systems and propose a new distributed stream join system, SepJoin, which targets the hash join problem. SepJoin uses a novel partitioning scheme that spreads the tuples of each input stream across as many processing units as possible, which reduces the number of tuples from each stream that any single processing unit stores. This in turn reduces the index-related overhead each processing unit incurs when performing join operations, and ultimately improves both latency and throughput. We evaluate the processing latency and maximum throughput of SepJoin through both theoretical analysis and extensive experiments.
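The reasoning in the abstract (fewer stored tuples per unit, hence lower index overhead, hence better latency and throughput) can be illustrated with a toy queuing calculation. The sketch below is not the model from the paper: it assumes each processing unit behaves like an M/M/1 queue, and it uses a hypothetical cost function, `unit_service_rate`, in which per-tuple processing time grows with the logarithm of the number of stored tuples. The constants `base_cost`, `index_cost`, `window_tuples`, and `arrival_rate` are illustrative assumptions, and the per-unit arrival rate is held fixed for simplicity.

```python
import math


def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time a tuple spends in an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return math.inf  # unstable queue: latency grows without bound
    return 1.0 / (service_rate - arrival_rate)


def unit_service_rate(stored_tuples: float,
                      base_cost: float = 1e-6,
                      index_cost: float = 2e-7) -> float:
    """Hypothetical cost model (an assumption, not taken from the paper).

    Per-tuple time = fixed cost + index cost that grows with log2 of the
    number of tuples stored on the unit. Returns tuples per second, which
    is also the unit's maximum sustainable throughput.
    """
    per_tuple_seconds = base_cost + index_cost * math.log2(max(stored_tuples, 2.0))
    return 1.0 / per_tuple_seconds


window_tuples = 1_000_000   # tuples of one stream kept in the join window (assumed)
arrival_rate = 150_000.0    # tuples per second reaching one unit (assumed)

# Spreading one stream's tuples over more units shrinks each unit's index.
for units in (4, 16, 64):
    mu = unit_service_rate(window_tuples / units)
    latency = mm1_latency(arrival_rate, mu)
    print(f"units={units:3d}  max throughput={mu:,.0f}/s  latency={latency * 1e6:.2f} us")
```

Under these assumptions, storing a stream across more units raises each unit's service rate, which both raises the maximum throughput and lowers the mean latency at a fixed arrival rate. The actual magnitudes depend entirely on the assumed cost function.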

Original language: English
Title of host publication: Proceedings - 2022 IEEE 28th International Conference on Parallel and Distributed Systems, ICPADS 2022
Publisher: IEEE Computer Society
Pages: 633-640
Number of pages: 8
ISBN (Electronic): 9781665473156
DOIs
State: Published - 2023
Event: 28th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2022 - Nanjing, China
Duration: 10 Jan 2023 to 12 Jan 2023

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 2023-January
ISSN (Print): 1521-9097

Conference

Conference: 28th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2022
Country/Territory: China
City: Nanjing
Period: 10/01/23 to 12/01/23

Keywords

  • big data
  • distributed stream join system
  • partitioning scheme
  • queuing theory
