Skip to main navigation Skip to search Skip to main content

Towards Efficient Random-Order Enumeration for Join Queries

  • Harbin Institute of Technology

Research output: Contribution to journalConference articlepeer-review

Abstract

In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumerated in uniformly random order. However, existing studies lack an efficient random-order enumeration algorithm with a worst-case runtime guarantee for (cyclic) join queries. In this paper, we develop an efficient random-order enumeration algorithm for join queries with no large hidden constants in its complexity, achieving expected is the size of the join result. We prove that our algorithm is near-optimal in the worst case, under the combinatorial clique hypothesis. Our algorithm requires no query-specific preprocessing and can be flexibly adapted to many common database indexes with only minor modifications. We also devise non-trivial techniques to speed up enumeration and reduce memory usage, and present an experimental study of their impact on our algorithm. The experimental results show that our algorithm, enhanced with the proposed techniques, significantly outperforms existing state-of-the-art methods.

Original languageEnglish
Pages (from-to)889-901
Number of pages13
JournalProceedings of the VLDB Endowment
Volume19
Issue number5
DOIs
StatePublished - 2026
Event52nd International Conference on Very Large Data Bases, VLDB 2026 - Boston, United States
Duration: 31 Aug 20264 Sep 2026

Fingerprint

Dive into the research topics of 'Towards Efficient Random-Order Enumeration for Join Queries'. Together they form a unique fingerprint.

Cite this