Skip to main navigation Skip to search Skip to main content

Distributed processing of regular path queries in RDF graphs

  • Xintong Guo
  • , Hong Gao*
  • , Zhaonian Zou
  • *Corresponding author for this work
  • Harbin Institute of Technology

Research output: Contribution to journalArticlepeer-review

Abstract

SPARQL 1.1 offers a type of navigational query for RDF systems, called regular path query (RPQ). A regular path query allows for retrieving node pairs with the paths between them satisfying regular expressions. Regular path queries are always difficult to be evaluated efficiently because of the possible large search space. Thus there has been no scalable and practical solution so far. In this paper, we present Leon+, an in-memory distributed framework, to address the RPQ problem in the context of the knowledge graph. To reduce search space and mitigate mounting communication costs, Leon+ takes advantage of join-ahead pruning via a novel RDF summarization technique together with a path partitioning strategy. We also develop a subtle cost model to devise query plans to achieve high efficiency for complex RPQs. As there has been no available RPQ benchmark, we create micro-benchmarks on both synthetic and real-world datasets. A thorough experimental evaluation is presented between our approach and the state-of-the-art RDF stores. The results show that our approach outperforms 5x faster than the competitors on single RPQ. For query workload, it saves up to 1/2 time and 2/3 communication overheads over the baseline method.

Original languageEnglish
Pages (from-to)993-1027
Number of pages35
JournalKnowledge and Information Systems
Volume63
Issue number4
DOIs
StatePublished - Apr 2021

Keywords

  • Graph partitioning
  • Graph summarization
  • Knowledge graph
  • RDF/SPARQL
  • Regular path queries

Fingerprint

Dive into the research topics of 'Distributed processing of regular path queries in RDF graphs'. Together they form a unique fingerprint.

Cite this