Skip to main navigation Skip to search Skip to main content

Query-aware locality-sensitive hashing scheme for lp norm

  • Qiang Huang
  • , Jianlin Feng*
  • , Qiong Fang
  • , Wilfred Ng
  • , Wei Wang
  • *Corresponding author for this work
  • Sun Yat-Sen University
  • South China University of Technology
  • Hong Kong University of Science and Technology
  • University of New South Wales

Research output: Contribution to journalArticlepeer-review

Abstract

The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes to tackle the c-ANN search problem. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer c≥ 2. In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the “anchor” for bucket partition. Accordingly, a query-aware LSH function under a specific lp norm with p∈ (0 , 2 ] is a random projection coupled with query-aware bucket partition, which removes random shift required by traditional query-oblivious LSH functions. The query-aware bucket partitioning strategy can be easily implemented so that query performance is guaranteed. For each lp norm (p∈ (0 , 2 ]) , based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c> 1. In addition, we propose a heuristic variant named QALSH+ to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH+ outperform the state-of-the-art schemes, especially in high-dimensional space. Specifically, by using a ratio c< 2 , QALSH can achieve much better query quality.

Original languageEnglish
Pages (from-to)683-708
Number of pages26
JournalVLDB Journal
Volume26
Issue number5
DOIs
StatePublished - 1 Oct 2017
Externally publishedYes

Keywords

  • Locality-sensitive hashing
  • Nearest neighbor search
  • Query-aware
  • l Norm
  • p-Stable distributions

Fingerprint

Dive into the research topics of 'Query-aware locality-sensitive hashing scheme for lp norm'. Together they form a unique fingerprint.

Cite this