Abstract
In many application fields, top-k is an important operation since it returns k most important objects according to a given ranking function. Different from traditional TA algorithms, NRA only requires sequential access to return top-k results so that it can be used in environment where random access is limited or impossible. This paper analyzes the execution behavior of NRA and determines tuple number to scan in increasing and shrinking phase. It is found that in massive data context, NRA needs to maintain large quantity of candidate tuples in increasing phase which affects algorithm efficiency significantly. This paper proposes a novel top-k algorithm TKEP (Top-K with Early Pruning) on massive data which performs early pruning in increasing phase to prune most of candidate tuples. This paper provides mathematical analysis of early pruning and proves its theoretical and practical pruning effect. To the best of our knowledge, it is the first paper to provide early pruning in top-k processing. The extensive experiments show that compared to NRA, TKEP maintains less tuples by a factor of three orders of magnitude, it consumes less memory by a factor of an order of magnitude and TKEP achieves substantial performance speed-up of an order of magnitude.
| Original language | English |
|---|---|
| Pages (from-to) | 1405-1417 |
| Number of pages | 13 |
| Journal | Jisuanji Xuebao/Chinese Journal of Computers |
| Volume | 33 |
| Issue number | 8 |
| DOIs | |
| State | Published - Aug 2010 |
Keywords
- Early pruning
- Massive data
- Top-K with Early Pruning
- Top-k
Fingerprint
Dive into the research topics of 'TKEP: An efficient top-k query processing algorithm on massive data'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver