
An efficient pruning strategy for approximate string matching over suffix tree

  • Huan Hu
  • Hongzhi Wang*
  • Jianzhong Li
  • Hong Gao
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Approximate string matching over a suffix tree with depth-first search (ASM_ST_DFS), a classical algorithm in the field of approximate string matching, was originally proposed by Ricardo A. Baeza-Yates and Gaston H. Gonnet in 1990. When combined with other indexing techniques, it is among the best-performing algorithms for approximate string matching. However, its time complexity is sensitive to the length of the pattern string, because it searches m + k characters on each path from the root before backtracking, where m is the pattern length and k is the error threshold. In this paper, we propose an efficient pruning strategy to solve this problem and prove its correctness and efficiency in theory. In particular, we prove that with the pruning strategy the algorithm searches O(k) characters on average on each path before backtracking, instead of O(m). Because each internal node of a suffix tree has multiple branches, shortening every path in this way should reduce the search cost substantially. We also show experimentally that when k is much smaller than m, efficiency improves by a factor of hundreds, and when k is not much smaller than m, the algorithm is still several times faster. This is the first paper that tries to solve the backtracking problem of ASM_ST_DFS in both theory and practice.
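
To make the m + k search depth concrete, the following is a minimal sketch of the unpruned baseline only, not of the pruning strategy proposed in the paper (the abstract does not describe it). It assumes edit distance as the error measure and substitutes a naive suffix trie for a real suffix tree; the names `build_suffix_trie` and `asm_dfs` are hypothetical and chosen for illustration.

```python
def build_suffix_trie(text):
    """Naive suffix trie (assumption: stands in for a real suffix tree).

    Each node records the start positions of all suffixes passing through it.
    """
    root = {"children": {}, "starts": []}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(i)
    return root


def asm_dfs(text, pattern, k):
    """Return sorted (start, length) pairs whose substring of `text`
    is within edit distance k of `pattern`."""
    m = len(pattern)
    root = build_suffix_trie(text)
    matches = set()

    def dfs(node, depth, col):
        # col[j] = edit distance between pattern[:j] and the current path string.
        if depth > 0 and col[m] <= k:
            for start in node["starts"]:
                matches.add((start, depth))
        # Baseline behaviour: every path is followed down to depth m + k,
        # since no string longer than m + k can be within k edits of the pattern.
        if depth == m + k:
            return
        for ch, child in node["children"].items():
            new_col = [col[0] + 1]
            for j in range(1, m + 1):
                cost = 0 if pattern[j - 1] == ch else 1
                new_col.append(min(col[j - 1] + cost,    # match or substitution
                                   col[j] + 1,           # extra character on the path
                                   new_col[j - 1] + 1))  # pattern character skipped
            dfs(child, depth + 1, new_col)

    dfs(root, 0, list(range(m + 1)))
    return sorted(matches)


print(asm_dfs("banana", "ana", 0))  # → [(1, 3), (3, 3)]
```

Because the depth bound grows with m, every root-to-leaf path costs O(m + k) dynamic-programming columns in this sketch; this is the per-path cost that the abstract says the pruning strategy reduces to O(k) on average.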

Original language: English
Pages (from-to): 121-141
Number of pages: 21
Journal: Knowledge and Information Systems
Volume: 49
Issue number: 1
State: Published - 1 Oct 2016
Externally published: Yes

Keywords

  • Approximate string matching
  • Backtracking
  • Bit-parallelism
  • Depth-first
  • Dynamic programming
  • Suffix tree
