Skip to main navigation Skip to search Skip to main content

Pattern matching for super large patterns set

  • Tian Long Yang*
  • , Hong Li Zhang
  • *Corresponding author for this work
  • Harbin Institute of Technology

Research output: Contribution to journalArticlepeer-review

Abstract

The current string matching algorithms nearly can not afford the burden of large memory demand when the patters amount increases dramatically. Matching automaton can not be established at all when the amount of patterns is at least tens of thousands. We present a solution to the problem of super large scale patterns matching (SLSPM). In our design, a matching trie is divided into one block matching trie and many general character matching tries if possible. During a block matching procedure our block matching automaton (trie) does not read the current state first. Instead, the automaton first reads the current text block symbol and decides whether it will be matched or not by a hash function. Then, the automaton looks for the current state in the states set in which all the states recognize the same current text block symbol. After the current state is found the automaton continues to read the next text block symbol. The theoretical analysis shows that under the worst case the proposed algorithm takes O(n) time approximately, where n is the length of the text. The experiment results show that our design matches only a little faster than the advanced Aho-Corasick because in the advanced Aho-Corasick the entire possible transition information has been stored. The advantage of SLSPM is that under software environment SLSPM is not slower than AC during the matching procedure, and also at least tens of thousands patterns can be loaded into the hybrid automatons of SLSPM so that it can be used well for super large scale patters matching environment.

Original languageEnglish
Pages (from-to)1147-1158
Number of pages12
JournalJisuanji Xuebao/Chinese Journal of Computers
Volume37
Issue number5
DOIs
StatePublished - May 2014

Keywords

  • Algorithm
  • Hybrid automaton
  • Information security
  • Network security
  • String matching
  • Super large scale patterns matching

Fingerprint

Dive into the research topics of 'Pattern matching for super large patterns set'. Together they form a unique fingerprint.

Cite this