Skip to main navigation Skip to search Skip to main content

Efficiently Mining Frequent Itemsets on Massive Data

  • Xixian Han*
  • , Xianmin Liu
  • , Jian Chen
  • , Guojun Lai
  • , Hong Gao
  • , Jianzhong Li
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journalArticlepeer-review

Abstract

Frequent itemset mining is an important operation to return all itemsets in the transaction table, which occur as a subset of at least a specified fraction of the transactions. The existing algorithms cannot compute frequent itemsets on massive data efficiently, since they either require multiple-pass scans on the table or construct complex data structures which normally exceed the available memory on massive data. This paper proposes a novel precomputation-based frequent itemset mining (PFIM) algorithm to compute the frequent itemsets quickly on massive data. PFIM treats the transaction table as two parts: the large old table storing historical data and the relatively small new table storing newly generated data. PFIM first pre-constructs the quasi-frequent itemsets on the old table whose supports are above the lower-bound of the practical support level. Given the specified support threshold, PFIM can quickly return the required frequent itemsets on the table by utilizing the quasi-frequent itemsets. Three pruning rules are presented to reduce the size of the involved candidates. An incremental update strategy is devised to efficiently re-construct the quasi-frequent itemsets when the tables are merged. The extensive experimental results, conducted on synthetic and real-life data sets, show that PFIM has a significant advantage over the existing algorithms and runs two orders of magnitude faster than the latest algorithm.

Original languageEnglish
Article number8657696
Pages (from-to)31409-31421
Number of pages13
JournalIEEE Access
Volume7
DOIs
StatePublished - 2019
Externally publishedYes

Keywords

  • Frequent itemset mining
  • PFIM algorithm
  • incremental update
  • massive data
  • pruning rule

Fingerprint

Dive into the research topics of 'Efficiently Mining Frequent Itemsets on Massive Data'. Together they form a unique fingerprint.

Cite this