TY - GEN
T1 - Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data (Extended abstract)
AU - He, Jingxuan
AU - Han, Xixian
AU - Wan, Xiaolong
AU - Wang, Jinbao
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Itemset mining techniques such as frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are crucial for extracting interesting patterns that meet predefined thresholds from transaction databases. Despite their significance, few studies have considered support and utility simultaneously. This gap arises from the theoretical complexity of integrating the two dimensions and the practical difficulty of setting appropriate thresholds for both. To overcome this limitation, we introduce skyline frequent-utility itemset mining (SFUIM), which examines frequent and high-utility itemsets without requiring predefined thresholds. Nevertheless, SFUIM faces significant challenges due to its expansive search space and intensive computational requirements. In this paper, we propose the PSI algorithm and its enhanced version PSI*, which confine calculations to specific partitions through prefix-based partitioning. Experiments demonstrate that PSI* outperforms state-of-the-art methods, especially on large-scale datasets.
KW - Frequent-utility itemset
KW - large-scale data
KW - skyline
UR - https://www.scopus.com/pages/publications/105015405025
U2 - 10.1109/ICDE65448.2025.00404
DO - 10.1109/ICDE65448.2025.00404
M3 - Conference contribution
AN - SCOPUS:105015405025
T3 - Proceedings - International Conference on Data Engineering
SP - 4750
EP - 4751
BT - Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PB - IEEE Computer Society
T2 - 41st IEEE International Conference on Data Engineering, ICDE 2025
Y2 - 19 May 2025 through 23 May 2025
ER -