TY - GEN
T1 - Malicious Domain Detection on Out-of-Distribution Gray Data through Graph Contrastive Learning with Structure Aggregation
AU - Gu, Hongjie
AU - He, Daojing
AU - Zhou, Xun
N1 - Publisher Copyright:
© 2026 Owner/Author.
PY - 2026/4/20
Y1 - 2026/4/20
N2 - Graph-based threat detection methods model Indicators of Compromise (IoC) using heterogeneous graphs and train node classifiers to identify malicious domains. Despite their promising performance, these approaches still face two major challenges. Firstly, the high cost of node annotation leads to a lack of evaluation on extensive gray data (unlabeled data). Secondly, the previous observations reveal a significant distribution shift in the Domain Maliciousness Graph (DMG), where structural differences between labeled and unlabeled domains hinder model performance. Existing graph learning methods have not yet considered both of these challenges simultaneously. To fill the gap, we frame the problem as semi-supervised graph node classification under out-of-distribution (OOD) constraints. We introduce graph aggregative contrastive learning (GRAVEL), which leverages the inherent structure of DMG to enhance detection performance on OOD unlabeled domains. GRAVEL is pre-trained end-to-end on abundant in-distribution malicious and benign samples, then fine-tuned with scarce OOD malicious data via mixup. During pre-training, label propagation seeds pseudo-labels, and a label-guided aggregation classifier is used to warm up the model, after which multi-view contrastive learning sharpens features for unlabeled domains. Extensive industrial evaluations demonstrate that GRAVEL improves F1 by 5-20% across diverse benchmarks for OOD malicious domain detection, consistently outperforming state-of-the-art baselines.
KW - heterogeneous graph
KW - indicator of compromise
KW - malicious domain detection
KW - out-of-distribution
UR - https://www.scopus.com/pages/publications/105038107776
U2 - 10.1145/3770854.3780237
DO - 10.1145/3770854.3780237
M3 - Conference contribution
AN - SCOPUS:105038107776
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 324
EP - 335
BT - KDD 2026 - Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1
PB - Association for Computing Machinery
T2 - 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD 2026
Y2 - 9 August 2026 through 13 August 2026
ER -