Abstract
As cyberattacks continue to rise in frequency and sophistication, extracting actionable Cyber Threat Intelligence (CTI) from diverse online sources has become critical for proactive threat detection and defense. However, accurately identifying complex entities from lengthy and heterogeneous threat reports remains challenging due to long-range dependencies and domain-specific terminology. To address this, we propose XLNet-CRF, a hybrid framework that combines permutation-based language modeling with structured prediction using Conditional Random Fields (CRF) to enhance Named Entity Recognition (NER) in cybersecurity contexts. XLNet-CRF directly addresses key challenges in CTI-NER by modeling bidirectional dependencies and capturing non-contiguous semantic patterns more effectively than traditional approaches. Comprehensive evaluations on two benchmark cybersecurity corpora validate the efficacy of our approach. On the CTI-Reports dataset, XLNet-CRF achieves a precision of 97.41% and an F1-score of 97.43%; on MalwareTextDB, it attains a precision of 85.33% and an F1-score of 88.65%—significantly surpassing strong BERT-based baselines in both accuracy and robustness.
| Original language | English |
|---|---|
| Article number | 3034 |
| Journal | Electronics (Switzerland) |
| Volume | 14 |
| Issue number | 15 |
| DOIs | |
| State | Published - Aug 2025 |
| Externally published | Yes |
Keywords
- conditional random fields
- cyber security
- cyber threat intelligence
- deep learning
- named entity recognition
- permutation language modeling
Fingerprint
Dive into the research topics of 'XLNet-CRF: Efficient Named Entity Recognition for Cyber Threat Intelligence with Permutation Language Modeling'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver