XLNet-CRF: Efficient Named Entity Recognition for Cyber Threat Intelligence with Permutation Language Modeling

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Shandong Key Laboratory of Industrial Network Security
  • Harbin Institute of Technology Weihai
  • Weihai Cyberguard Technologies Co. Ltd

Research output: Contribution to journal › Article › peer-review

Abstract

As cyberattacks continue to rise in frequency and sophistication, extracting actionable Cyber Threat Intelligence (CTI) from diverse online sources has become critical for proactive threat detection and defense. However, accurately identifying complex entities in lengthy, heterogeneous threat reports remains challenging because of long-range dependencies and domain-specific terminology. To address this, we propose XLNet-CRF, a hybrid framework that combines permutation-based language modeling with structured prediction using Conditional Random Fields (CRF) to enhance Named Entity Recognition (NER) in cybersecurity contexts. XLNet-CRF directly addresses key challenges in CTI-NER by modeling bidirectional dependencies and capturing non-contiguous semantic patterns more effectively than traditional approaches. Comprehensive evaluations on two benchmark cybersecurity corpora validate the efficacy of our approach. On the CTI-Reports dataset, XLNet-CRF achieves a precision of 97.41% and an F1-score of 97.43%; on MalwareTextDB, it attains a precision of 85.33% and an F1-score of 88.65%, significantly surpassing strong BERT-based baselines in both accuracy and robustness.
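The abstract does not spell out how the CRF layer turns per-token scores into a tag sequence. As a rough illustration only, the sketch below shows standard Viterbi decoding for a linear-chain CRF, which is the usual way such a layer is decoded. It is not the authors' implementation. The function name `viterbi_decode`, the three-tag scheme (O, B-MAL, I-MAL), and all score values are hypothetical, and the emission scores stand in for what an encoder such as XLNet would produce.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag path for a linear-chain CRF.

    emissions[t][k]   : score of tag k at token t (hypothetical encoder output)
    transitions[j][k] : score of moving from tag j to tag k
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    scores = list(emissions[0])          # best score ending in each tag so far
    backpointers: List[List[int]] = []   # best previous tag per step and tag

    for emission in emissions[1:]:
        step_scores, step_back = [], []
        for curr in range(num_tags):
            # Pick the previous tag that maximizes the path score into `curr`.
            best_prev = max(range(num_tags),
                            key=lambda prev: scores[prev] + transitions[prev][curr])
            step_scores.append(scores[best_prev]
                               + transitions[best_prev][curr]
                               + emission[curr])
            step_back.append(best_prev)
        scores = step_scores
        backpointers.append(step_back)

    # Trace back from the best final tag to recover the full path.
    path = [max(range(num_tags), key=lambda tag: scores[tag])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


# Hypothetical tag ids: 0 = O, 1 = B-MAL, 2 = I-MAL.
# The O -> I-MAL transition is heavily penalized because it is invalid in BIO tagging.
TRANSITIONS = [
    [0.0, 0.0, -1e4],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Made-up emission scores for three tokens; per-token argmax would give O, I-MAL, I-MAL.
EMISSIONS = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.5],
    [0.3, 0.2, 1.2],
]

print(viterbi_decode(EMISSIONS, TRANSITIONS))  # -> [0, 1, 2]
```

In this toy example, taking the best tag for each token independently would yield the invalid sequence O, I-MAL, I-MAL, whereas decoding with transition scores yields O, B-MAL, I-MAL. This is the kind of label-consistency constraint that structured prediction with a CRF is generally meant to enforce.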

Original language: English
Article number: 3034
Journal: Electronics (Switzerland)
Volume: 14
Issue number: 15
DOIs
State: Published - Aug 2025
Externally published: Yes

Keywords

  • conditional random fields
  • cyber security
  • cyber threat intelligence
  • deep learning
  • named entity recognition
  • permutation language modeling
