
From Ambiguous Queries to Verifiable Insights: A Task-Driven Framework for LLM-Powered SOC Analysis

  • Huan Zhang
  • Haiyan Wang
  • Hao Tan
  • Liyi Zeng
  • Jingnan Li
  • Zhaoquan Gu*
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Pengcheng Laboratory
  • Bank of China

Research output: Contribution to journal › Article › peer-review

Abstract

Security operations centre (SOC) analysts must investigate alerts, correlate threat intelligence and interpret heterogeneous telemetry under tight timing constraints. Although large language models (LLMs) offer strong understanding capabilities, directly applying them to SOC environments remains challenging due to semantic ambiguity in analyst queries, fragmented multisource event data, limited domain-specific reasoning and reliability concerns associated with unconstrained query generation. We present a task-driven knowledge-augmented framework designed to produce verifiable and contextually grounded responses for SOC workflows. The framework integrates four components: (i) contrastive context task recognition that mitigates semantic ambiguity by mapping analyst queries to predefined SOC task types; (ii) expert-guided knowledge augmentation that fuses dense and sparse retrieval to bridge the semantic gap; (iii) schema-aligned event retrieval combined with entity-centric evidence profiling to ensure reliable and secure access to heterogeneous telemetry; and (iv) verifiable task-aware generation that constrains model outputs to retrieved knowledge and security events. To assess the framework, we construct a benchmark of 12,500 validated question–answer pairs derived through semiautomated synthesis over more than 34 million real SOC records. Experiments across multiple foundation models demonstrate consistent improvements in relevance and grounding quality. Our results indicate that the four proposed components substantially enhance LLMs' reliability in practical SOC analysis.
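
As an illustration of what "fusing dense and sparse retrieval" in component (ii) can look like, the sketch below merges two ranked lists with reciprocal rank fusion (RRF). The abstract does not state which fusion method the authors use, so RRF is an assumption chosen for illustration; the function name, the constant k=60 and the document IDs are likewise hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["kb-17", "kb-03", "kb-42"]
sparse_hits = ["kb-03", "kb-99", "kb-17"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept, which is one plausible way to combine exact keyword matches with semantic matches.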

Original language: English
Journal: CAAI Transactions on Intelligence Technology
State: Accepted/In press - 2026
Externally published: Yes

Keywords

  • heterogeneous security telemetry
  • knowledge-augmented retrieval
  • large language models
  • security operations centre
  • task recognition
