CDGM: Controllable Dataset Generation Method for Cybersecurity

  • Yushun Xie
  • Haiyan Wang
  • Runnan Tan
  • Xiangyu Song
  • Zhaoquan Gu*
  • *Corresponding author for this work
  • University of Electronic Science and Technology of China
  • Peng Cheng Laboratory
  • Guangzhou University
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Cyberattacks can lead to data breaches, service disruptions, and economic losses, and may even threaten national security and social stability. Therefore, researchers have proposed various methods based on public datasets to improve the intelligence and automation of cybersecurity defense techniques. However, these public datasets usually cover only a limited range of cyberattack types, so the proposed methods are ineffective against attacks not included in the datasets. Meanwhile, cybersecurity defenders often need to study cyberattack scenarios involving specific assets that are usually not represented in public datasets. To address these challenges, we propose a new approach to controllable cybersecurity dataset generation. Using a four-role architecture, our method can reproduce any cyberattack and generate customized private attack data that includes specific assets, which meets the needs of researchers. By integrating the private attack data with a cybersecurity knowledge base derived from open-source datasets, we construct a comprehensive cybersecurity dataset. Extensive experiments demonstrate that the cybersecurity dataset generated by our method is suitable for various common cybersecurity tasks, such as threat hunting, alert analysis, and knowledge reasoning.

Original language: English
Title of host publication: Advanced Data Mining and Applications - 20th International Conference, ADMA 2024, Proceedings
Editors: Quan Z. Sheng, Xuyun Zhang, Jia Wu, Congbo Ma, Gill Dobbie, Jing Jiang, Wei Emma Zhang, Yannis Manolopoulos, Wathiq Mansoor
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 238-253
Number of pages: 16
ISBN (Print): 9789819608492
State: Published - 2025
Externally published: Yes
Event: 20th International Conference on Advanced Data Mining Applications, ADMA 2024 - Sydney, Australia
Duration: 3 Dec 2024 to 5 Dec 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15392 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Advanced Data Mining Applications, ADMA 2024
Country/Territory: Australia
City: Sydney
Period: 3/12/24 to 5/12/24

Keywords

  • Cyberattack
  • Cybersecurity Dataset
  • Data generate
