DATA AUGMENTATION IN TRAINING DEEP LEARNING MODELS FOR MALWARE FAMILY CLASSIFICATION

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the rapid development of deep learning, a variety of deep learning models have been applied to malware detection and classification. When these models are used to classify malware families, a major bottleneck is the shortage of labeled samples per family, since deep models need a large number of samples for training. To address this issue, we propose a method for generating malware family samples. We use the Grad-CAM algorithm to locate the raw data that represents malware features, and we create a new sample by inserting that data into section gaps and new sections of PE files. Experimental results show that adding the generated samples to the training dataset can improve the classification accuracy of deep learning models.
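The Grad-CAM step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 1-D convolutional layer whose activations and class-score gradients are already available as plain lists; the function names `grad_cam_1d` and `top_window`, the toy values, and the fixed-width window selection are illustrative assumptions, not the authors' implementation. The abstract does not say which model or layer is used, or how layer positions map back to byte offsets in the file.

```python
from typing import List, Tuple


def grad_cam_1d(activations: List[List[float]],
                gradients: List[List[float]]) -> List[float]:
    """Grad-CAM relevance per position for a 1-D convolutional layer.

    activations[k][i] is feature map k at position i; gradients[k][i] is the
    derivative of the target class score with respect to that activation.
    """
    length = len(activations[0])
    # Channel weight: the gradient averaged over all positions.
    weights = [sum(g) / length for g in gradients]
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    return [
        max(0.0, sum(w * a[i] for w, a in zip(weights, activations)))
        for i in range(length)
    ]


def top_window(heatmap: List[float], width: int) -> Tuple[int, int]:
    """Return (start, end) of the window with the largest total relevance.

    Hypothetical helper: a fixed-width window is an assumption made here
    to pick out one candidate region from the heatmap.
    """
    best_start = max(range(len(heatmap) - width + 1),
                     key=lambda s: sum(heatmap[s:s + width]))
    return best_start, best_start + width


# Toy values (hypothetical): 2 channels, 5 positions.
acts = [[0.0, 1.0, 2.0, 1.0, 0.0],
        [1.0, 0.0, 0.0, 0.0, 1.0]]
grads = [[0.5, 0.5, 0.5, 0.5, 0.5],
         [-1.0, -1.0, -1.0, -1.0, -1.0]]
cam = grad_cam_1d(acts, grads)      # [0.0, 0.5, 1.0, 0.5, 0.0]
window = top_window(cam, width=3)   # (1, 4)
```

In a real model the activations and gradients would come from a forward and backward pass through the trained classifier, and the selected window would still need to be translated into an offset range in the input file.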

Original language: English
Title of host publication: Proceedings of 2021 International Conference on Machine Learning and Cybernetics, ICMLC 2021
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665466080
DOIs
State: Published - 2021
Externally published: Yes
Event: 20th International Conference on Machine Learning and Cybernetics, ICMLC 2021 - Adelaide, Australia
Duration: 4 Dec 2021 to 5 Dec 2021

Publication series

Name: Proceedings - International Conference on Machine Learning and Cybernetics
Volume: 2021-December
ISSN (Print): 2160-133X
ISSN (Electronic): 2160-1348

Conference

Conference: 20th International Conference on Machine Learning and Cybernetics, ICMLC 2021
Country/Territory: Australia
City: Adelaide
Period: 4/12/21 to 5/12/21

Keywords

  • Data Augmentation
  • Deep Learning
  • Feature Extraction
