
HAformer: Semantic fusion of hex machine code and assembly code for cross-architecture binary vulnerability detection

  • Xunzhi Jiang
  • Shen Wang*
  • Yuxin Gong
  • Tingyue Yu
  • Li Liu
  • Xiangzhan Yu
  • *Corresponding author for this work
  • Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

Binary vulnerability detection is a significant area of research in computer security. Existing methods for detecting binary vulnerabilities rely primarily on binary code similarity analysis, which identifies vulnerabilities by comparing binary code against known vulnerable code. Recently, Transformer-based models have achieved significant progress in this field, leveraging their strength in handling sequential data to better understand the semantics of assembly code. However, to avoid out-of-vocabulary (OOV) problems, assembly code typically needs to be normalized, which discards some important numerical and jump information. In this paper, we propose HAformer, a Transformer-based model that semantically fuses hexadecimal machine code and assembly code to extract richer semantic information from binary code. By incorporating the hexadecimal machine code and a newly designed assembly code normalization method, HAformer alleviates the numerical information loss caused by traditional assembly code normalization while still addressing the OOV problem. Evaluation results demonstrate that HAformer outperforms the baseline method in the Recall@1 metric by 16.9%, 25.5% and 19.2% in cross-optimization-level, cross-compiler and cross-architecture environments, respectively. In real-world vulnerability detection experiments, HAformer achieves the highest accuracy among the compared methods.
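
For readers unfamiliar with the Recall@1 metric cited above, the following is a minimal sketch of how Recall@k is commonly computed in similarity-search evaluations. It is not taken from the paper: the function names `cosine` and `recall_at_k`, the toy two-dimensional vectors, and the use of cosine similarity are all assumptions for illustration, and the abstract does not state HAformer's embedding size, similarity measure, or candidate pool size.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall_at_k(queries, pool, truth, k=1):
    """Fraction of queries whose true match is among the top-k ranked candidates.

    queries: list of query embeddings (hypothetical function embeddings)
    pool:    list of candidate embeddings
    truth:   truth[i] is the index in `pool` of the true match for queries[i]
    """
    hits = 0
    for qi, q in enumerate(queries):
        # Rank candidate indices by descending similarity to the query.
        ranked = sorted(range(len(pool)),
                        key=lambda j: cosine(q, pool[j]),
                        reverse=True)
        if truth[qi] in ranked[:k]:
            hits += 1
    return hits / len(queries)

# Toy data (assumed): three query functions and three candidate functions.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pool = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.2]]
truth = [0, 1, 2]  # the third query's true match is ranked last

score = recall_at_k(queries, pool, truth, k=1)
print(f"Recall@1 = {score:.3f}")
```

In this toy setup two of the three queries rank their true match first, so Recall@1 is two thirds; raising `k` to the pool size trivially yields 1.0, which is why the pool size matters when comparing reported Recall@1 figures.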

Original language: English
Article number: 104029
Journal: Computers and Security
Volume: 145
State: Published - Oct 2024
Externally published: Yes

Keywords

  • Binary code similarity detection
  • Binary similarity analysis
  • Function semantic
  • Transformer
  • Vulnerability detection
