
Identifying functions in binary code with reverse extended control flow graphs

  • Jing Qiu*
  • Xiaohong Su
  • Peijun Ma

*Corresponding author for this work
School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

In binary code analysis, current function identification approaches struggle with functions that have no explicit call sites and with handcrafted assembly that lacks standard prologues and epilogues. We propose a new function representation called a reverse extended control flow graph (RECFG) and a RECFG-based method for identifying functions in stripped binary code. Every function has at least one return instruction (an instruction that makes control flow leave the function), so return instructions are a more reliable anchor than the prologues and epilogues that traditional methods rely on. We first build RECFGs from every value in a code range that can be interpreted as a return instruction. Then, for each independent RECFG, a multiple-decision method chooses a subgraph as the control flow graph of a function. We evaluate a prototype tool on seven open-source applications, 138 binaries from the MASM32 code examples, and 292 binaries from Windows XP SP3. Experimental results show that the proposed method identifies functions that current methods cannot, with high precision and stable recall.
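The abstract describes RECFG construction only at a high level. The sketch below illustrates the general idea of starting from candidate return instructions and walking control flow backwards, under stated assumptions: the code is x86, where `C3`/`CB` encode near/far `ret` and `C2`/`CA` encode the same with a 16-bit immediate; `candidate_returns` and `reverse_reachable` are hypothetical helper names; and the successor map stands in for a disassembler's output. It is not the paper's RECFG construction, which the abstract does not specify.

```python
from collections import deque

# Assumption: x86 code. Maps opcode -> total instruction length in bytes.
# C3 = near ret, CB = far ret, C2/CA = near/far ret with a 16-bit immediate.
RET_OPCODES = {0xC3: 1, 0xCB: 1, 0xC2: 3, 0xCA: 3}


def candidate_returns(code: bytes) -> list[int]:
    """Return the offset of every byte that could start an x86 return instruction."""
    return [
        i
        for i, b in enumerate(code)
        if b in RET_OPCODES and i + RET_OPCODES[b] <= len(code)
    ]


def reverse_reachable(succ: dict[int, list[int]], ret: int) -> set[int]:
    """Collect every instruction offset from which `ret` is reachable.

    `succ` maps an instruction offset to its successor offsets (a hypothetical,
    pre-decoded forward CFG). The edges are inverted and walked breadth-first.
    """
    pred: dict[int, list[int]] = {}
    for src, dsts in succ.items():
        for dst in dsts:
            pred.setdefault(dst, []).append(src)
    seen = {ret}
    queue = deque([ret])
    while queue:
        node = queue.popleft()
        for p in pred.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


if __name__ == "__main__":
    # push ebp; mov ebp, esp; pop ebp; ret; nop; ret 4
    code = bytes([0x55, 0x89, 0xE5, 0x5D, 0xC3, 0x90, 0xC2, 0x04, 0x00])
    print(candidate_returns(code))  # [4, 6]
    succ = {0: [1], 1: [3], 3: [4], 5: [6]}
    print(sorted(reverse_reachable(succ, 4)))  # [0, 1, 3, 4]
```

Because every matching byte counts as a candidate, bytes that sit inside other instructions or in data also show up, which is why a later selection step is needed to decide which subgraph is actually a function.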

Original language: English
Pages (from-to): 793-820
Number of pages: 28
Journal: Journal of Software Maintenance and Evolution
Volume: 27
Issue number: 10
State: Published, 1 Oct 2015
Externally published: Yes

Keywords

  • TOPSIS
  • function identification
  • reverse engineering
  • reverse extended control flow graph
  • static analysis
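The keywords name TOPSIS, which is presumably the multiple-decision method the abstract uses to choose a subgraph from each RECFG. The abstract does not say which criteria or weights are used, so the sketch below is a generic TOPSIS implementation, and its three criteria (instructions covered, overlapping or undecodable bytes, presence of a standard prologue) and weights are hypothetical.

```python
import math


def topsis(matrix: list[list[float]], weights: list[float], benefit: list[bool]) -> list[float]:
    """Score each alternative (row) by its relative closeness to the ideal solution.

    benefit[j] is True when larger values of criterion j are better and False
    when smaller values are better (a cost criterion).
    """
    cols = range(len(weights))
    # Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in cols]
    weighted = [[weights[j] * row[j] / norms[j] for j in cols] for row in matrix]
    columns = list(zip(*weighted))
    # Ideal (best) and anti-ideal (worst) value of each criterion.
    best = [max(columns[j]) if benefit[j] else min(columns[j]) for j in cols]
    worst = [min(columns[j]) if benefit[j] else max(columns[j]) for j in cols]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        # Closeness coefficient in [0, 1]; 1 means the row coincides with the ideal.
        scores.append(d_worst / total if total else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical candidate subgraphs of one RECFG:
    # [instructions covered, overlapping/undecodable bytes, has standard prologue]
    candidates = [[10, 0, 1], [4, 3, 0], [7, 1, 1]]
    scores = topsis(candidates, weights=[0.5, 0.3, 0.2], benefit=[True, False, True])
    best = max(range(len(scores)), key=scores.__getitem__)
    print(scores, best)  # candidate 0 dominates every criterion, so best == 0
```

The vector normalisation and Euclidean distances are the usual textbook choices for TOPSIS, not settings taken from the paper.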
