
Hierarchical semantic-aware neural code representation

  • School of Computer Science and Technology, Harbin Institute of Technology
  • School of Computing and Information Systems

Research output: Contribution to journal › Article › peer-review

Abstract

Code representation is a fundamental problem in many software engineering tasks. Despite considerable research effort, existing methods still struggle to fully extract the syntactic, structural, and sequential features of source code, which together form the hierarchical semantics of a program and are necessary for deeper code understanding. To alleviate this difficulty, we propose a new supervised approach based on a novel use of Tree-LSTM that explicitly incorporates the sequential and global semantic features of programs into the representation model. Unlike previous techniques, the proposed model learns not only low-level syntactic information within each statement but also high-level semantic information between statements over the constructed semantic graph. In addition, because sequential semantics are also critical for developers to understand dependency paths and data flow, we propose a DFS-based method that generates the topological order in which statements are processed; the statements are then fed, together with their in-neighbor information and syntactic embeddings, into the proposed model to learn richer statement-level semantic features. Extensive experiments on multiple program comprehension tasks, e.g., code clone detection, demonstrate that our method achieves promising performance compared with existing baselines.
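The abstract gives no implementation details for the DFS-based ordering step, so the following is only a minimal sketch of the general idea: a depth-first topological sort over a statement dependency graph, followed by a lookup of each statement's in-neighbors. The statements, the edge list, and the function names `dfs_topological_order` and `in_neighbors` are hypothetical, and the graph is assumed to be acyclic, which a real semantic graph containing loops may not be.

```python
from collections import defaultdict


def dfs_topological_order(num_nodes, edges):
    """Return the nodes of a DAG in topological order (reverse DFS post-order).

    Assumes the graph is acyclic; cycles are not detected here.
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    visited = set()
    postorder = []

    def visit(node):
        visited.add(node)
        for nxt in successors[node]:
            if nxt not in visited:
                visit(nxt)
        # A node is emitted only after everything reachable from it.
        postorder.append(node)

    for node in range(num_nodes):
        if node not in visited:
            visit(node)

    # Reversing the post-order puts every node before its successors.
    return postorder[::-1]


def in_neighbors(num_nodes, edges):
    """Map each node to the list of nodes with an edge pointing into it."""
    preds = {node: [] for node in range(num_nodes)}
    for src, dst in edges:
        preds[dst].append(src)
    return preds


# Hypothetical statements and dependency edges (src must precede dst).
statements = ["x = read()", "y = x + 1", "z = x * 2", "print(y, z)"]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

order = dfs_topological_order(len(statements), edges)
preds = in_neighbors(len(statements), edges)

for idx in order:
    print(statements[idx], "<- in-neighbors:", [statements[p] for p in preds[idx]])
```

Processing statements in this order means that, by the time a statement is visited, all of its in-neighbors have already been seen, which is the property a model would need in order to consume in-neighbor states when encoding each statement.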

Original language: English
Article number: 111355
Journal: Journal of Systems and Software
Volume: 191
State: Published - Sep 2022
Externally published: Yes

Keywords

  • Clone detection
  • Code representation
  • Deep learning
  • Graph-LSTM
  • Hierarchical semantics
  • Program classification
  • Vulnerability detection
