Data mining in programs: Clustering programs based on structure metrics and execution values

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Harbin University

Research output: Contribution to journal › Article › peer-review

Abstract

Software is embedded in many control systems, including security-critical ones. Existing program clustering methods have limited ability to identify functionally equivalent programs that differ in syntactic representation. To address this problem, a clustering method based on structural metric vectors is first proposed to quickly identify structurally similar programs among a large number of existing programs. A second clustering method, based on similar execution value sequences, is then proposed to accurately identify functionally equivalent programs with code variations. The approach has been applied to automatic program repair, where it identifies sample programs from a large pool of template programs. The average purity is 0.95576 and the average entropy is 0.15497, which indicates that the clustering partition is largely consistent with the expected partition.
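
To make the second clustering stage and the two quality measures concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes programs can be modelled as callables, that "similar execution value sequences" can be simplified to identical outputs on a fixed set of probe inputs, and that purity and entropy follow their standard textbook definitions, with entropy in base 2 (the abstract does not state the base or any normalization). All names (`cluster_by_values`, `purity`, `entropy`, the toy programs) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_by_values(programs, probe_inputs):
    """Group programs whose outputs agree on every probe input.

    Simplification: exact equality stands in for the paper's notion of
    'similar' execution value sequences.
    """
    clusters = defaultdict(list)
    for name, func in programs.items():
        signature = tuple(func(x) for x in probe_inputs)
        clusters[signature].append(name)
    return list(clusters.values())


def purity(clusters, expected):
    """Fraction of programs that belong to the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(expected[p] for p in c).values()) for c in clusters)
    return majority / total


def entropy(clusters, expected):
    """Size-weighted average of the class entropy (base 2) inside each cluster."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        counts = Counter(expected[p] for p in c)
        h = -sum((n / len(c)) * math.log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Toy programs: two syntactic variants of each of two functions (hypothetical).
programs = {
    "sum_loop": lambda n: sum(range(n + 1)),
    "sum_formula": lambda n: n * (n + 1) // 2,
    "square": lambda n: n * n,
    "square_pow": lambda n: n ** 2,
}
# Expected partition: the function each program is meant to compute.
expected = {
    "sum_loop": "sum",
    "sum_formula": "sum",
    "square": "square",
    "square_pow": "square",
}

clusters = cluster_by_values(programs, probe_inputs=[0, 1, 2, 5])
print(clusters, purity(clusters, expected), entropy(clusters, expected))
```

Under these definitions a purity near 1 and an entropy near 0 mean that each cluster contains programs from almost a single expected class, which is how the reported averages can be read.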

Original language: English
Pages (from-to): 48-63
Number of pages: 16
Journal: International Journal of Data Warehousing and Mining
Volume: 16
Issue number: 2
State: Published - 1 Apr 2020
Externally published: Yes

Keywords

  • Clustering
  • Data mining
  • Program repair
  • Structural metrics
  • Value sequence
