Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models

  • Shuzheng Gao
  • Hongyu Zhang
  • Cuiyun Gao*
  • Chaozheng Wang
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Chongqing University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Previous research on code intelligence usually trains a deep learning model on a fixed dataset in an offline manner. However, in real-world scenarios, new code repositories emerge incessantly, and the new knowledge they carry is beneficial for providing up-to-date code intelligence services to developers. In this paper, we address the following problem: how can code intelligence models continually learn from ever-increasing data? One major challenge here is catastrophic forgetting, meaning that the model can easily forget knowledge learned from previous datasets when learning from a new dataset. To tackle this challenge, we propose REPEAT, a novel method for continual learning of code intelligence models. Specifically, REPEAT addresses the catastrophic forgetting problem with representative exemplars replay and adaptive parameter regularization. The representative exemplars replay component selects informative and diverse exemplars from each dataset and uses them to periodically retrain the model. The adaptive parameter regularization component identifies important parameters in the model and adaptively penalizes changes to them in order to preserve previously learned knowledge. We evaluate the proposed approach on three code intelligence tasks: code summarization, software vulnerability detection, and code clone detection. Extensive experiments demonstrate that REPEAT consistently outperforms baseline methods on all tasks. For example, REPEAT improves on the conventional fine-tuning method by 1.22, 5.61, and 1.72 on code summarization, vulnerability detection, and clone detection, respectively.
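
The two ideas named in the abstract, replaying stored exemplars and penalizing changes to important parameters, can be illustrated with a minimal Python sketch. This is not the REPEAT implementation: the function names, the quadratic form of the penalty, the fixed `strength` factor, and the uniform random choice of exemplars are all assumptions made for illustration, standing in for the paper's adaptive regularization and its selection of informative and diverse exemplars.

```python
import random


def importance_weighted_penalty(params, old_params, importance, strength=1.0):
    """Quadratic penalty on parameter drift, weighted per parameter.

    Hypothetical helper: parameters with a higher importance score are
    penalized more for moving away from their previously learned values.
    """
    return strength * sum(
        w * (p - q) ** 2 for p, q, w in zip(params, old_params, importance)
    )


def build_replay_batch(new_examples, exemplars, replay_size, seed=0):
    """Combine new training examples with stored exemplars from earlier data.

    Hypothetical helper: exemplars are drawn uniformly at random here,
    a placeholder for a representativeness-based selection strategy.
    """
    rng = random.Random(seed)
    k = min(replay_size, len(exemplars))
    return list(new_examples) + rng.sample(list(exemplars), k)
```

In a training loop built on this sketch, the penalty would be added to the task loss computed on the batch returned by `build_replay_batch`, so that each update sees both new data and a sample of old data while being discouraged from overwriting important parameters.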

Original language: English
Title of host publication: Proceedings - 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE 2023
Publisher: IEEE Computer Society
Pages: 30-42
Number of pages: 13
ISBN (Electronic): 9781665457019
DOIs
State: Published - 26 Jul 2023
Externally published: Yes
Event: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 - Melbourne, Australia
Duration: 15 May 2023 → 16 May 2023

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023
Country/Territory: Australia
City: Melbourne
Period: 15/05/23 → 16/05/23
