TY - GEN
T1 - A Systematic Evaluation of Large Code Models in API Suggestion
T2 - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
AU - Wang, Chaozheng
AU - Gao, Shuzheng
AU - Gao, Cuiyun
AU - Wang, Wenxuan
AU - Chong, Chun Yong
AU - Gao, Shan
AU - Lyu, Michael R.
N1 - Publisher Copyright:
© 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/10/27
Y1 - 2024/10/27
N2 - API suggestion is a critical task in modern software development, assisting programmers by predicting and recommending third-party APIs based on the current context. Recent advancements in large code models (LCMs) have shown promise in the API suggestion task. However, they mainly focus on suggesting which APIs to use, ignoring that programmers may demand more assistance while using APIs in practice, including when to use the suggested APIs and how to use them. To mitigate the gap, we conduct a systematic evaluation of LCMs for the API suggestion task in this paper. To facilitate our investigation, we first build a benchmark that contains a diverse collection of code snippets, covering 176 APIs used in 853 popular Java projects. Three distinct scenarios in the API suggestion task are then considered for evaluation, including (1) "when to use", which aims at determining the desired position and timing for API usage; (2) "which to use", which aims at identifying the appropriate API from a given library; and (3) "how to use", which aims at predicting the arguments for a given API. The consideration of the three scenarios allows for a comprehensive assessment of LCMs' capabilities in suggesting APIs for developers. During the evaluation, we choose nine popular LCMs with varying model sizes for the three scenarios. We also perform an in-depth analysis of the influence of context selection on model performance. Our experimental results reveal multiple key findings. For instance, LCMs present the best performance in the "how to use" scenario while performing the worst in the "when to use" scenario; e.g., the average performance gap of LCMs between the "when to use" and "how to use" scenarios reaches 34%, indicating that the "when to use" scenario is more challenging. Furthermore, enriching context information substantially improves model performance.
Specifically, by incorporating the contexts, smaller-sized LCMs can outperform models twenty times larger that lack the provided contexts. Based on these findings, we finally provide insights and implications for researchers and developers, which can lay the groundwork for future advancements in the API suggestion task.
KW - API suggestion
KW - empirical study
KW - large code models
UR - https://www.scopus.com/pages/publications/85212429254
U2 - 10.1145/3691620.3695004
DO - 10.1145/3691620.3695004
M3 - Conference contribution
AN - SCOPUS:85212429254
T3 - Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
SP - 281
EP - 293
BT - Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
PB - Association for Computing Machinery, Inc
Y2 - 28 October 2024 through 1 November 2024
ER -