
Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem

  • Yun Peng
  • Ruida Hu
  • Ruoke Wang
  • Cuiyun Gao*
  • Shuqing Li
  • Michael R. Lyu

*Corresponding author for this work

Affiliations:
  • Chinese University of Hong Kong
  • Harbin Institute of Technology, Shenzhen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Python is the most popular programming language in the open-source community, largely owing to the extensive support from the diverse third-party libraries in the PyPI ecosystem. Nevertheless, using third-party libraries can lead to dependency conflicts, prompting researchers to develop dependency conflict detectors. Efforts have also been made to infer dependencies automatically. These approaches focus on version-level checks and inference, based on the assumption that the configurations of libraries in the PyPI ecosystem are correct. However, our study reveals that this assumption does not always hold, and that version-level checks alone are inadequate to ensure compatible runtime environments. In this paper, we conduct a comprehensive empirical study of configuration issues in the PyPI ecosystem. Specifically, we propose PYCONF, a source-level detector for potential configuration issues. PYCONF employs three distinct checks, targeting the setup, packaging, and usage stages of libraries, respectively. To evaluate the effectiveness of current automatic dependency inference approaches, we build a benchmark called VLIBS, comprising library releases that pass all three checks of PYCONF. We identify 15 kinds of configuration issues and find that 183,864 library releases suffer from potential configuration issues. Remarkably, 68% of these issues can only be detected via the source-level check. Our experimental results show that the most advanced automatic dependency inference approach, PyEGo, successfully infers dependencies for only 65% of library releases. The primary failures stem from dependency conflicts and from required libraries missing from the generated configurations. Based on the empirical results, we derive six findings and draw two implications for open-source developers and for future research on automatic dependency inference.

Original language: English
Title of host publication: Proceedings - 2024 ACM/IEEE 46th International Conference on Software Engineering, ICSE 2024
Publisher: IEEE Computer Society
Pages: 2494-2505
Number of pages: 12
ISBN (Electronic): 9798400702174
State: Published, 20 May 2024
Externally published: Yes
Event: 46th ACM/IEEE International Conference on Software Engineering, ICSE 2024 - Lisbon, Portugal
Duration: 14 Apr 2024 to 20 Apr 2024

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 46th ACM/IEEE International Conference on Software Engineering, ICSE 2024
Country/Territory: Portugal
City: Lisbon
Period: 14/04/24 to 20/04/24

Keywords

  • Configuration
  • Empirical Study
  • PyPI
  • Python
