SpanCS:面向跨语言代码生成的片段级语码转换

Translated title of the contribution: SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation
  • Qingfu Zhu
  • Shiqi Zhou
  • Shuo Wang
  • Zhiming Zhang
  • Haoyu Wang
  • Qiguang Chen
  • Wanxiang Che*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Tsinghua University
  • Beijing University of Posts and Telecommunications

Research output: Contribution to conference › Paper › peer-review

Abstract

Cross-lingual code generation aims to transfer the ability to generate code from English to other natural languages (NLs). Translate-train and code-switching are two common data augmentation (DA) approaches for cross-lingual transfer; they complement each other but have not been effectively combined. To this end, we propose a span-level code-switching (SpanCS) approach for cross-lingual code generation. First, it uses the code-switching framework to pair source-language context with a target-language span, modeling the interaction and alignment among multiple languages. Second, it uses the translate-train approach to extract the target-language span from a complete translation of the source-language text, ensuring semantic consistency between the augmented data and the original data. To fairly evaluate differences in code generation across multiple NLs, we construct MHumanEval, a multilingual code generation benchmark covering 10 NLs, built from HumanEval via manual translation and verification. Experiments on the benchmark across three backbones show that SpanCS consistently outperforms conventional DA approaches for cross-lingual code generation.
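The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the source prompt and its complete translation are already split into one-to-one aligned segments (here, sentences), and that a single contiguous span is swapped at a random position. The function name `span_code_switch`, the `span_len` parameter, and the example sentences are all hypothetical.

```python
import random
from typing import List, Optional


def span_code_switch(
    source_segments: List[str],
    target_segments: List[str],
    span_len: int = 1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Swap one contiguous span of source segments for the aligned target segments.

    Assumption (not from the paper): both lists are aligned one-to-one,
    e.g. sentence i of the translation corresponds to sentence i of the source.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("segments must be aligned one-to-one")
    if not 1 <= span_len <= len(source_segments):
        raise ValueError("span_len must be between 1 and the number of segments")

    rng = rng or random.Random()
    # Pick where the target-language span starts.
    start = rng.randrange(len(source_segments) - span_len + 1)
    end = start + span_len
    # Source-language context stays around the target-language span.
    return source_segments[:start] + target_segments[start:end] + source_segments[end:]


# Hypothetical example data: an English prompt and its full Spanish translation.
english = [
    "Write a function that reverses a string.",
    "Return the reversed string.",
]
spanish = [
    "Escribe una función que invierta una cadena.",
    "Devuelve la cadena invertida.",
]

mixed_prompt = " ".join(span_code_switch(english, spanish, span_len=1))
print(mixed_prompt)
```

Because the target-language span is taken from a translation of the whole prompt rather than translated in isolation, the mixed prompt should keep the meaning of the original, which is the property the abstract attributes to combining translate-train with code-switching.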

Original language: Chinese (Traditional)
Pages: 71-83
Number of pages: 13
State: Published - 2024
Event: 23rd Chinese National Conference on Computational Linguistics, CCL 2024 - Taiyuan, China
Duration: 24 Jul 2024 to 28 Jul 2024

Conference

Conference: 23rd Chinese National Conference on Computational Linguistics, CCL 2024
Country/Territory: China
City: Taiyuan
Period: 24/07/24 to 28/07/24
