Abstract
Cross-lingual code generation aims to transfer the ability to generate code from English to other natural languages (NLs). Translate-train and code-switching are two common data augmentation (DA) approaches for cross-lingual transfer; they complement each other but have not been effectively combined. To this end, we propose a span-level code-switching (SpanCS) approach for cross-lingual code generation. First, it uses the code-switching framework to pair source-language context with a target-language span, modeling the interaction and alignment among multiple languages. Second, it uses the translate-train approach to extract the target-language span from a complete translation of the source-language input, ensuring semantic consistency between the augmented data and the original data. To fairly evaluate differences in code generation performance across NLs, we construct MHumanEval, a multilingual code generation benchmark covering 10 NLs, built from HumanEval via manual translation and verification. Experiments on this benchmark with three backbones show that SpanCS consistently outperforms conventional DA approaches for cross-lingual code generation.
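The abstract does not say how spans are chosen or aligned, so the following is only a minimal sketch of the general idea of span-level code-switching, not the authors' implementation. It assumes that a full translation of the prompt is already available and that an external aligner has produced span pairs; the names `switch_span`, `sample_code_switched`, and `aligned_spans` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end-exclusive


def switch_span(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    src_span: Span,
    tgt_span: Span,
) -> List[str]:
    """Replace one source-language span with a span taken from the full translation.

    The surrounding source-language context is kept unchanged.
    """
    s_start, s_end = src_span
    t_start, t_end = tgt_span
    return (
        list(src_tokens[:s_start])
        + list(tgt_tokens[t_start:t_end])
        + list(src_tokens[s_end:])
    )


def sample_code_switched(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    aligned_spans: Sequence[Tuple[Span, Span]],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick one aligned span pair at random and apply the switch.

    ASSUMPTION: aligned_spans comes from an external word or phrase aligner;
    the abstract does not describe the alignment or sampling procedure.
    """
    rng = rng or random.Random()
    src_span, tgt_span = rng.choice(list(aligned_spans))
    return switch_span(src_tokens, tgt_tokens, src_span, tgt_span)


if __name__ == "__main__":
    # Toy English prompt and a complete Spanish translation of it.
    src = "Return the sum of a list".split()
    tgt = "Devuelve la suma de una lista".split()
    # Swap "a list" for its aligned translation "una lista".
    print(" ".join(switch_span(src, tgt, (4, 6), (4, 6))))
```

Because the target span is cut from a translation of the whole prompt rather than translated in isolation, the switched span stays consistent with the sentence-level meaning, which is the property the abstract attributes to combining translate-train with code-switching.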
| Translated title of the contribution | SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation |
|---|---|
| Original language | Chinese (Traditional) |
| Pages | 71-83 |
| Number of pages | 13 |
| State | Published - 2024 |
| Event | 23rd Chinese National Conference on Computational Linguistics, CCL 2024, Taiyuan, China. Duration: 24 Jul 2024 → 28 Jul 2024 |
Conference
| Conference | 23rd Chinese National Conference on Computational Linguistics, CCL 2024 |
|---|---|
| Country/Territory | China |
| City | Taiyuan |
| Period | 24/07/24 → 28/07/24 |