TY - GEN
T1 - Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL
AU - Wang, Dingzirui
AU - Dou, Longxu
AU - Zhang, Xuanliang
AU - Zhu, Qingfu
AU - Che, Wanxiang
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - In-context learning with large language models (LLMs) is the current mainstream method for text-to-SQL. Previous studies have explored selecting relevant demonstrations from a human-labeled demonstration pool, but these methods lack diversity and incur high labeling costs. In this work, we address measuring and enhancing the diversity of the text-to-SQL demonstration pool. First, we introduce a diversity metric and show that the diversity of the existing labeled data can be further enhanced. Motivated by these findings, we propose FUSED, which iteratively fuses demonstrations to create a diverse demonstration pool based on human labeling or even from scratch with LLMs, reducing labeling costs. FUSED achieves an average improvement of 2.1% based on existing labeling and 5.5% from scratch on several mainstream datasets, demonstrating its effectiveness.
UR - https://www.scopus.com/pages/publications/85217620954
U2 - 10.18653/v1/2024.findings-emnlp.65
DO - 10.18653/v1/2024.findings-emnlp.65
M3 - Conference contribution
AN - SCOPUS:85217620954
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
SP - 1193
EP - 1207
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
T2 - 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Y2 - 12 November 2024 through 16 November 2024
ER -