Skip to main navigation Skip to search Skip to main content

CRED-SQL: Enhancing Real-World Large Scale Database Text-to-SQL Parsing Through Cluster Retrieval and Execution Description

  • Shaoming Duan
  • , Zirui Wang
  • , Chuanyi Liu*
  • , Zhibin Zhu
  • , Yuhao Zhang
  • , Peiyi Han
  • , Liang Yan
  • , Zewu Peng
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Pengcheng Laboratory
  • Mindflow.ai
  • Inspur Cloud Information Technology Co., Ltd.
  • China Southern Power Grid

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Recent advances in large language models (LLMs) have significantly improved the accuracy of Text-to-SQL systems. However, a critical challenge remains: the semantic mismatch between natural language questions (NLQs) and their corresponding SQL queries. This issue is exacerbated in large-scale databases, where semantically similar attributes hinder schema linking and semantic drift during SQL generation, ultimately reducing model accuracy. To address these challenges, we introduce CRED-SQL, a framework designed for large-scale databases that integrates Cluster Retrieval and Execution Description. CRED-SQL first performs cluster-based large-scale schema retrieval to pinpoint the tables and columns most relevant to a given NLQ, alleviating schema mismatch. It then introduces an intermediate natural language representation - Execution Description Language (EDL) - to bridge the gap between NLQs and SQL. This reformulation decomposes the task into two stages: Text-to-EDL and EDL-to-SQL, leveraging LLMs' strong general reasoning capabilities while reducing semantic deviation. Extensive experiments on two large-scale, cross-domain benchmarks - SpiderUnion and BirdUnion - demonstrate that CRED-SQL achieves new state-of-the-art (SOTA) performance, validating its effectiveness and scalability. Our code is available at https://github.com/smduan/CRED-SQL.git.

Original languageEnglish
Title of host publicationECAI 2025 - 28th European Conference on Artificial Intelligence, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Proceedings
EditorsInes Lynce, Nello Murano, Mauro Vallati, Serena Villata, Federico Chesani, Michela Milano, Andrea Omicini, Mehdi Dastani
PublisherIOS Press BV
Pages4394-4401
Number of pages8
ISBN (Electronic)9781643686318
DOIs
StatePublished - 21 Oct 2025
Externally publishedYes
Event28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025 - Bologna, Italy
Duration: 25 Oct 202530 Oct 2025

Publication series

NameFrontiers in Artificial Intelligence and Applications
Volume413
ISSN (Print)0922-6389
ISSN (Electronic)1879-8314

Conference

Conference28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025
Country/TerritoryItaly
CityBologna
Period25/10/2530/10/25

Fingerprint

Dive into the research topics of 'CRED-SQL: Enhancing Real-World Large Scale Database Text-to-SQL Parsing Through Cluster Retrieval and Execution Description'. Together they form a unique fingerprint.

Cite this