TY - GEN
T1 - QCFE
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
AU - Yan, Yu
AU - Wang, Hongzhi
AU - Huang, Junfang
AU - Zhong, Dake
AU - Yu, Tao
AU - Zhang, Kaixin
AU - Yang, Man
AU - Wang, Tianqing
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Query cost estimation is a classical task in database management. Recently, researchers have applied AI-driven methods to query cost estimation to achieve high accuracy. However, two defects in feature design lead to poor time-accuracy efficiency in this task. First, existing works encode only the query plan and data statistics while ignoring other important variables, such as storage structure, hardware, and database knobs, which also have a significant impact on query cost. Second, existing works suffer from heavy model training and inference costs due to ineffective features, such as the index encoding of write-only workloads. To address these two problems, we propose QCFE, an efficient feature engineering method for query cost estimation, consisting of a feature snapshot and a feature reduction algorithm. (1) We design a novel concept called the feature snapshot to efficiently integrate the influence of the ignored variables. (2) We propose a difference-propagation feature reduction method for query cost estimation to filter out ineffective features. Compared with state-of-the-art methods, QCFE shows significant improvements on well-known benchmarks: it reduces model training time by up to 50%, improves the mean q-error by 19.8% on TPCH, and increases inference throughput by up to 8 times.
KW - Cost Estimation
KW - Feature Engineering
KW - Query
UR - https://www.scopus.com/pages/publications/85200519618
U2 - 10.1109/ICDE60146.2024.00328
DO - 10.1109/ICDE60146.2024.00328
M3 - Conference contribution
AN - SCOPUS:85200519618
T3 - Proceedings - International Conference on Data Engineering
SP - 4302
EP - 4315
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
Y2 - 13 May 2024 through 17 May 2024
ER -