TY - GEN
T1 - Apic
T2 - 2025 Data Compression Conference, DCC 2025
AU - Chen, Yufan
AU - Zou, Xiangyu
AU - Deng, Kaiwen
AU - Hu, Hao
AU - Deng, Cai
AU - Feng, Ke
AU - Xia, Wen
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Current compressors for OLTP databases perform well on text but face challenges with integers, although integers are a critical component of the workload. Most existing integer compressors are ineffective as a complementary solution, since they compress integers together and cannot decompress a single integer individually, making them incompatible with the data access requirements of OLTP databases. To this end, we propose Apic, a precomputation-based arithmetic coding scheme that efficiently compresses each integer individually (a very tiny unit, even though small data are always hard to compress) and ensures compatibility with OLTP datasets. Specifically, Apic presents Bitwidth-aware Precomputed Frequency and Prefix-aware Precomputed Decoding to tackle the challenges of applying arithmetic coding in this scenario, such as the substantial space costs of symbol frequencies and decompression complexity. Evaluations on real-world and desensitized commercial datasets suggest that Apic improves the compression ratio by up to 80% on integers over VByte, while preserving comparable decompression speed and thus query performance.
AB - Current compressors for OLTP databases perform well on text but face challenges with integers, although integers are a critical component of the workload. Most existing integer compressors are ineffective as a complementary solution, since they compress integers together and cannot decompress a single integer individually, making them incompatible with the data access requirements of OLTP databases. To this end, we propose Apic, a precomputation-based arithmetic coding scheme that efficiently compresses each integer individually (a very tiny unit, even though small data are always hard to compress) and ensures compatibility with OLTP datasets. Specifically, Apic presents Bitwidth-aware Precomputed Frequency and Prefix-aware Precomputed Decoding to tackle the challenges of applying arithmetic coding in this scenario, such as the substantial space costs of symbol frequencies and decompression complexity. Evaluations on real-world and desensitized commercial datasets suggest that Apic improves the compression ratio by up to 80% on integers over VByte, while preserving comparable decompression speed and thus query performance.
UR - https://www.scopus.com/pages/publications/105006810167
U2 - 10.1109/DCC62719.2025.00038
DO - 10.1109/DCC62719.2025.00038
M3 - Conference contribution
AN - SCOPUS:105006810167
T3 - Data Compression Conference Proceedings
SP - 303
EP - 312
BT - Proceedings - DCC 2025
A2 - Bilgin, Ali
A2 - Fowler, James E.
A2 - Serra-Sagrista, Joan
A2 - Ye, Yan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 March 2025 through 21 March 2025
ER -