TY - GEN
T1 - Neural Image Compression with Multi-Scale Depthwise Separable Dilated Convolution and Multi-Distribution Mixture Entropy Model
AU - Yang, Dongjian
AU - Fan, Xiaopeng
AU - Meng, Xiandong
AU - Zhao, Debin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Recently, neural image compression (NIC) has made remarkable progress. Two key parts of NIC are the encoder-decoder and the entropy model. For the encoder-decoder, a larger effective receptive field (ERF) means a stronger transformation ability. Existing methods usually enlarge the ERF at the expense of complexity, which is often prohibitive. To address this issue, we propose a multi-scale depthwise separable dilated convolution (MSDSDC) to build the encoder-decoder. Specifically, we first construct a depthwise separable dilated convolution (DSDC) by applying the depthwise separable strategy to dilated convolution to reduce its complexity. Subsequently, multi-scale features extracted by three DSDCs with varying dilation rates are fused to expand the ERF of the encoder-decoder, consequently enhancing its transformation capability. In addition, we design a multi-distribution mixture entropy model (MDMEM) to further enhance the flexibility of latent representation probability modeling. Experimental results demonstrate that our proposed method achieves the best balance between rate-distortion performance and complexity.
AB - Recently, neural image compression (NIC) has made remarkable progress. Two key parts of NIC are the encoder-decoder and the entropy model. For the encoder-decoder, a larger effective receptive field (ERF) means a stronger transformation ability. Existing methods usually enlarge the ERF at the expense of complexity, which is often prohibitive. To address this issue, we propose a multi-scale depthwise separable dilated convolution (MSDSDC) to build the encoder-decoder. Specifically, we first construct a depthwise separable dilated convolution (DSDC) by applying the depthwise separable strategy to dilated convolution to reduce its complexity. Subsequently, multi-scale features extracted by three DSDCs with varying dilation rates are fused to expand the ERF of the encoder-decoder, consequently enhancing its transformation capability. In addition, we design a multi-distribution mixture entropy model (MDMEM) to further enhance the flexibility of latent representation probability modeling. Experimental results demonstrate that our proposed method achieves the best balance between rate-distortion performance and complexity.
KW - depthwise separable convolution
KW - entropy model
KW - multi-scale dilated convolution
KW - neural image compression
UR - https://www.scopus.com/pages/publications/105006829621
U2 - 10.1109/DCC62719.2025.00098
DO - 10.1109/DCC62719.2025.00098
M3 - Conference contribution
AN - SCOPUS:105006829621
T3 - Data Compression Conference Proceedings
SP - 411
BT - Proceedings - DCC 2025
A2 - Bilgin, Ali
A2 - Fowler, James E.
A2 - Serra-Sagrista, Joan
A2 - Ye, Yan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 Data Compression Conference, DCC 2025
Y2 - 18 March 2025 through 21 March 2025
ER -