Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN

  • Harbin Institute of Technology Shenzhen
  • Pengcheng Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, learned image compression (LIC) has shown significant research potential. Most existing LIC methods are CNN-based, transformer-based, or a mix of the two. However, these methods suffer from degraded global attention performance, because CNNs have limited-size convolution kernels, while transformers apply window partitioning to reduce computational complexity. This gives rise to two issues: (1) The main autoencoder (AE) and hyper AE exhibit limited transformation capabilities due to insufficient global modeling, making it challenging to improve the accuracy of the coarse-grained entropy model. (2) The fine-grained entropy model struggles to adaptively utilize a larger range of contexts because of its weaker global modeling capability. In this paper, we propose an LIC method with joint enhanced Swin transformer (SwinT) and CNN to improve entropy modeling accuracy. The key to the proposed method is that we enhance the global modeling ability of SwinT by introducing neighborhood window attention while maintaining acceptable computational complexity, and combine it with the local modeling ability of CNN to form the enhanced SwinT and CNN block (ESTCB). Specifically, we reconstruct the main AE and hyper AE of LIC based on ESTCB, enhancing their global transformation capabilities and yielding a more accurate coarse-grained entropy model. In addition, we combine ESTCB with the checkerboard mask and the channel autoregressive model to develop a spatial-then-channel fine-grained entropy model, expanding the range of contexts that LIC can adaptively reference. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art rate-distortion performance compared with existing LIC models.
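
The checkerboard mask mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the generic idea of splitting latent positions into two interleaved sets, so that one set (the anchors) is coded first and the other set can then use its already-coded neighbours as context. The function names and the choice of which parity acts as the anchor are assumptions made for illustration.

```python
def checkerboard_mask(height, width):
    """Return a 2D list with 1 where (row + col) is odd and 0 elsewhere.

    Treating the 1-positions as anchors is an assumption for this sketch.
    """
    return [[(row + col) % 2 for col in range(width)] for row in range(height)]


def split_passes(height, width):
    """Split all positions into anchors (first pass) and non-anchors (second pass)."""
    mask = checkerboard_mask(height, width)
    anchors = [(r, c) for r in range(height) for c in range(width) if mask[r][c] == 1]
    non_anchors = [(r, c) for r in range(height) for c in range(width) if mask[r][c] == 0]
    return anchors, non_anchors


def four_neighbours(row, col, height, width):
    """Return the in-bounds up/down/left/right neighbours of a position."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < height and 0 <= c < width]


if __name__ == "__main__":
    # Print the mask for a small 4x4 latent grid.
    for mask_row in checkerboard_mask(4, 4):
        print(mask_row)
```

In this layout every non-anchor position is surrounded only by anchor positions, which is why the second pass can condition on spatial context in all four directions.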

Original language: English
Article number: 202
Journal: Multimedia Systems
Volume: 30
Issue number: 4
State: Published - Aug 2024

Keywords

  • Convolutional neural network
  • Entropy model
  • Learned image compression
  • Swin transformer
