HIPA: Hierarchical Patch Transformer for Single Image Super Resolution

  • Qing Cai
  • Yiming Qian
  • Jinxing Li
  • Jun Lyu*
  • Yee Hong Yang
  • Feng Wu
  • David Zhang*

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Transformer-based architectures have started to emerge in single image super resolution (SISR) and have achieved promising performance. However, most existing vision Transformer-based SISR methods still have two shortcomings: (1) they divide images into the same number of patches with a fixed size, which may not be optimal for restoring patches with different levels of texture richness; and (2) their position encodings treat all input tokens equally and hence neglect the dependencies among them. This paper presents HIPA, a novel Transformer architecture that progressively recovers the high-resolution image using a hierarchical patch partition. Specifically, we build a cascaded model that processes an input image in multiple stages, starting with tokens of small patch size and gradually merging them to form the full resolution. Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions, e.g., using a smaller patch for areas with fine details and a larger patch for textureless regions. Meanwhile, we propose a new attention-based position encoding scheme for Transformers that assigns different weights to different tokens, letting the network focus on the tokens that deserve more attention; to the best of our knowledge, this is the first such scheme. Furthermore, we propose a multi-receptive field attention module that enlarges the convolutional receptive field through different branches. Experimental results on several public datasets demonstrate that the proposed HIPA outperforms previous methods both quantitatively and qualitatively. We will share our code and models when the paper is accepted.
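
The two ideas named in the abstract, a multi-stage patch partition and per-token weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: the patch sizes (2, 4, 8), the function names `partition`, `hierarchical_partition`, and `token_weights`, and the use of a plain softmax over hypothetical per-token scores are all assumptions made for illustration. In HIPA the features and weights are learned by the network.

```python
import math


def partition(image, patch):
    """Split an H x W grid (list of rows) into non-overlapping
    patch x patch tokens, each flattened in row-major order."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "patch must divide H and W"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def hierarchical_partition(image, patch_sizes=(2, 4, 8)):
    """Tokenize the same image at several patch sizes, small to large.
    The sizes are illustrative assumptions, not values from the paper."""
    return {p: partition(image, p) for p in patch_sizes}


def token_weights(scores):
    """Turn hypothetical per-token scores into weights that sum to 1
    (numerically stable softmax), so tokens are not treated equally."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For an 8 x 8 input, `hierarchical_partition` yields 16 tokens of 4 pixels, 4 tokens of 16 pixels, and 1 token of 64 pixels, which mirrors the small-to-large progression described above in the simplest possible form.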

Original language: English
Pages (from-to): 3226-3237
Number of pages: 12
Journal: IEEE Transactions on Image Processing
Volume: 32
DOIs
State: Published - 2023
Externally published: Yes

Keywords

  • Image restoration
  • attention-based position embedding
  • hierarchical patch transformer
  • single image super-resolution
