An efficient multi-path structure with staged connection and multi-scale mechanism for text-to-image synthesis

  • Jiajun Ding
  • Beili Liu
  • Jun Yu
  • Huanlei Guo*
  • Ming Shen
  • Kenong Shen
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Generating a realistic image that matches a given text description is a challenging task. The multi-stage framework, which obtains a high-resolution image by first constructing a low-resolution one, is widely adopted for text-to-image synthesis. However, the subsequent stages of existing generators have to construct the whole image again, even though the primitive features of the objects have already been sketched out in the preceding stage. To make the subsequent stages focus on enriching fine-grained details and to improve the quality of the final generated image, this paper proposes an efficient multi-path structure for the multi-stage framework. The proposed structure contains two parts: a staged connection and a multi-scale module. The staged connection transfers the feature maps of the generated image from the preceding stage to the end of the current stage. This path removes the need for long-term memory and guides the network to focus on modifying and supplementing the details of the generated image. In addition, the multi-scale module extracts features at different scales and generates images with more fine-grained details. The proposed multi-path structure can be introduced into multi-stage algorithms such as StackGAN-v2 and AttnGAN. Extensive experiments are conducted on two widely used datasets, Oxford-102 and CUB, for the text-to-image synthesis task. The results demonstrate that the methods with the multi-path structure outperform their base models.
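
As a rough illustration of the two ideas named in the abstract, the following pure-Python sketch treats a feature map as a small 2D grid of floats. It is not the authors' implementation: the function names (`upsample`, `avg_pool`, `multi_scale`, `later_stage`) are hypothetical, the learned layers are replaced by a plain callable, and the abstract does not say how the staged connection is merged, so element-wise addition is assumed here.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def upsample(x: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling: each cell becomes a k x k block."""
    return [[v for v in row for _ in range(k)] for row in x for _ in range(k)]


def avg_pool(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (sides must be divisible by k)."""
    h, w = len(x), len(x[0])
    return [
        [sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(0, w, k)]
        for i in range(0, h, k)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids of equal shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def multi_scale(x: Grid, scales: Sequence[int] = (1, 2)) -> Grid:
    """Assumed multi-scale step: view x at several scales, then average the views."""
    views = [upsample(avg_pool(x, k), k) for k in scales]
    h, w = len(x), len(x[0])
    return [[sum(v[i][j] for v in views) / len(views) for j in range(w)]
            for i in range(h)]


def later_stage(prev_features: Grid, body: Callable[[Grid], Grid]) -> Grid:
    """One later stage with an assumed additive staged connection."""
    carried = upsample(prev_features, 2)  # features handed over by the previous stage
    details = body(carried)               # stand-in for the stage's learned layers
    return add(carried, details)          # the connection joins at the end of the stage


if __name__ == "__main__":
    coarse = [[1.0, 2.0], [3.0, 4.0]]
    print(later_stage(coarse, multi_scale))
```

Under these assumptions, a stage whose body outputs all zeros simply passes the upsampled previous features through, which is one way to read the abstract's claim that later stages only need to modify and supplement details.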

Original language: English
Pages (from-to): 1391-1403
Number of pages: 13
Journal: Multimedia Systems
Volume: 29
Issue number: 3
State: Published - Jun 2023
Externally published: Yes

Keywords

  • Multi-scale mechanism
  • Multi-stage framework
  • Staged connection
  • Text-to-image synthesis
