
VMAN: A Virtual Mainstay Alignment Network for Transductive Zero-Shot Learning

  • Guo Sen Xie
  • Xu Yao Zhang
  • Yazhou Yao*
  • Zheng Zhang
  • Fang Zhao
  • Ling Shao
  • *Corresponding author for this work
  • Nanjing University of Science and Technology
  • Mohamed Bin Zayed University of Artificial Intelligence
  • CAS - Institute of Automation
  • Harbin Institute of Technology Shenzhen
  • Peng Cheng Laboratory
  • Inception Institute of Artificial Intelligence

Research output: Contribution to journal › Article › peer-review

Abstract

Transductive zero-shot learning (TZSL) extends conventional ZSL by leveraging unlabeled unseen images for model training. A typical ZSL method learns embedding weights that map the feature space to the semantic space. However, in most existing methods the learned weights are dominated by seen images and therefore do not adapt well to unseen images. In this paper, to align the embedding weights for better knowledge transfer between seen and unseen classes, we propose the virtual mainstay alignment network (VMAN), which is tailored to the transductive ZSL task. Specifically, VMAN is cast as a tied encoder-decoder network, so only one set of linear mapping weights needs to be learned. To explicitly learn the weights in VMAN, we propose, for the first time in ZSL, to generate virtual mainstay (VM) samples for each seen class; these serve as new training data and can, to some extent, prevent the weights from being biased toward seen images. Moreover, a weighted reconstruction scheme is proposed and incorporated into the model training phase in both the semantic and feature spaces, so that the manifold relationships of the VM samples are well preserved. To further align the weights so that they adapt to more unseen images, a novel instance-category matching regularization is proposed for model re-training. VMAN is thus modeled as a nested minimization problem and is solved by a Taylor approximate optimization paradigm. In comprehensive evaluations on four benchmark datasets, VMAN achieves superior performance under the (generalized) TZSL setting.
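As an illustration of the "tied encoder-decoder" idea mentioned in the abstract, the following is a minimal NumPy sketch of a generic tied linear autoencoder, in which a single matrix W encodes features into the semantic space and its transpose decodes them back. It is not the VMAN method itself: the objective, the closed-form solver, the trade-off parameter `lam`, and the function names are all assumptions made for illustration, and the sketch omits virtual mainstay sample generation, weighted reconstruction, instance-category matching, and the nested Taylor-approximate optimization.

```python
import numpy as np


def fit_tied_linear_autoencoder(X, S, lam=1.0):
    """Fit one weight matrix W shared by encoder (W) and decoder (W.T).

    X: (d, n) visual features, S: (k, n) semantic vectors (one column per sample).
    Minimizes ||X - W.T @ S||_F^2 + lam * ||W @ X - S||_F^2 (an assumed objective).
    """
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T                   # (k, k)
    B = lam * (X @ X.T)           # (d, d)
    C = (1.0 + lam) * (S @ X.T)   # (k, d)
    # Setting the gradient to zero gives the Sylvester equation A W + W B = C.
    # With column-major vec: (I_d kron A + B.T kron I_k) vec(W) = vec(C).
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.reshape(-1, order="F"))
    return w.reshape((k, d), order="F")


def predict(W, X, prototypes):
    """Assign each column of X to the class prototype with highest cosine similarity.

    prototypes: (k, c) semantic vectors, one column per candidate class.
    """
    Z = W @ X  # project features into the semantic space
    Zn = Z / (np.linalg.norm(Z, axis=0, keepdims=True) + 1e-12)
    Pn = prototypes / (np.linalg.norm(prototypes, axis=0, keepdims=True) + 1e-12)
    return np.argmax(Pn.T @ Zn, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 40))   # toy features: d=6, n=40
    S = rng.normal(size=(3, 40))   # toy semantics: k=3
    W = fit_tied_linear_autoencoder(X, S, lam=0.5)
    labels = predict(W, X, prototypes=rng.normal(size=(3, 4)))
    print(W.shape, labels.shape)
```

The Kronecker-product solve keeps the sketch dependency-free but scales poorly with dimension; for realistic feature sizes a dedicated Sylvester solver would be the more practical choice.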

Original language: English
Article number: 9399836
Pages (from-to): 4316-4329
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 30
State: Published - 2021
Externally published: Yes

Keywords

  • Zero-shot learning
  • transductive
  • virtual sample generation
