
SkeletonNet: A Hybrid Network with a Skeleton-Embedding Process for Multi-View Image Representation Learning

  • Shijie Yang
  • Liang Li*
  • Shuhui Wang
  • Weigang Zhang
  • Qingming Huang
  • Qi Tian
  • *Corresponding author for this work
  • University of Chinese Academy of Sciences
  • CAS - Institute of Computing Technology
  • School of Computer Science and Technology, Harbin Institute of Technology
  • University of Texas at San Antonio
  • Huawei Technologies Co., Ltd.

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-view representation learning plays a fundamental role in multimedia data analysis. Some specific inter-view alignment principles are adopted in conventional models, where there is an assumption that different views share a common latent subspace. However, when dealing with views on diverse semantic levels, the view-specific characteristics are neglected, and the divergent inconsistency of similarity measurements hinders sufficient information sharing. This paper proposes a hybrid deep network by introducing tensor factorization into the multi-view deep auto-encoder. The network adopts a skeleton-embedding process for unsupervised multi-view subspace learning. It takes full consideration of view-specific characteristics, and leverages the strength of both shallow and deep architectures for modeling low- and high-level views, respectively. We first formulate the high-level-view semantic distribution as the underlying skeleton structure of the learned subspace, and then infer the local tangent structures according to the affinity propagation of low-level-view geometric correlations. As a consequence, a more discriminative subspace representation can be learned, from global semantic pivots to local geometric details. Experimental comparisons on three benchmark image datasets show the promising performance and flexibility of our model.

Original language: English
Article number: 8695120
Pages (from-to): 2916-2929
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Volume: 21
Issue number: 11
State: Published - Nov 2019
Externally published: Yes

Keywords

  • Unsupervised multi-view subspace learning
  • deep auto-encoders
  • semantic inconsistency
  • tensor factorization
