Disentangling Multi-view Representations Beyond Inductive Bias

  • Guanzhou Ke
  • Yang Yu*
  • Guoqing Chao
  • Xiaoli Wang
  • Chenyang Xu
  • Shengfeng He*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-view (or multi-modality) representation learning aims to understand the relationships between different view representations. Existing methods disentangle multi-view representations into consistent and view-specific representations by introducing strong inductive biases, which can limit their generalization ability. In this paper, we propose a novel multi-view representation disentangling method that goes beyond inductive biases, ensuring both the interpretability and the generalizability of the resulting representations. Our method is based on the observation that discovering multi-view consistency in advance determines the disentangling information boundary, leading to a decoupled learning objective. We also find that consistency can be easily extracted by maximizing transformation invariance and clustering consistency between views. These observations motivate a two-stage framework. In the first stage, we obtain multi-view consistency by training a consistent encoder to produce semantically consistent representations across views, together with their corresponding pseudo-labels. In the second stage, we disentangle specificity from comprehensive representations by minimizing an upper bound on the mutual information between the consistent and comprehensive representations. Finally, we reconstruct the original data by concatenating pseudo-labels and view-specific representations. Experiments on four multi-view datasets demonstrate that the proposed method outperforms 12 comparison methods in clustering and classification performance. Visualization results further show that the extracted consistency and specificity are compact and interpretable. Our code can be found at https://github.com/Guanzhou-Ke/DMRIB.
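To make the two-stage recipe in the abstract concrete, here is a minimal PyTorch sketch written under our own assumptions: the encoder and decoder shapes, the choice of cosine invariance plus KL clustering agreement for stage 1, and a CLUB-style estimator for the mutual-information upper bound in stage 2 are all illustrative, not taken from the paper (the authors' implementation is at the GitHub link above).

```python
# Hypothetical sketch of the two-stage framework described in the abstract.
# All module names, architectures, and loss weightings are assumptions;
# see https://github.com/Guanzhou-Ke/DMRIB for the official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConsistentEncoder(nn.Module):
    """Stage 1: map each view into a shared semantic space plus a cluster
    head whose argmax yields the pseudo-labels used for reconstruction."""

    def __init__(self, in_dim, z_dim, n_clusters):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
        self.cluster_head = nn.Linear(z_dim, n_clusters)

    def forward(self, x):
        z = self.backbone(x)
        return z, self.cluster_head(z)


def stage1_loss(encoder, view_a, view_b):
    """Extract consistency by maximizing transformation invariance (aligned
    representations) and clustering consistency (aligned soft assignments)."""
    z_a, logits_a = encoder(view_a)
    z_b, logits_b = encoder(view_b)
    invariance = 1.0 - F.cosine_similarity(z_a, z_b).mean()
    consistency = F.kl_div(logits_a.log_softmax(-1),
                           logits_b.softmax(-1), reduction="batchmean")
    return invariance + consistency


class MIUpperBound(nn.Module):
    """A CLUB-style variational upper bound on I(z_spec; z_cons)
    (Cheng et al., 2020) -- one plausible choice for the bound the abstract
    refers to. q(z_cons | z_spec) is modeled as a diagonal Gaussian."""

    def __init__(self, spec_dim, cons_dim, hidden=128):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(spec_dim, hidden), nn.ReLU(), nn.Linear(hidden, cons_dim))
        self.logvar = nn.Sequential(
            nn.Linear(spec_dim, hidden), nn.ReLU(), nn.Linear(hidden, cons_dim))

    def forward(self, z_spec, z_cons):
        mu, logvar = self.mu(z_spec), self.logvar(z_spec)
        # log q(z_cons | z_spec) on matched pairs minus on all pairs;
        # the shared constants and log-variance terms cancel out.
        pos = -((mu - z_cons) ** 2 / logvar.exp()).sum(-1).mean()
        neg = -((mu.unsqueeze(1) - z_cons.unsqueeze(0)) ** 2
                / logvar.exp().unsqueeze(1)).sum(-1).mean()
        return 0.5 * (pos - neg)


def stage2_loss(specific_encoder, mi_bound, decoder, x, z_cons, pseudo_onehot):
    """Stage 2: z_cons and pseudo_onehot come frozen from stage 1. Minimizing
    the MI bound squeezes consistency out of the comprehensive representation,
    leaving view-specific information; the data are reconstructed from the
    concatenation of pseudo-labels and the view-specific representation."""
    z_spec = specific_encoder(x)
    x_hat = decoder(torch.cat([pseudo_onehot, z_spec], dim=-1))
    return F.mse_loss(x_hat, x) + mi_bound(z_spec, z_cons)
```

Note that a CLUB-style bound additionally requires fitting the variational network q(z_cons | z_spec) by maximizing its log-likelihood on matched pairs in an alternating step, omitted here for brevity; the point of the sketch is only how a pre-extracted consistency fixes the information boundary for the second stage.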

Original language: English
Title of host publication: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 2582-2590
Number of pages: 9
ISBN (Electronic): 9798400701085
DOIs
State: Published - 27 Oct 2023
Externally published: Yes
Event: 31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration: 29 Oct 2023 – 3 Nov 2023

Publication series

Name: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

Conference

Conference: 31st ACM International Conference on Multimedia, MM 2023
Country/Territory: Canada
City: Ottawa
Period: 29/10/23 – 3/11/23

Keywords

  • consistency and specificity
  • disentangled representation
  • multi-view representation learning
