
Graph Convolutional Multi-modal Hashing for Flexible Multimedia Retrieval

  • Xu Lu
  • Lei Zhu*
  • Li Liu
  • Liqiang Nie
  • Huaxiang Zhang
  • *Corresponding author for this work
  • Shandong Normal University
  • Shandong University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Multi-modal hashing plays an important role in multimedia retrieval, where a key challenge is to encode heterogeneous modalities into compact hash codes. To address this challenge, graph-based multi-modal hashing methods generally define an individual affinity matrix for each modality and apply a linear algorithm for heterogeneous modality fusion and compact hash learning. Several other methods construct a graph Laplacian matrix based on semantic information to help learn discriminative hash codes. However, these conventional methods largely ignore the structural similarity of the training set and the complex relations among multi-modal samples, which leads to unsatisfactory complementarity in the fused hash codes. More notably, they face two other important problems: the huge computing and storage costs caused by graph construction, and the loss of partial modality features when an incomplete query sample arrives. In this paper, we propose a Flexible Graph Convolutional Multi-modal Hashing (FGCMH) method that adopts GCNs with linear complexity to preserve both the modality-individual and modality-fused structural similarity for discriminative hash learning. As a result, accurate multimedia retrieval can be performed on both complete and incomplete datasets with our method. Specifically, multiple modality-individual GCNs under semantic guidance are proposed to act on each individual modality independently for intra-modality similarity preservation; the output representations are then fused into a fusion graph with an adaptive weighting scheme. A hash GCN and a semantic GCN, which share parameters in the first two layers, propagate the fusion information and generate hash codes under high-level label space supervision. In the query stage, our method adaptively captures various multi-modal contents in a flexible and robust way, even if partial modality features are lost. Experimental results on three public datasets show the flexibility and effectiveness of our proposed method.
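
The pipeline outlined in the abstract (per-modality graph convolution, weighted fusion, binarization into hash codes, and queries with a missing modality) can be illustrated with a minimal sketch. This is not the authors' implementation. All function names are hypothetical, the fusion weights are fixed constants standing in for the learned adaptive weighting scheme, the dense adjacency matrix does not reproduce the linear-complexity graph construction, and the semantic supervision and shared-layer hash and semantic GCNs are omitted.

```python
import math


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, feats, weight, relu=True):
    """One graph convolution: A_norm @ X @ W, optionally followed by ReLU."""
    out = matmul(matmul(a_norm, feats), weight)
    if relu:
        return [[max(0.0, v) for v in row] for row in out]
    return out


def fuse(representations, weights):
    """Weighted sum over the modalities that are present.

    A missing modality is passed as None; the weights of the remaining
    modalities are renormalized so they still sum to one. The weights are
    assumed fixed here, whereas the paper describes an adaptive scheme.
    """
    present = [(r, w) for r, w in zip(representations, weights) if r is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    total = sum(w for _, w in present)
    n, d = len(present[0][0]), len(present[0][0][0])
    return [[sum(w / total * r[i][k] for r, w in present) for k in range(d)]
            for i in range(n)]


def binarize(real_codes):
    """Map real-valued codes to {-1, +1} by sign (zero maps to +1)."""
    return [[1 if v >= 0 else -1 for v in row] for row in real_codes]


def hamming(code_a, code_b):
    """Number of positions in which two hash codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))
```

In this sketch, a query with a lost modality is handled by passing None for that modality to fuse, so the hash code is computed from the remaining modalities alone; retrieval then ranks database items by hamming distance to the query code.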

Original language: English
Title of host publication: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 1414-1422
Number of pages: 9
ISBN (Electronic): 9781450386517
State: Published - 17 Oct 2021
Externally published: Yes
Event: 29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China
Duration: 20 Oct 2021 to 24 Oct 2021

Publication series

Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Conference: 29th ACM International Conference on Multimedia, MM 2021
Country/Territory: China
City: Virtual, Online
Period: 20/10/21 to 24/10/21

Keywords

  • graph convolutional network
  • hashing
  • multi-modal
  • multimedia retrieval
