
Collaboratively Semantic Alignment and Metric Learning for Cross-Modal Hashing

  • Jiaxing Li
  • Wai Keung Wong*
  • Lin Jiang
  • Kaihang Jiang
  • Xiaozhao Fang*
  • Shengli Xie
  • Jie Wen*
  • *Corresponding author for this work
  • Guangzhou University
  • Hong Kong Polytechnic University
  • Laboratory for Artificial Intelligence in Design
  • Guangdong Polytechnic Normal University
  • Guangdong University of Technology
  • Ministry of Education of the People's Republic of China
  • Harbin Institute of Technology Shenzhen
  • Shenzhen Key Laboratory of Visual Object Detection and Recognition

Research output: Contribution to journal, Article, peer-review

Abstract

Cross-modal retrieval is a promising technique for finding semantically similar instances in other modalities given a query instance from one modality. However, many challenges remain: reducing the heterogeneous modality gap by effectively embedding label information into discrete hash codes, solving the binary optimization problem when generating unified hash codes, and efficiently reducing the discrepancy between data distributions during common space learning. To overcome these challenges, we propose Collaboratively Semantic alignment and Metric learning for cross-modal Hashing (CSMH). Specifically, CSMH first extracts non-linear data features for each modality through a kernelization operation; these features are projected into a latent subspace to align both marginal and conditional distributions simultaneously. Then, a maximum mean discrepancy-based metric strategy is customized to mitigate the distribution discrepancies among features from different modalities. Finally, semantic information obtained from the label similarity matrix is further incorporated to embed the latent semantic structure into the discriminant subspace. Experimental results on four widely used datasets show that CSMH outperforms some state-of-the-art cross-modal hashing baselines in both efficiency and precision.
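The maximum mean discrepancy (MMD) mentioned in the abstract can be illustrated with a generic sketch. The code below is not the CSMH formulation, which the abstract does not spell out; it is a standard biased estimate of squared MMD under an RBF kernel, applied to synthetic "image" and "text" features. The function names, the `gamma` bandwidth, and the random data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    # Pairwise squared Euclidean distances via the expansion
    # ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

# Synthetic stand-ins for features from two modalities (hypothetical data).
rng = np.random.default_rng(0)
img = rng.normal(0.0, 1.0, size=(200, 16))
txt_same = rng.normal(0.0, 1.0, size=(200, 16))   # same distribution as img
txt_shift = rng.normal(1.0, 1.0, size=(200, 16))  # mean-shifted distribution

mmd_same = mmd2(img, txt_same, gamma=1.0 / 16)
mmd_shift = mmd2(img, txt_shift, gamma=1.0 / 16)
print(f"MMD^2 (matched):  {mmd_same:.4f}")
print(f"MMD^2 (shifted):  {mmd_shift:.4f}")
```

In this sketch the shifted pair yields the larger value, which is the kind of quantity a distribution-alignment objective would try to drive down when projecting modalities into a common subspace.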

Original language: English
Pages (from-to): 2311-2328
Number of pages: 18
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 37
Issue number: 5
DOIs
State: Published - 2025
Externally published: Yes

Keywords

  • Cross-modal hashing
  • information retrieval
  • maximum mean discrepancy
  • metric learning
  • semantic alignment
