Abstract
Cross-modal retrieval is a promising technique for finding semantically similar instances in other modalities given a query instance from one modality. However, many challenges remain: reducing the heterogeneous modality gap by effectively embedding label information into discrete hash codes, solving the binary optimization problem when generating unified hash codes, and efficiently reducing the discrepancy between data distributions during common-space learning. To overcome these challenges, we propose Collaboratively Semantic alignment and Metric learning for cross-modal Hashing (CSMH). Specifically, CSMH first extracts non-linear features for each modality through a kernelization operation and projects them into a latent subspace to align the marginal and conditional distributions simultaneously. Then, a maximum mean discrepancy-based metric strategy is customized to mitigate the distribution discrepancies among features from different modalities. Finally, semantic information obtained from the label similarity matrix is incorporated to embed the latent semantic structure into the discriminant subspace. Experimental results on four widely used datasets show that CSMH outperforms some state-of-the-art cross-modal hashing baselines in both efficiency and precision.
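The kernelization and maximum mean discrepancy (MMD) steps mentioned in the abstract can be illustrated with a small sketch. This is not the CSMH implementation: the anchor-based RBF mapping, the mean-distance bandwidth, the linear-kernel form of MMD, the synthetic features, and all function and variable names are assumptions chosen only to show the general idea of mapping two modalities with different dimensions into a shared feature space and measuring how far apart their distributions are.

```python
import numpy as np


def rbf_features(X, anchors):
    """Map rows of X to RBF kernel similarities against anchor points.

    The bandwidth is set from the mean squared distance (an assumed heuristic).
    """
    # Squared Euclidean distance between every sample and every anchor.
    sq_dist = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    bandwidth_sq = sq_dist.mean()
    return np.exp(-sq_dist / (2.0 * bandwidth_sq))


def linear_mmd2(A, B):
    """Squared MMD under a linear kernel: squared distance between feature means."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    return float(diff @ diff)


rng = np.random.default_rng(0)

# Hypothetical paired data: 200 image/text pairs with different raw dimensions.
image_feats = rng.normal(0.0, 1.0, size=(200, 64))
text_feats = rng.normal(0.5, 1.0, size=(200, 32))

# Assumption: the same 50 paired samples serve as anchors in both modalities,
# so column j of each kernel matrix refers to the same underlying pair.
anchor_idx = rng.choice(200, size=50, replace=False)
phi_img = rbf_features(image_feats, image_feats[anchor_idx])
phi_txt = rbf_features(text_feats, text_feats[anchor_idx])

# Both modalities now live in a 50-dimensional kernel space and can be compared.
gap = linear_mmd2(phi_img, phi_txt)
print("squared MMD between modalities:", gap)
```

In a method of this kind, a quantity like `gap` would typically appear as a penalty term to be minimized while learning the projections into the common subspace; the sketch only evaluates it on fixed features.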
| Original language | English |
|---|---|
| Pages (from-to) | 2311-2328 |
| Number of pages | 18 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 37 |
| Issue number | 5 |
| State | Published - 2025 |
| Externally published | Yes |
Keywords
- Cross-modal hashing
- information retrieval
- maximum mean discrepancy
- metric learning
- semantic alignment