
Similarity and diversity induced paired projection for cross-modal retrieval

  • The Chinese University of Hong Kong, Shenzhen
  • University of Science and Technology of China
  • Harbin Institute of Technology Shenzhen
  • University of Macau
  • Chongqing University
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society

Research output: Contribution to journal › Article › peer-review

Abstract

The heterogeneity gap between modalities is a critical problem in many applications (e.g., retrieval). Considering that the main purpose of cross-modal learning is to learn a common representation, while each modality also contains its own specific components, a similarity and diversity induced paired projection (SDPP) method is proposed in this paper. SDPP not only extracts the correlation in a common subspace, but also removes the view-specific information that does not contribute to the task. To model the specific components, the Hilbert-Schmidt Independence Criterion (HSIC) is introduced as a co-regularization term that explicitly enforces diversity. Additionally, unlike some existing subspace learning methods that are time-consuming in the testing phase, a paired projection strategy is exploited, which obtains the similar information in a simple but effective way. To optimize the presented approach, an efficient algorithm is designed that updates the different variables alternately. Finally, we apply our strategy to cross-modal retrieval, and experimental results on several real-world datasets substantiate the effectiveness and superiority of our model compared with other state-of-the-art methods.
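
To make the HSIC term mentioned in the abstract more concrete, the following is a minimal sketch of the standard biased empirical HSIC estimator, tr(KHLH) / (n - 1)^2, where K and L are Gram matrices and H is the centering matrix. The linear kernel, the synthetic data, and every name in the code (`linear_kernel`, `hsic`, `A`, `B_indep`, `B_dep`) are assumptions made for illustration; the abstract does not state which kernel or exact formulation SDPP uses.

```python
import numpy as np


def linear_kernel(X):
    """Gram matrix K[i, j] = <x_i, x_j> over the rows of X (assumed kernel choice)."""
    return X @ X.T


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix I - (1/n) 1 1^T
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Synthetic stand-ins for two sets of modality-specific features (hypothetical data).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
B_indep = rng.standard_normal((200, 5))           # drawn independently of A
B_dep = A + 0.1 * rng.standard_normal((200, 5))   # strongly dependent on A

hsic_indep = hsic(linear_kernel(A), linear_kernel(B_indep))
hsic_dep = hsic(linear_kernel(A), linear_kernel(B_dep))

print(f"HSIC (independent features): {hsic_indep:.4f}")
print(f"HSIC (dependent features):   {hsic_dep:.4f}")
```

A smaller HSIC value indicates weaker statistical dependence between the two feature sets, which is why a term of this kind can plausibly be minimized as a regularizer to push specific components apart.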

Original language: English
Pages (from-to): 215-228
Number of pages: 14
Journal: Information Sciences
Volume: 539
State: Published - Oct 2020
Externally published: Yes

Keywords

  • Cross-modal
  • Diversity
  • Pair projection
  • Similarity
