Abstract
With the rapid growth of multi-modal data on the internet, cross-modal matching and retrieval have received much attention recently. The task is to use one type of data as the query and retrieve results from a database of another type. The most popular approach is latent subspace learning, which learns a shared subspace for multi-modal data so that cross-modal similarity can be measured efficiently. Instead of adopting traditional regularization terms, we require that the latent representation be able to recover the multi-modal information, which acts as a reconstruction regularization term. We also assume that the features from different views of samples in the same category share the same representation in the latent space. Since the number of classes is generally smaller than both the number of samples and the feature dimension, the latent feature matrix of the training instances should be low-rank. We learn the optimal latent representation using a reconstruction-based term that recovers the original multi-modal data and a low-rank term that regularizes the subspace learning. Our method handles both supervised and unsupervised cross-modal retrieval, so it also works well when semantic labels are not easy to obtain. We propose an efficient algorithm to optimize the framework. To evaluate the method, we conduct extensive experiments on various datasets. The experimental results show that the proposed method is very efficient and outperforms state-of-the-art subspace learning approaches.
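The low-rank argument in the abstract can be checked numerically. The following is a minimal sketch, not the paper's algorithm: it assumes hypothetical sizes (`n_samples`, `n_classes`, `latent_dim`), random labels, and random per-class latent codes in place of learned ones. It only illustrates that when samples of the same category share one latent representation, the latent feature matrix has rank at most the number of classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_samples, n_classes, latent_dim = 200, 5, 32

# Hypothetical labels: one class id per training sample.
labels = rng.integers(0, n_classes, size=n_samples)
labels[:n_classes] = np.arange(n_classes)  # make sure every class occurs

# One latent vector per class (random here; a real method would learn these).
class_codes = rng.standard_normal((n_classes, latent_dim))

# Samples of the same category share the same latent representation,
# so each row of Z is a copy of the code of its class.
Z = class_codes[labels]  # shape: (n_samples, latent_dim)

# Z has at most n_classes distinct rows, so its rank cannot exceed n_classes,
# even though both n_samples and latent_dim are larger.
rank_Z = np.linalg.matrix_rank(Z)
print("shape:", Z.shape, "rank:", rank_Z)
```

In this toy setting the rank bound holds by construction. In the unsupervised case no labels are available, which is where a low-rank term in the objective would have to encourage the same structure.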
| Original language | English |
|---|---|
| Article number | 107813 |
| Journal | Pattern Recognition |
| Volume | 113 |
| DOIs | |
| State | Published - May 2021 |
| Externally published | Yes |
Keywords
- Cross-modal retrieval
- Low-rank subspace learning
- Reconstruction regularization
Fingerprint
Dive into the research topics of 'Reconstruction regularized low-rank subspace learning for cross-modal retrieval'. Together they form a unique fingerprint.