Efficient cross-modal retrieval using social tag information towards mobile applications

  • Jianfeng He
  • Shuhui Wang*
  • Qiang Qu
  • Weigang Zhang
  • Qingming Huang
  • *Corresponding author for this work
  • CAS - Institute of Computing Technology
  • Shenzhen Institute of Advanced Technology
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the prevalence of mobile devices, millions of multimedia items, represented as a combination of visual, aural and textual modalities, are produced every second. To facilitate better information retrieval on mobile devices, it becomes imperative to develop efficient models that retrieve content across heterogeneous modalities from a query in one modality, e.g., text-to-image or image-to-text retrieval. Unfortunately, previous works address the problem without considering the hardware constraints of mobile devices. In this paper, we propose a novel method named Trigonal Partial Least Squares (TPLS) for the task of cross-modal retrieval on mobile devices. Specifically, TPLS works under the hardware constraints of mobile devices, i.e., limited memory size and no GPU acceleration. To take advantage of users' tags for model training, we treat the label information provided by users as a third modality. Each pair of the three modalities (texts, images and labels) is then used to build a Kernel PLS model. As a result, TPLS is a joint model of three Kernel PLS models, with an additional constraint that narrows the distance between the label spaces of images and texts. To learn the model efficiently, we use parallel stochastic gradient descent (SGD), which accelerates learning while reducing memory consumption. To show the effectiveness of TPLS, experiments are conducted on popular cross-modal retrieval benchmark datasets, and competitive results are obtained.
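
The abstract describes TPLS only at a high level, so the following is a toy sketch of the general structure it names (three modalities, one coupling term per pair, an extra term pulling image and text representations together, minibatch stochastic optimization). It is not the authors' TPLS: it assumes linear projections instead of Kernel PLS, a single direction per modality, a squared gap between text and image scores as a stand-in for the paper's label-space constraint, and plain serial minibatch gradient ascent (equivalently, SGD on the negated objective) rather than a parallel variant. The function names, feature dimensions, and hyperparameters (`lam`, `lr`, `batch`) are hypothetical.

```python
import numpy as np


def make_toy_data(n=500, noise=0.3, seed=0):
    """Hypothetical toy data: text, image and tag-label features share one latent factor z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 1))
    views = []
    for dim in (10, 8, 3):  # text, image, label dimensions (arbitrary illustrative choices)
        loading = rng.standard_normal((1, dim))
        views.append(z @ loading + noise * rng.standard_normal((n, dim)))
    return views  # [X_text, X_image, Y_label]


def train_three_view(Xt, Xi, Y, lam=0.1, lr=0.05, epochs=50, batch=32, seed=0):
    """Projected minibatch stochastic gradient ascent on a toy objective:
    the sum of the three pairwise score covariances, minus lam times the mean
    squared gap between text and image scores. One linear unit-norm direction
    per modality; no kernels (a simplification of the Kernel PLS setting)."""
    rng = np.random.default_rng(seed)
    # Center each modality so scaled dot products of scores approximate covariances.
    views = [V - V.mean(axis=0) for V in (Xt, Xi, Y)]
    ws = [rng.standard_normal(V.shape[1]) for V in views]
    ws = [w / np.linalg.norm(w) for w in ws]
    n = views[0].shape[0]
    pairs = [(0, 1), (0, 2), (1, 2)]  # (text, image), (text, label), (image, label)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            B = [V[idx] for V in views]
            m = len(idx)
            scores = [Bv @ w for Bv, w in zip(B, ws)]
            grads = [np.zeros_like(w) for w in ws]
            for a, b in pairs:
                # Gradient of scores[a] @ scores[b] / m with respect to ws[a] and ws[b].
                grads[a] += B[a].T @ scores[b] / m
                grads[b] += B[b].T @ scores[a] / m
            # Alignment penalty lam * mean((s_text - s_image)^2); text is view 0, image is view 1.
            diff = scores[0] - scores[1]
            grads[0] -= 2 * lam * B[0].T @ diff / m
            grads[1] += 2 * lam * B[1].T @ diff / m
            # Ascent step, then project each direction back onto the unit sphere.
            ws = [w + lr * g for w, g in zip(ws, grads)]
            ws = [w / np.linalg.norm(w) for w in ws]
    return ws


if __name__ == "__main__":
    Xt, Xi, Y = make_toy_data()
    wt, wi, wl = train_three_view(Xt, Xi, Y)
    st = (Xt - Xt.mean(axis=0)) @ wt
    si = (Xi - Xi.mean(axis=0)) @ wi
    print("text/image score correlation:", np.corrcoef(st, si)[0, 1])
```

Each step touches only a minibatch of rows, so the working memory per step scales with `batch` times the feature dimension instead of requiring full cross-covariance matrices; this mirrors the motivation the abstract gives for stochastic training, though the memory profile of the actual TPLS implementation is not reproduced here.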

Original language: English
Title of host publication: Mobility Analytics for Spatio-Temporal and Social Data - 1st International Workshop, MATES 2017, Revised Selected Papers
Editors: Shuhui Wang, Christos Doulkeridis, George A. Vouros, Qiang Qu
Publisher: Springer Verlag
Pages: 157-176
Number of pages: 20
ISBN (Print): 9783319735207
State: Published - 2018
Externally published: Yes
Event: 1st International Workshop on Mobility Analytics for Spatiotemporal and Social Data, MATES 2017 - Munich, Germany
Duration: 1 Sep 2017 → 1 Sep 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10731 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Workshop on Mobility Analytics for Spatiotemporal and Social Data, MATES 2017
Country/Territory: Germany
City: Munich
Period: 1/09/17 → 1/09/17

Keywords

  • Cross-modal retrieval
  • Images and documents
  • Multimedia
  • Partial least squares
