TY - GEN
T1 - Efficient cross-modal retrieval using social tag information towards mobile applications
AU - He, Jianfeng
AU - Wang, Shuhui
AU - Qu, Qiang
AU - Zhang, Weigang
AU - Huang, Qingming
N1 - Publisher Copyright: © Springer International Publishing AG 2018.
PY - 2018
Y1 - 2018
N2 - With the prevalence of mobile devices, millions of multimedia items, represented as combinations of visual, aural and textual modalities, are produced every second. To facilitate better information retrieval on mobile devices, it is imperative to develop efficient models that retrieve heterogeneous content modalities from a specific query input, e.g., text-to-image or image-to-text retrieval. Unfortunately, previous work addresses the problem without considering the hardware constraints of mobile devices. In this paper, we propose a novel method named Trigonal Partial Least Squares (TPLS) for the task of cross-modal retrieval on mobile devices. Specifically, TPLS works under the hardware constraints of mobile devices, i.e., limited memory size and no GPU acceleration. To take advantage of users’ tags for model training, we treat the label information provided by the users as a third modality. Each pair of modalities among texts, images and labels is then used to build a Kernel PLS model. As a result, TPLS is a joint model of three Kernel PLS models, and a constraint is proposed to narrow the distance between the label spaces of images and texts. To learn the model efficiently, we use stochastic parallel gradient descent (SGD) to speed up learning while reducing memory consumption. To show the effectiveness of TPLS, experiments are conducted on popular cross-modal retrieval benchmark datasets, and competitive results are obtained.
AB - With the prevalence of mobile devices, millions of multimedia items, represented as combinations of visual, aural and textual modalities, are produced every second. To facilitate better information retrieval on mobile devices, it is imperative to develop efficient models that retrieve heterogeneous content modalities from a specific query input, e.g., text-to-image or image-to-text retrieval. Unfortunately, previous work addresses the problem without considering the hardware constraints of mobile devices. In this paper, we propose a novel method named Trigonal Partial Least Squares (TPLS) for the task of cross-modal retrieval on mobile devices. Specifically, TPLS works under the hardware constraints of mobile devices, i.e., limited memory size and no GPU acceleration. To take advantage of users’ tags for model training, we treat the label information provided by the users as a third modality. Each pair of modalities among texts, images and labels is then used to build a Kernel PLS model. As a result, TPLS is a joint model of three Kernel PLS models, and a constraint is proposed to narrow the distance between the label spaces of images and texts. To learn the model efficiently, we use stochastic parallel gradient descent (SGD) to speed up learning while reducing memory consumption. To show the effectiveness of TPLS, experiments are conducted on popular cross-modal retrieval benchmark datasets, and competitive results are obtained.
KW - Cross-modal retrieval
KW - Images and documents
KW - Multimedia
KW - Partial least squares
UR - https://www.scopus.com/pages/publications/85041096343
U2 - 10.1007/978-3-319-73521-4_10
DO - 10.1007/978-3-319-73521-4_10
M3 - Conference contribution
AN - SCOPUS:85041096343
SN - 9783319735207
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 157
EP - 176
BT - Mobility Analytics for Spatio-Temporal and Social Data - 1st International Workshop, MATES 2017, Revised Selected Papers
A2 - Wang, Shuhui
A2 - Doulkeridis, Christos
A2 - Vouros, George A.
A2 - Qu, Qiang
PB - Springer Verlag
T2 - 1st International Workshop on Mobility Analytics for Spatiotemporal and Social Data, MATES 2017
Y2 - 1 September 2017 through 1 September 2017
ER -