TY - GEN
T1 - Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-modal Retrieval
AU - Wu, Hongfa
AU - Zhang, Lisai
AU - Chen, Qingcai
AU - Deng, Yimeng
AU - Siebert, Joanna
AU - Han, Yunpeng
AU - Li, Zhonghua
AU - Kong, Dejiang
AU - Cao, Zhao
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/10/17
Y1 - 2022/10/17
AB - Cross-modal hashing (CMH) has been widely used in multimedia retrieval applications for its low storage cost and fast indexing speed. Thanks to the success of deep learning, cross-modal hashing has made significant progress with high-quality deep features. However, the modality gap remains a crucial bottleneck for existing cross-modal hashing methods: the commonly used convolutional neural network and bag-of-words encoders are each tailored to a single-modality prior, which limits the models' ability to learn semantic representations in a shared cross-modal space. To overcome modality heterogeneity, we propose a shared transformer encoder (UniHash) that unifies cross-modal hashing in the same semantic space. We jointly design a contrastive label correlation learning (CLC) loss that uses the category labels as a bridge between modalities to improve representation quality. Moreover, we take advantage of the multi-hot label space and propose a negative label generation (NegLG) strategy to obtain richer and uniformly distributed negative labels for contrast. Extensive experiments on three benchmarks verify the advantage of the proposed method: UniHash significantly outperforms state-of-the-art cross-modal hashing methods, establishing an important new baseline for cross-modal hashing research. Code is released at github.com/idealwhite/Unihash.
KW - contrastive learning
KW - cross-modal hashing
KW - cross-modal retrieval
KW - vision and language
UR - https://www.scopus.com/pages/publications/85140838285
U2 - 10.1145/3511808.3557265
DO - 10.1145/3511808.3557265
M3 - Conference contribution
AN - SCOPUS:85140838285
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2158
EP - 2168
BT - CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Y2 - 17 October 2022 through 21 October 2022
ER -