TY - GEN
T1 - Two-pronged Strategy
T2 - 29th ACM International Conference on Multimedia, MM 2021
AU - Cui, Hui
AU - Zhu, Lei
AU - Li, Jingjing
AU - Cheng, Zhiyong
AU - Zhang, Zheng
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Hashing learns compact binary codes to store and retrieve massive data efficiently. In particular, unsupervised deep hashing is supported by powerful deep neural networks and has the desirable advantage of label independence, making it a promising technique for scalable image retrieval. However, deep models introduce a large number of parameters, which are hard to optimize without explicit semantic labels and incur considerable training cost. As a result, the retrieval accuracy and training efficiency of existing unsupervised deep hashing methods are still limited. To tackle these problems, we propose a simple and efficient Lightweight Augmented Graph Network Hashing (LAGNH) method with a two-pronged strategy. First, we extract the inner structure of the image as auxiliary semantics to enhance the semantic supervision of the unsupervised hash learning process. Second, we design a lightweight network structure with the assistance of the auxiliary semantics, which greatly reduces the number of network parameters that need to be optimized and thus greatly accelerates the training process. Specifically, we design a cross-modal attention module based on the auxiliary semantic information to adaptively mitigate the adverse effects in the deep image features. In addition, the hash codes are learned by multi-layer message passing within an adversarially regularized graph convolutional network. Simultaneously, the semantic representation capability of the hash codes is further enhanced by reconstructing the similarity graph. Experimental results show that our method achieves significant performance improvements over state-of-the-art unsupervised deep hashing methods in terms of both retrieval accuracy and efficiency. Notably, on the MS-COCO dataset, our method achieves more than 10% improvement in retrieval precision and a 2.7x speedup in training time compared with the second-best result.
KW - attention mechanism
KW - graph neural networks
KW - image retrieval
KW - similarity preservation
KW - unsupervised deep hashing
UR - https://www.scopus.com/pages/publications/85119327605
U2 - 10.1145/3474085.3475605
DO - 10.1145/3474085.3475605
M3 - Conference contribution
AN - SCOPUS:85119327605
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 1432
EP - 1440
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 20 October 2021 through 24 October 2021
ER -