TY - GEN
T1 - MMGCN
T2 - 27th ACM International Conference on Multimedia, MM 2019
AU - Wei, Yinwei
AU - He, Xiangnan
AU - Wang, Xiang
AU - Hong, Richang
AU - Nie, Liqiang
AU - Chua, Tat Seng
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/10/15
Y1 - 2019/10/15
N2 - Personalized recommendation plays a central role in many online content sharing platforms. To provide quality micro-video recommendation service, it is of crucial importance to consider the interactions between users and items (i.e., micro-videos) as well as the item contents from various modalities (e.g., visual, acoustic, and textual). Existing works on multimedia recommendation largely exploit multi-modal contents to enrich item representations, while less effort is made to leverage information interchange between users and items to enhance user representations and further capture user's fine-grained preferences on different modalities. In this paper, we propose to exploit user-item interactions to guide the representation learning in each modality, and further personalized micro-video recommendation. We design a Multimodal Graph Convolution Network (MMGCN) framework built upon the message-passing idea of graph neural networks, which can yield modal-specific representations of users and micro-videos to better capture user preferences. Specifically, we construct a user-item bipartite graph in each modality, and enrich the representation of each node with the topological structure and features of its neighbors. Through extensive experiments on three publicly available datasets, Tiktok, Kwai, and MovieLens, we demonstrate that our proposed model is able to significantly outperform state-of-the-art multi-modal recommendation methods.
AB - Personalized recommendation plays a central role in many online content sharing platforms. To provide quality micro-video recommendation service, it is of crucial importance to consider the interactions between users and items (i.e., micro-videos) as well as the item contents from various modalities (e.g., visual, acoustic, and textual). Existing works on multimedia recommendation largely exploit multi-modal contents to enrich item representations, while less effort is made to leverage information interchange between users and items to enhance user representations and further capture user's fine-grained preferences on different modalities. In this paper, we propose to exploit user-item interactions to guide the representation learning in each modality, and further personalized micro-video recommendation. We design a Multimodal Graph Convolution Network (MMGCN) framework built upon the message-passing idea of graph neural networks, which can yield modal-specific representations of users and micro-videos to better capture user preferences. Specifically, we construct a user-item bipartite graph in each modality, and enrich the representation of each node with the topological structure and features of its neighbors. Through extensive experiments on three publicly available datasets, Tiktok, Kwai, and MovieLens, we demonstrate that our proposed model is able to significantly outperform state-of-the-art multi-modal recommendation methods.
KW - Graph Convolution Network
KW - Micro-video Understanding
KW - Multi-modal Recommendation
UR - https://www.scopus.com/pages/publications/85074834397
U2 - 10.1145/3343031.3351034
DO - 10.1145/3343031.3351034
M3 - 会议稿件
AN - SCOPUS:85074834397
T3 - MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
SP - 1437
EP - 1445
BT - MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 21 October 2019 through 25 October 2019
ER -