
Video-based cross-modal recipe retrieval

  • Da Cao
  • Jiansheng Fang
  • Zhiwang Yu
  • Liqiang Nie
  • Hanling Zhang*
  • Qi Tian
  • *Corresponding author for this work
  • Hunan University
  • CVTE Research
  • Shandong University
  • Huawei Technologies Co., Ltd.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

As a natural extension of image-based cross-modal recipe retrieval, retrieving a specific video given a recipe as the query has seldom been explored. Cooking videos contain a variety of temporal and spatial elements. In addition, current image-based cross-modal recipe retrieval approaches mostly model textual and visual content independently and thus overlook the interaction between the two. In this work, we propose the new problem of video-based cross-modal recipe retrieval and investigate it under the attention paradigm. In particular, we first exploit a parallel-attention network to independently learn the representations of videos and recipes. Next, a co-attention network is proposed to explicitly emphasize the cross-modal interactive features between videos and recipes. Meanwhile, a cross-modal fusion sub-network is proposed to learn both the independent and collaborative dynamics, which can enhance the associated representation of videos and recipes. Finally, the embedding vectors of videos and recipes produced by the joint network are optimized with a pairwise ranking loss. Extensive experiments on a self-collected dataset have verified the effectiveness and rationality of our proposed solution.
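
The abstract names a pairwise ranking loss over the video and recipe embeddings but does not give its exact form. As a rough illustration only, the sketch below uses a common margin-based hinge formulation over cosine similarities, with the other videos in a batch serving as negatives for each recipe query. The function names, the `margin` value, and the choice of in-batch negatives are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def pairwise_ranking_loss(video_emb, recipe_emb, margin=0.2):
    """Hypothetical margin-based ranking loss (assumed form, not the paper's).

    Row i of video_emb and row i of recipe_emb are a matching pair.
    Every other video in the batch is treated as a negative for recipe i.
    """
    v = l2_normalize(np.asarray(video_emb, dtype=float))
    r = l2_normalize(np.asarray(recipe_emb, dtype=float))

    # sim[i, j] = cosine similarity between recipe i and video j
    sim = r @ v.T
    pos = np.diag(sim)[:, None]  # similarity of each matching pair

    # Hinge term: penalize negatives that come within `margin` of the positive
    cost = np.maximum(0.0, margin - pos + sim)
    np.fill_diagonal(cost, 0.0)  # a pair is not a negative for itself

    n = sim.shape[0]
    return cost.sum() / max(n * (n - 1), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    videos = rng.normal(size=(4, 8))
    recipes = videos + 0.1 * rng.normal(size=(4, 8))  # roughly aligned pairs
    print(pairwise_ranking_loss(videos, recipes))
```

In this formulation the loss is zero once every recipe scores its own video at least `margin` higher than any other video in the batch, which is the ranking behaviour a recipe-to-video retrieval objective aims for.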

Original language: English
Title of host publication: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 1685-1693
Number of pages: 9
ISBN (Electronic): 9781450368896
State: Published - 15 Oct 2019
Externally published: Yes
Event: 27th ACM International Conference on Multimedia, MM 2019 - Nice, France
Duration: 21 Oct 2019 to 25 Oct 2019

Publication series

Name: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia

Conference

Conference: 27th ACM International Conference on Multimedia, MM 2019
Country/Territory: France
City: Nice
Period: 21/10/19 to 25/10/19

Keywords

  • Co-Attention Network
  • Cross-Modal Retrieval
  • Parallel-Attention Network
  • Recipe Retrieval
  • Video Retrieval
