Skip to main navigation Skip to search Skip to main content

Reflective decoding network for image captioning

  • Lei Ke
  • , Wenjie Pei
  • , Ruiyu Li
  • , Xiaoyong Shen
  • , Yu Wing Tai
  • Hong Kong University of Science and Technology
  • Tencent

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

State-of-the-art image captioning methods mostly focus on improving visual features, less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary coherence between words and syntactic paradigm of sentences are also important to generate high-quality image caption. Following the conventional encoder-decoder framework, we propose the Reflective Decoding Network (RDN) for image captioning, which enhances both the long-sequence dependency and position perception of words in a caption decoder. Our model learns to collaboratively attend on both visual and textual features and meanwhile perceive each word's relative position in the sentence to maximize the information delivered in the generated caption. We evaluate the effectiveness of our RDN on the COCO image captioning datasets and achieve superior performance over the previous methods. Further experiments reveal that our approach is particularly advantageous for hard cases with complex scenes to describe by captions.

Original languageEnglish
Title of host publicationProceedings - 2019 International Conference on Computer Vision, ICCV 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages8887-8896
Number of pages10
ISBN (Electronic)9781728148038
DOIs
StatePublished - Oct 2019
Externally publishedYes
Event17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 20192 Nov 2019

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499

Conference

Conference17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country/TerritoryKorea, Republic of
CitySeoul
Period27/10/192/11/19

Fingerprint

Dive into the research topics of 'Reflective decoding network for image captioning'. Together they form a unique fingerprint.

Cite this