Paragraph generation network with visual relationship detection

  • Harbin Institute of Technology
  • Peking University

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Paragraph generation for images is a new concept that aims to produce multiple sentences describing a given image. In this paper, we propose a paragraph generation network that incorporates visual relationship detection. We first detect regions that may contain important visual objects and then predict the relationships between them. Paragraphs are generated from the object regions that have a valid relationship with other regions. Compared with previous work, which generates sentences based on region features, we explicitly explore and use visual relationships to improve the final captions. The experimental results show that this strategy can improve paragraph generation performance in two respects: more details about object relations are detected, and more accurate sentences are obtained. Furthermore, our model is more robust to fluctuations in region detection.
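The region selection step described in the abstract (keep only the object regions that have a valid relationship with another region) can be illustrated with a minimal sketch. The abstract does not specify how validity is decided, so everything here is an assumption made for illustration: the `Relationship` record, the `regions_with_valid_relationships` helper, the confidence score, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record for one predicted relationship between two regions."""
    subject: int    # index of the subject region
    predicate: str  # e.g. "riding", "on"
    obj: int        # index of the object region
    score: float    # assumed detector confidence in [0, 1]


def regions_with_valid_relationships(num_regions, relationships, threshold=0.5):
    """Return sorted indices of regions that take part in at least one
    relationship whose score meets the (assumed) validity threshold."""
    keep = set()
    for rel in relationships:
        in_range = 0 <= rel.subject < num_regions and 0 <= rel.obj < num_regions
        # A relationship counts as valid here if it links two distinct,
        # existing regions and is confident enough.
        if in_range and rel.subject != rel.obj and rel.score >= threshold:
            keep.add(rel.subject)
            keep.add(rel.obj)
    return sorted(keep)


# Toy input: five detected regions and three predicted relationships.
rels = [
    Relationship(0, "riding", 1, 0.9),
    Relationship(2, "next to", 3, 0.2),  # below threshold, so ignored
    Relationship(1, "on", 4, 0.7),
]
selected = regions_with_valid_relationships(5, rels)
print(selected)
```

In this toy input, regions 2 and 3 are dropped because their only relationship falls below the threshold. In a full system, the features of the remaining regions would be passed to a sentence decoder, which is outside the scope of this sketch.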

Original language: English
Title of host publication: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 1435-1443
Number of pages: 9
ISBN (Electronic): 9781450356657
State: Published - 15 Oct 2018
Event: 26th ACM Multimedia conference, MM 2018 - Seoul, Korea, Republic of
Duration: 22 Oct 2018 to 26 Oct 2018

Publication series

Name: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference

Conference

Conference: 26th ACM Multimedia conference, MM 2018
Country/Territory: Korea, Republic of
City: Seoul
Period: 22/10/18 to 26/10/18

Keywords

  • Image caption
  • Object detection
  • Paragraph generation
  • Relationship detection
