
Graph-Based Group Division Network for Referring Expression Comprehension

  • Jingcheng Ke
  • Jia Wang
  • Waikeung Wong*
  • Anne Toomey
  • Jie Wen
  • *Corresponding author for this work
  • Guangdong University of Technology
  • Guangdong Pharmaceutical University
  • Hong Kong Polytechnic University
  • Laboratory for Artificial Intelligence in Design
  • Royal College of Art
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

Referring expression comprehension (REC) aims to locate the target object described by an expression. We observe that most graph-based REC methods focus only on establishing relations between all objects in an image and the given expression during graph construction, while ignoring the relationships between objects in the same category. As a result, these methods are sub-optimal at locating the target object described by the expression, particularly when the target object is surrounded by objects of similar categories. Meanwhile, during reasoning, numerous objects irrelevant to the expression are considered, which introduces significant harmful noise. To address these issues, this paper proposes a new graph-based group division network (GBGDN). Unlike existing work, our method partitions the constructed graphs into several sub-graphs based on the categories of objects and expressions. In each sub-graph, the common visual features of objects are strengthened through a feature enhancement strategy. Subsequently, the enhanced sub-graphs and expressions are processed jointly by a filtering-based reasoning module designed to reduce the influence of unrelated nodes in each sub-graph, facilitating more accurate reasoning and matching. Experimental results on several datasets, including RefCOCO/+/g, Flickr30K Entities, RefClef, and Ref-reasoning, show that the proposed method outperforms existing approaches. Most importantly, our method does not need pre-training.
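
The three stages named in the abstract (grouping by category, feature enhancement within each group, and filtering of unrelated nodes) can be illustrated with a toy sketch. This is not the GBGDN implementation: the function names, the mean-blending enhancement, the cosine-similarity filter, and the `alpha` and `threshold` parameters are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict
from math import sqrt


def group_by_category(objects):
    """Partition detected objects into per-category groups (toy 'sub-graphs')."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj["category"]].append(obj)
    return dict(groups)


def enhance(group, alpha=0.5):
    """Blend each node feature with the group mean (assumed enhancement rule)."""
    dim = len(group[0]["feat"])
    mean = [sum(o["feat"][i] for o in group) / len(group) for i in range(dim)]
    return [
        {**o, "feat": [(1 - alpha) * f + alpha * m for f, m in zip(o["feat"], mean)]}
        for o in group
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_nodes(group, expr_feat, threshold=0.5):
    """Keep only nodes similar enough to the expression (assumed filter rule)."""
    return [o for o in group if cosine(o["feat"], expr_feat) >= threshold]


# Hypothetical detections with 2-D features; real features would come from a detector.
objects = [
    {"id": 0, "category": "person", "feat": [1.0, 0.0]},
    {"id": 1, "category": "person", "feat": [0.0, 1.0]},
    {"id": 2, "category": "dog", "feat": [1.0, 1.0]},
]
expr_feat = [1.0, 0.0]  # hypothetical embedding of the referring expression

groups = group_by_category(objects)
enhanced = {cat: enhance(grp) for cat, grp in groups.items()}
kept = {cat: filter_nodes(grp, expr_feat) for cat, grp in enhanced.items()}
```

In this toy setup the filter discards the second "person" node, whose enhanced feature still points away from the expression embedding, so later matching would only compare the remaining candidates.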

Original language: English
Pages (from-to): 6170-6183
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 6
State: Published - 2025
Externally published: Yes

Keywords

  • Referring expression comprehension
  • filtering-based reasoning module
  • graph-based group division network
