
Graph-based referring expression comprehension with expression-guided selective filtering and noun-oriented reasoning

  • Jingcheng Ke
  • Qi Zhang
  • Jia Wang*
  • Hongqing Ding
  • Pengfei Zhang
  • Jie Wen
  • *Corresponding author for this work
  • Guangdong University of Technology
  • City University of Macau
  • Guangdong Pharmaceutical University
  • China Mobile Communications Group Co., Ltd.
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

The objective of referring expression comprehension (REC) is to find the common feature domain between language expressions and visual objects. Because modeling the relationships between objects in images is complex, graph-based methods are widely used for the REC task. However, when constructing graphs, existing graph-based REC methods make insufficient use of the visual information associated with the objects in images. Moreover, when modeling the relationships between objects, these methods consider only the relational words of the expression and the positions of the objects, while ignoring the objects themselves. As a result, they capture the underlying relationships between the objects and the expression sub-optimally, which leads to incorrect predictions for complex expressions. To address these issues, we propose a plug-and-adapt module for graph-based REC methods, the expression-guided selective and filtering module (EGSFM), which constructs an expression-guided filter to adaptively select relevant and important visual features from the feature maps of objects. The selected visual object features and the textual features of the expression are then jointly used for graph construction. Finally, we propose a noun-oriented reasoning strategy for graph reasoning and target object matching, in which the number of reasoning steps is determined by the number of nouns or noun phrases in the expression. Extensive experiments on three challenging public datasets, RefCOCO, RefCOCO+, and RefCOCOg, show that our method outperforms the compared graph-based methods and is robust to complex language expressions. In addition, our method performs favorably against other state-of-the-art transformer-based methods while requiring far fewer computational resources for training.
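The abstract describes the pipeline only at a high level. The following is a minimal, hypothetical Python sketch of the three stages as stated there: an expression-derived gate that filters each object's visual channels, a graph whose edges combine the filtered visual features with the expression, and reasoning whose step count follows the number of nouns. All names (`expression_guided_filter`, `build_graph`, `count_reasoning_steps`, `reason`, `match`), the sigmoid gating, the threshold rule, the toy noun lexicon, the edge-weight formula, and the averaging update are assumptions for illustration only; this is not the authors' EGSFM or their reasoning module.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def expression_guided_filter(obj_feat: Vector, expr_emb: Vector,
                             proj: List[Vector], threshold: float = 0.5) -> Vector:
    """Hypothetical expression-guided filter (not the paper's exact formulation).

    One gate per visual channel is computed from the expression embedding via an
    assumed learned projection `proj` (C rows of length D). Channels whose gate is
    below `threshold` are dropped (set to 0); the rest are scaled by their gate."""
    gates = [sigmoid(dot(row, expr_emb)) for row in proj]
    return [f * g if g >= threshold else 0.0 for f, g in zip(obj_feat, gates)]


def build_graph(feats: List[Vector], query: Vector) -> List[Vector]:
    """Assumed edge weights: sigmoid of an expression-weighted similarity between
    two objects' filtered features, so both visual and textual cues shape edges."""
    n = len(feats)
    weighted = [[x * q for x, q in zip(f, query)] for f in feats]
    return [[0.0 if i == j else sigmoid(dot(weighted[i], feats[j]))
             for j in range(n)] for i in range(n)]


# Toy noun lexicon standing in for a real POS tagger or noun-phrase chunker (assumption).
TOY_NOUNS = {"man", "woman", "umbrella", "dog", "table", "shirt", "car"}


def count_reasoning_steps(expression: str) -> int:
    """One reasoning step per noun found in the expression, with at least one step."""
    words = [w.strip(".,").lower() for w in expression.split()]
    return max(1, sum(1 for w in words if w in TOY_NOUNS))


def reason(feats: List[Vector], adj: List[Vector], steps: int) -> List[Vector]:
    """Synchronous message passing: each node averages itself with its
    edge-weighted neighbours, repeated `steps` times."""
    for _ in range(steps):
        updated = []
        for i, fi in enumerate(feats):
            agg = list(fi)
            for j, fj in enumerate(feats):
                if adj[i][j]:
                    agg = [a + adj[i][j] * b for a, b in zip(agg, fj)]
            norm = 1.0 + sum(adj[i])
            updated.append([a / norm for a in agg])
        feats = updated
    return feats


def match(feats: List[Vector], query: Vector) -> int:
    """Index of the node whose reasoned feature best aligns with the query."""
    scores = [dot(f, query) for f in feats]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    expr = "the man holding an umbrella"
    expr_emb = [1.0, 0.0]                                # made-up D=2 expression embedding
    proj = [[4.0, 0.0], [-4.0, 0.0], [2.0, 1.0]]         # made-up C=3 by D=2 projection
    objects = [[0.9, 0.5, 0.1], [0.2, 0.8, 0.7], [0.1, 0.1, 0.9]]
    query = [1.0, 0.0, 0.5]                              # made-up query in channel space
    filtered = [expression_guided_filter(o, expr_emb, proj) for o in objects]
    adj = build_graph(filtered, query)
    steps = count_reasoning_steps(expr)                  # "man", "umbrella" -> 2
    print("steps:", steps, "predicted object:", match(reason(filtered, adj, steps), query))
```

Under these assumptions, tying the step count to the noun count means an expression such as "the man holding an umbrella" triggers two rounds of message passing before matching, so information from one noun's object can propagate to the other's. A practical system would replace the toy lexicon with a part-of-speech tagger or noun-phrase chunker and learn the projection and edge weights end to end.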

Original language: English
Article number: 111222
Journal: Pattern Recognition
Volume: 161
State: Published - May 2025
Externally published: Yes

Keywords

  • Expression-guided selective and filtering module
  • Language-to-vision mapping
  • Noun-oriented reasoning
  • Referring expression comprehension

