Generic Scene Graph Generation Model with Hierarchical Prompt Learning

  • Xuhan Zhu
  • Yifei Xing
  • Ruiping Wang*
  • Yaowei Wang
  • Xiangyuan Lan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Scene Graph Generation (SGG) delivers structured knowledge for representing complex scenes and has proven effective in many computer vision tasks. However, traditional SGG models suffer from two limitations that hinder their applicability to higher-level visual tasks: (1) a rigid structure that results in low efficiency and limited flexibility, and (2) biased optimization that yields predictions favoring uninformative predicates. To resolve these issues, we propose GSGG (Generic Scene Graph Generation), a novel, efficient, and flexible SGG model that (1) combines generalized modules to construct a high-performance, high-efficiency SGG model and (2) employs a prompt-learning-based relation decoder with a novel Hierarchical Prompt (HP) learning method to mitigate biased optimization. HP composes basic prompts constrained to progressively narrowed class groups, encouraging the corresponding prompts to focus on learning increasingly informative predicates. Extensive evaluations on three SGG benchmarks demonstrate the excellent efficiency and performance of GSGG with HP. We also introduce a novel predicate generalization task with a new benchmark, and experiments on it demonstrate the effectiveness of HP in base-to-novel predicate generalization.
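The abstract describes HP as composing basic prompts constrained to progressively narrowed class groups. A minimal sketch of that compositional idea, assuming a sum-based composition and an illustrative predicate grouping (the group names, hierarchy depth, and composition operator here are assumptions for illustration, not details from the paper):

```python
import numpy as np

# Hypothetical predicate hierarchy: each predicate maps to its path of
# class groups, from the coarsest group down to its narrowest group.
# Grouping and names are illustrative only.
HIERARCHY = {
    "on":     ["all", "positional"],                     # common, uninformative predicate
    "riding": ["all", "interactional", "body-motion"],   # rarer, informative predicate
}

DIM = 8
rng = np.random.default_rng(0)

# One learnable basic prompt vector per class group (random stand-ins here;
# in training these would be optimized parameters).
basic_prompts = {g: rng.standard_normal(DIM)
                 for groups in HIERARCHY.values() for g in groups}

def compose_prompt(predicate: str) -> np.ndarray:
    """Compose a predicate's final prompt by summing the basic prompts
    along its path of progressively narrowed class groups."""
    return np.sum([basic_prompts[g] for g in HIERARCHY[predicate]], axis=0)

prompt = compose_prompt("riding")
print(prompt.shape)  # (8,)
```

Because every predicate shares the coarse-group prompts while narrower groups contribute prompts specific to fewer, more informative predicates, the deeper basic prompts can specialize on informative classes, which is consistent with the abstract's stated goal of mitigating bias toward uninformative predicates.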

Original language: English
Pages (from-to): 6813-6831
Number of pages: 19
Journal: International Journal of Computer Vision
Volume: 133
Issue number: 10
State: Published - Oct 2025
Externally published: Yes

Keywords

  • Generic scene graph generation
  • Hierarchical prompt learning
  • Novel predicate generalization

