
Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

  • Hui Lin
  • Zhiheng Ma
  • Xiaopeng Hong*
  • Qinnan Shangguan
  • Deyu Meng
  • *Corresponding author for this work
  • Xi'an Jiaotong University
  • Shenzhen Institute of Advanced Technology
  • Peng Cheng Laboratory
  • Harbin Institute of Technology

Research output: Contribution to journal › Conference article › peer-review

Abstract

Transformers have become popular in recent crowd counting work because they overcome the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in a Transformer tends to find a homogenized solution in which the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gramformer: a graph-modulated transformer that enhances the network by adjusting the attention and the input node features, respectively, on the basis of two different types of graphs. First, an attention graph is proposed to diversify the attention maps so that they attend to complementary information. The graph is built upon the dissimilarities between patches, modulating the attention in an anti-similarity fashion. Second, a feature-based centrality encoding is proposed to discover the centrality positions or importance of nodes. We encode them with a proposed centrality indices scheme to modulate the node features and similarity relationships. Extensive experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method. Code is available at https://github.com/LoraLinH/Gramformer.
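
The two graph ideas in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation (see the repository linked above for that); it is a toy reading of the abstract under stated assumptions. The function names, the use of cosine similarity, the choice of the `k` least-similar patches as attention neighbours, and the use of clipped in-degree in a k-nearest-neighbour similarity graph as the centrality index are all assumptions made here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def anti_similarity_mask(feats, k):
    """Attention graph (assumption): each patch may attend to itself and to
    its k LEAST similar patches, so attention is pushed away from look-alikes."""
    n = len(feats)
    mask = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(feats[i], feats[j]))  # least similar first
        for j in others[:k]:
            mask[i][j] = True
    return mask


def masked_attention_weights(feats, mask):
    """Scaled dot-product attention weights, restricted to edges of the mask.
    Queries and keys are the raw features here; a real model would project them."""
    n, d = len(feats), len(feats[0])
    weights = []
    for i, query in enumerate(feats):
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in feats]
        top = max(scores[j] for j in range(n) if mask[i][j])  # for numerical stability
        exps = [math.exp(scores[j] - top) if mask[i][j] else 0.0 for j in range(n)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def centrality_indices(feats, k, max_index):
    """Centrality encoding (assumption): in-degree of each patch in a graph where
    every patch points to its k MOST similar patches, clipped to max_index.
    The index could then select a learned embedding added to the node feature."""
    n = len(feats)
    in_degree = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(feats[i], feats[j]), reverse=True)
        for j in others[:k]:
            in_degree[j] += 1
    return [min(deg, max_index) for deg in in_degree]


if __name__ == "__main__":
    # Four toy "patch" features; the first two are near-duplicates.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    mask = anti_similarity_mask(feats, k=1)
    for row in masked_attention_weights(feats, mask):
        print([round(w, 3) for w in row])
    print(centrality_indices(feats, k=1, max_index=8))
```

In this toy setup the near-duplicate patches cannot attend to each other, which is one simple way to keep their attention maps from collapsing to the same pattern; the actual graph construction and encoding in Gramformer may differ.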

Original language: English
Pages (from-to): 3395-3403
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue number: 4
State: Published - 25 Mar 2024
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 → 27 Feb 2024
