
SlotFusion: Object-Centric Audiovisual Feature Fusion with Slot Attention for Remote Sensing Scene Recognition

  • Fangzhou Han
  • Tianyi Yu
  • Lamei Zhang*
  • Lingyu Si
  • Yiqi Zhang
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • CAS - Institute of Software

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Despite significant advancements in remote sensing multimodal learning, particularly in image-image feature fusion, the exploration of audio-image feature fusion remains insufficient. Given the complexity and redundancy of ground objects in remote sensing images, accurately aligning audio features with image features during the fusion process is a critical challenge. In this paper, we introduce an object-centric feature fusion method named SlotFusion. By employing a slot attention-based feature decoupling module and a slot-based audiovisual feature fusion module, we transform modality features with complex semantic information into a set of slot features corresponding to object units and use gated activation units to adaptively implement object-centric feature fusion. Experiments on the Audio Visual Aerial Scene Recognition dataset (ADVANCE) demonstrate that the proposed SlotFusion significantly improves remote sensing scene recognition performance, with a 7.04% increase in overall accuracy compared to previous methods, achieving state-of-the-art results.
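The two modules named in the abstract can be illustrated with a rough sketch. The code below is not the authors' implementation: it assumes a heavily simplified slot attention (no learned projections, GRU, or MLP), pairs the k-th image slot with the k-th audio slot, and uses a single hypothetical gate matrix `w_gate`. Feature sizes and slot counts are arbitrary placeholders.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots, iters=3, seed=0):
    """Simplified slot attention (assumption: no learned weights or GRU).

    inputs: (N, d) feature vectors from one modality.
    returns: (num_slots, d) slot features.
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(num_slots, d))
    for _ in range(iters):
        logits = slots @ inputs.T / np.sqrt(d)   # (K, N) similarity scores
        attn = softmax(logits, axis=0)           # slots compete for each input
        # Normalise per slot so each update is a weighted mean of the inputs.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ inputs                 # (K, d)
    return slots

def gated_fusion(img_slots, aud_slots, w_gate):
    """Gated fusion sketch: gate * image + (1 - gate) * audio.

    w_gate is a hypothetical (2d, d) matrix standing in for learned weights.
    """
    concat = np.concatenate([img_slots, aud_slots], axis=-1)  # (K, 2d)
    gate = 1.0 / (1.0 + np.exp(-(concat @ w_gate)))           # sigmoid, (K, d)
    return gate * img_slots + (1.0 - gate) * aud_slots

# Placeholder features: 49 image patches and 20 audio frames, both 16-dim.
rng = np.random.default_rng(1)
image_feats = rng.normal(size=(49, 16))
audio_feats = rng.normal(size=(20, 16))

img_slots = slot_attention(image_feats, num_slots=4)
aud_slots = slot_attention(audio_feats, num_slots=4, seed=2)
w_gate = rng.normal(size=(32, 16)) * 0.1
fused = gated_fusion(img_slots, aud_slots, w_gate)
print(fused.shape)
```

Because the gate lies in (0, 1), each fused value is a convex combination of the corresponding image and audio slot values, which is one common way to let a model weight the modalities adaptively per feature.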

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
State: Published - 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25

Keywords

  • Audiovisual feature fusion
  • Gated feature fusion
  • Object-centric learning
  • Slot attention
