
Asymmetric Frequency-Adaptive State-Space Model for Roadside Cooperative Perception

  • Jiaqian Wang
  • Yiling Wu
  • Mingkai Qiu
  • Xiying Li*
  • Yaowei Wang
  • *Corresponding author for this work
  • Sun Yat-Sen University
  • Peng Cheng Laboratory
  • Guangdong Key Laboratory of Intelligent Transportation Systems
  • Harbin Institute of Technology Shenzhen

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate and efficient roadside cooperative perception is crucial for reducing blind spots and extending sensing ranges. However, it faces two challenges: modeling both long-range and short-range cooperative dependencies, and representing the heterogeneous-density distribution of cross-infrastructure data. While CNNs, Transformers, and State-Space Models have demonstrated strong performance, they inherently struggle to balance the flexibility of long- and short-range receptive fields against computational cost. Additionally, frequency-domain decomposition remains underutilized for heterogeneous-density data representation. In this work, we propose an Asymmetric Multi-Frequency Scale-Adaptive Mamba (AsymMamba) framework, which performs lightweight heterogeneous-density data decomposition to support scalable long- and short-range cooperative representation. First, an Asymmetric Multi-Frequency Decomposition (AsymFreq) module is designed with wavelet transforms; it unifies the spatial distribution representation of heterogeneous-density data in the frequency domain while mitigating information loss through asymmetric scale partitioning. Second, a Scale-Adaptive State-Space Model (AdaSSM) module is designed with a spatial compression and channel expansion mechanism. It captures local short-range semantic information and also models global long-range cooperative dependencies with linear complexity. Experiments on the real-world DAIR-V2X and RCooper datasets demonstrate that AsymMamba outperforms state-of-the-art methods, including the Transformer-based CoBEVT and recent Mamba-based variants. Specifically, it improves 3D object detection AP@0.5 by 3.4%, 4.3%, and 0.6% in vehicle-to-infrastructure cooperation, complex intersection, and long-range corridor roadside cooperative perception scenarios, respectively. AsymMamba also achieves superior real-time efficiency, with inference 4x faster than CoBEVT at a 100 m sensing range and 7x faster in a 200 m long-range scenario. Code will be available upon acceptance.
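
The abstract names two generic building blocks: a wavelet decomposition and a state-space recurrence with linear cost in sequence length. The sketch below illustrates those two ideas only, in plain Python. It is not the authors' AsymFreq or AdaSSM module: the function names `haar_dwt2` and `ssm_scan`, the choice of the Haar wavelet, the single decomposition level, the row-major flattening, and the fixed scalar parameters `a`, `b`, `c` are all assumptions made for illustration (Mamba-style models use input-dependent parameters and vector-valued states).

```python
from typing import List, Tuple

Grid = List[List[float]]


def haar_dwt2(x: Grid) -> Tuple[Grid, Grid, Grid, Grid]:
    """Single-level orthonormal 2D Haar transform (illustrative helper).

    Assumes an even height and width. Returns four half-resolution bands:
    the low-frequency average band, then the horizontal-, vertical-, and
    diagonal-difference detail bands.
    """
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "even dimensions assumed"
    low, horiz, vert, diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_horiz, r_vert, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2.0)    # local average
            r_horiz.append((a - b + c - d) / 2.0)  # left minus right
            r_vert.append((a + b - c - d) / 2.0)   # top minus bottom
            r_diag.append((a - b - c + d) / 2.0)   # diagonal contrast
        low.append(r_low)
        horiz.append(r_horiz)
        vert.append(r_vert)
        diag.append(r_diag)
    return low, horiz, vert, diag


def ssm_scan(u: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state-space recurrence with fixed (hypothetical) parameters.

    h_t = a * h_{t-1} + b * u_t,   y_t = c * h_t
    One pass over the input, so the cost grows linearly with len(u).
    """
    state = 0.0
    y = []
    for u_t in u:
        state = a * state + b * u_t
        y.append(c * state)
    return y


# Toy 4x4 "feature map": one dense block and one sparse block.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 4, 4],
    [0, 0, 4, 4],
]
low, horiz, vert, diag = haar_dwt2(grid)

# Flatten the low-frequency band row by row and run the recurrence over it.
tokens = [v for row in low for v in row]
outputs = ssm_scan(tokens, a=0.5, b=1.0, c=1.0)
print(low)      # [[2.0, 0.0], [0.0, 8.0]]
print(outputs)  # [2.0, 1.0, 0.5, 8.25]
```

In this toy input every 2x2 block is constant, so all three detail bands are zero and the low-frequency band carries the full signal at a quarter of the spatial size. Scanning the shorter sequence is what keeps the recurrence cheap; how the paper partitions scales asymmetrically or compresses space while expanding channels is not specified in the abstract and is not modeled here.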

Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
State: Accepted/In press - 2026
Externally published: Yes

Keywords

  • Asymmetric Frequency Decomposition
  • Cooperative 3D Object Detection
  • Scale-Adaptive Mamba Network
