Abstract
Accurate and efficient roadside cooperative perception is crucial for reducing blind spots and extending sensing ranges. However, it faces challenges in modeling long-short range cooperative dependencies and representing the heterogeneous-density distribution of cross-infrastructure data. While CNNs, Transformers, and State-Space Models have demonstrated strong performance, they inherently struggle to balance the flexibility of long-short range receptive fields against computational cost. Additionally, frequency-domain decomposition remains underutilized for heterogeneous-density data representation. In this work, we propose an Asymmetric Multi-Frequency Scale-Adaptive Mamba (AsymMamba) framework, which performs lightweight heterogeneous-density data decomposition to support scalable long-short range cooperative representation. First, an Asymmetric Multi-Frequency Decomposition (AsymFreq) module is designed with wavelet transforms; it unifies the spatial distribution representation of heterogeneous-density data in the frequency domain while mitigating information loss through asymmetric scale partitioning. Subsequently, AsymMamba introduces a Scale-Adaptive State-Space Model (AdaSSM) module with a spatial compression and channel expansion mechanism. This module captures local short-range semantic information and models global long-range cooperative dependencies with linear complexity. Experiments on the real-world DAIR-V2X and RCooper datasets demonstrate that AsymMamba outperforms state-of-the-art methods, including the Transformer-based CoBEVT and recent Mamba-based variants. Specifically, it achieves 3.4%, 4.3%, and 0.6% 3D object detection improvements at AP@0.5 in vehicle-to-infrastructure cooperation, complex intersection, and long-range corridor roadside cooperative perception scenarios, respectively. AsymMamba also achieves superior real-time efficiency, with 4x faster inference than CoBEVT at a 100 m sensing range and 7x faster in a 200 m long-range scenario. Code will be available upon acceptance.
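The two building blocks named in the abstract, wavelet-based frequency decomposition and a linear-complexity state-space recurrence, can be illustrated with a minimal NumPy sketch. This is not the authors' AsymFreq or AdaSSM implementation: it assumes a plain single-level Haar transform with symmetric 2x2 partitioning and a fixed diagonal state-space model without Mamba's input-dependent parameters. All function names (`haar_dwt2`, `haar_idwt2`, `linear_ssm_scan`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the low-frequency band and three detail bands, each (H/2, W/2).
    Illustrative only: a symmetric split, not the paper's asymmetric one.
    """
    a = x[0::2, 0::2]  # top-left sample of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0    # local average (low frequency)
    d_w = (a - b + c - d) / 2.0    # variation along the width axis
    d_h = (a + b - c - d) / 2.0    # variation along the height axis
    d_hw = (a - b - c + d) / 2.0   # diagonal variation
    return low, (d_w, d_h, d_hw)


def haar_idwt2(low, details):
    """Exact inverse of haar_dwt2."""
    d_w, d_h, d_hw = details
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=low.dtype)
    x[0::2, 0::2] = (low + d_w + d_h + d_hw) / 2.0
    x[0::2, 1::2] = (low - d_w + d_h - d_hw) / 2.0
    x[1::2, 0::2] = (low + d_w - d_h - d_hw) / 2.0
    x[1::2, 1::2] = (low - d_w - d_h + d_hw) / 2.0
    return x


def linear_ssm_scan(u, a, b, c):
    """Diagonal state-space recurrence over a 1D sequence u of length L.

    h_t = a * h_{t-1} + b * u_t,   y_t = c . h_t
    a, b, c are vectors of state size N, so the cost is O(L * N):
    linear in sequence length, unlike pairwise attention.
    """
    h = np.zeros_like(a, dtype=float)
    y = np.empty(len(u), dtype=float)
    for t, u_t in enumerate(u):
        h = a * h + b * u_t
        y[t] = np.dot(c, h)
    return y


# A sparse toy grid standing in for a bird's-eye-view feature map (assumption).
grid = np.zeros((8, 8))
grid[1, 2] = 1.0
grid[6, 5] = 3.0

low, details = haar_dwt2(grid)
restored = haar_idwt2(low, details)

# Flatten the half-resolution band and scan it as a sequence.
y = linear_ssm_scan(low.ravel(), a=np.array([0.9, 0.5]),
                    b=np.array([1.0, 1.0]), c=np.array([0.5, 0.5]))
```

Because the Haar transform is orthonormal, `restored` matches `grid` exactly, so the decomposition itself loses no information. Scanning the half-resolution band rather than the full grid shortens the sequence fourfold, which is one simple way that spatial compression can reduce the cost of a sequence model.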
| Field | Value |
|---|---|
| Original language | English |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| State | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Asymmetric Frequency Decomposition
- Cooperative 3D Object Detection
- Scale-Adaptive Mamba Network
Title
Asymmetric Frequency-Adaptive State-Space Model for Roadside Cooperative Perception