Abstract
Both spatial and tempo-spectral information are essential for multi-channel speech enhancement, a field that has gained significant popularity in recent years. While many studies improve feature extraction through novel network architectures, they often prioritize raw feature learning without addressing how to use the extracted features effectively. In this work, we focus on the features after extraction and introduce a Channel-Time-Frequency Attention (CTFA) module, which assigns weights to the extracted features so that the model can focus on the most informative ones. The CTFA module consists of three parallel attention branches (channel, time, and frequency) that refine both spatial and tempo-spectral features. By assigning greater weight to informative features, it facilitates feature reuse and improves the model's robustness. We incorporate the CTFA module into our previously proposed model and conduct an ablation study to evaluate its effectiveness. Extensive experimental results confirm the efficacy of the CTFA module, with our proposed method outperforming state-of-the-art baselines.
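The idea of three parallel attention branches can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers inside each branch or how the branch outputs are combined. The sketch assumes a feature map of shape (channel, time, frequency), assumes each branch is a global average pool followed by a single linear layer and a sigmoid, and assumes the three weight vectors are fused by elementwise multiplication. The names `branch_weights`, `ctfa_sketch`, and `params` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def branch_weights(features, axis, w, b):
    """Attention weights along one axis of a (C, T, F) feature map.

    Averages over the other two axes, applies a hypothetical linear
    layer (w, b), and squashes the result into (0, 1) with a sigmoid.
    """
    other_axes = tuple(a for a in range(features.ndim) if a != axis)
    descriptor = features.mean(axis=other_axes)  # shape: (size of `axis`,)
    weights = sigmoid(w @ descriptor + b)        # shape: (size of `axis`,)
    shape = [1] * features.ndim
    shape[axis] = -1
    return weights.reshape(shape)                # broadcastable to `features`


def ctfa_sketch(features, params):
    """Rescale features with three parallel branches.

    Assumption: branch outputs are fused by elementwise product; the
    fusion used in the paper is not stated in the abstract.
    """
    channel_w = branch_weights(features, 0, *params["channel"])
    time_w = branch_weights(features, 1, *params["time"])
    freq_w = branch_weights(features, 2, *params["freq"])
    return features * channel_w * time_w * freq_w


# Toy input: 4 channels, 10 time frames, 16 frequency bins (arbitrary sizes).
rng = np.random.default_rng(0)
C, T, F = 4, 10, 16
x = rng.standard_normal((C, T, F))

# Randomly initialised stand-ins for learned parameters (hypothetical).
params = {
    name: (0.1 * rng.standard_normal((n, n)), np.zeros(n))
    for name, n in (("channel", C), ("time", T), ("freq", F))
}

y = ctfa_sketch(x, params)
```

Because every weight lies strictly between 0 and 1, the output has the same shape as the input and each element is attenuated rather than amplified; a trained module would learn which channels, frames, and frequency bins to attenuate least.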
| Original language | English |
|---|---|
| Pages (from-to) | 44418-44427 |
| Number of pages | 10 |
| Journal | IEEE Access |
| Volume | 13 |
| State | Published - 2025 |
| Externally published | Yes |
Keywords
- CTFA
- Multi-channel speech enhancement
- Beamforming
- Deep learning
Fingerprint
Dive into the research topics of 'Channel-Time-Frequency Attention Module for Improved Multi-Channel Speech Enhancement'. Together they form a unique fingerprint.