TY - GEN
T1 - Subband Dependency Modeling for Sound Event Detection
AU - Guan, Yadong
AU - Zheng, Guibin
AU - Han, Jiqing
AU - Wang, Huanliang
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In the domain of sound event detection (SED), the Convolutional Recurrent Neural Network (CRNN) has become the most successful architecture; it uses a Recurrent Neural Network (RNN) to model temporal dependencies in the output of a Convolutional Neural Network (CNN). However, CRNN does not fully use the subband dependencies that have been shown to be critical for human perception of sound events. In this paper, we propose a subband dependency model (SDM) to enhance the capability of CRNN in modeling subband dependencies from the input spectrogram. To select prominent subband dependencies, we propose a novel SoftSparsemax transformation. It selects the salient dependencies by comparing all of them and further strengthens them by projecting them onto a probability simplex. Furthermore, since the subband dependencies of different sound events may be prominent at different timescales, multi-timescale subband dependencies are considered. Experimental results demonstrate the effectiveness of our method.
KW - Sound event detection
KW - self-attention
KW - subband dependency
UR - https://www.scopus.com/pages/publications/85177593356
U2 - 10.1109/ICASSP49357.2023.10094694
DO - 10.1109/ICASSP49357.2023.10094694
M3 - Conference contribution
AN - SCOPUS:85177593356
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -