Skip to main navigation Skip to search Skip to main content

Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Sound classification models' performance suffers from generalizing on out-of-distribution (OOD) data. Numerous methods have been proposed to help the model generalize. However, most either introduce inference overheads or focus on long-lasting CNN-variants, while Transformers has been proven to outperform CNNs on numerous natural language processing and computer vision tasks. We propose FRITO, an effective regularization technique on Transformer's self-attention, to improve the model's generalization ability by limiting each sequence position's attention receptive field along the frequency dimension on the spectrogram. Experiments show that our method helps Transformer models achieve SOTA generalization performance on TAU 2020 and Nsynth datasets while saving 20% inference time.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PublisherIEEE Computer Society
Pages1104-1108
Number of pages5
ISBN (Electronic)9781665468916
DOIs
StatePublished - 2023
Event2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 202314 Jul 2023

Publication series

NameProceedings - IEEE International Conference on Multimedia and Expo
Volume2023-July
ISSN (Print)1945-7871
ISSN (Electronic)1945-788X

Conference

Conference2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/TerritoryAustralia
CityBrisbane
Period10/07/2314/07/23

Keywords

  • Sound classification
  • acoustic scene classification
  • attention
  • domain generalization
  • transformer

Fingerprint

Dive into the research topics of 'Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer'. Together they form a unique fingerprint.

Cite this