TY - GEN
T1 - FIRST-SHOT UNSUPERVISED ANOMALOUS SOUND DETECTION WITH UNKNOWN ANOMALIES ESTIMATED BY METADATA-ASSISTED AUDIO GENERATION
AU - Zhang, Hejing
AU - Zhu, Qiaoxi
AU - Guan, Jian
AU - Liu, Haohe
AU - Xiao, Feiyang
AU - Tian, Jiantong
AU - Mei, Xinhao
AU - Liu, Xubo
AU - Wang, Wenwu
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability of normal and abnormal sound data from the target machines. However, due to the lack of anomalous sound data for the target machine types, it becomes challenging when adapting the existing ASD methods to the first-shot task. In this paper, we propose a new framework for the first-shot unsupervised ASD, where metadata-assisted audio generation is used to estimate unknown anomalies, by utilising the available machine information (i.e., metadata and sound data) to fine-tune a text-to-audio generation model for generating the anomalous sounds that contain unique acoustic characteristics accounting for each different machine type. We then use the method of Time-Weighted Frequency domain audio Representation with Gaussian Mixture Model (TWFR-GMM) as the backbone to achieve the first-shot unsupervised ASD. Our proposed FS-TWFR-GMM method achieves competitive performance amongst top systems in DCASE 2023 Challenge Task 2, while requiring only 1% model parameters for detection, as validated in our experiments.
AB - First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability of normal and abnormal sound data from the target machines. However, due to the lack of anomalous sound data for the target machine types, it becomes challenging when adapting the existing ASD methods to the first-shot task. In this paper, we propose a new framework for the first-shot unsupervised ASD, where metadata-assisted audio generation is used to estimate unknown anomalies, by utilising the available machine information (i.e., metadata and sound data) to fine-tune a text-to-audio generation model for generating the anomalous sounds that contain unique acoustic characteristics accounting for each different machine type. We then use the method of Time-Weighted Frequency domain audio Representation with Gaussian Mixture Model (TWFR-GMM) as the backbone to achieve the first-shot unsupervised ASD. Our proposed FS-TWFR-GMM method achieves competitive performance amongst top systems in DCASE 2023 Challenge Task 2, while requiring only 1% model parameters for detection, as validated in our experiments.
KW - Unsupervised learning
KW - anomalous sound detection
KW - audio generation
KW - latent diffusion model
KW - metadata
UR - https://www.scopus.com/pages/publications/85195395041
U2 - 10.1109/ICASSP48485.2024.10448451
DO - 10.1109/ICASSP48485.2024.10448451
M3 - 会议稿件
AN - SCOPUS:85195395041
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1271
EP - 1275
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -