Abstract
Deep neural networks (DNNs), though widely applied in speaker recognition systems (SRS), are vulnerable to adversarial attacks that are hard for humans to detect. The vulnerability of black-box models to such attacks is crucial to the robustness of SRS, especially for recent models such as x-vector and ECAPA. State-of-the-art transferable adversarial attack methods first generate adversarial audio on a white-box SRS and then use this audio to attack a black-box SRS. However, these methods often achieve lower success rates on SRS than in the image processing domain. To improve attack performance on SRS, we propose an efficient Nesterov accelerate and RMSProp optimization based Iterative-Fast Gradient Sign Method (NRI-FGSM), which integrates the Nesterov Accelerated Gradient method and the Root Mean Squared Propagation optimization method with an adaptive step size. In extensive experiments on both close-set speaker recognition (CSR) and open-set speaker recognition (OSR) tasks, our method achieves higher attack success rates than other methods, 97.8% for CSR and 61.9% for OSR, while maintaining a lower perturbation level as measured by signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ). Notably, our work is the first to successfully attack the ECAPA SRS model.
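The abstract names the ingredients of the attack but not its exact update rule, so the following is only a minimal sketch of how a Nesterov lookahead and an RMSProp-style running average might be combined with an iterative sign-gradient step. Everything in it is an assumption made for illustration: the function names (`nag_rmsprop_ifgsm`, `make_toy_model`, `snr_db`), the hyperparameters (`eps`, `steps`, `mu`, `beta`), the order of the momentum and scaling operations, and the toy linear softmax "speaker classifier" that stands in for a real white-box SRS. It is not the authors' implementation.

```python
import numpy as np

def snr_db(clean, adversarial):
    """SNR of the perturbation in dB (higher means a quieter perturbation)."""
    noise = adversarial - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def make_toy_model(n_speakers, n_samples, seed=0):
    """Hypothetical stand-in for a white-box SRS: a linear softmax classifier."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_speakers, n_samples)) / np.sqrt(n_samples)

    def loss_fn(x, y):
        # Cross-entropy loss of waveform x for speaker label y.
        return -np.log(softmax(W @ x)[y])

    def grad_fn(x, y):
        # Analytic gradient of the cross-entropy loss with respect to x.
        p = softmax(W @ x)
        p[y] -= 1.0
        return W.T @ p

    return loss_fn, grad_fn

def nag_rmsprop_ifgsm(x, grad_fn, eps=0.002, steps=20, mu=0.9, beta=0.99, delta=1e-8):
    """Assumed combination of NAG, RMSProp scaling, and iterative FGSM."""
    alpha = eps / steps          # per-iteration step size
    x_adv = x.copy()
    g = np.zeros_like(x)         # accumulated (momentum) gradient
    s = np.zeros_like(x)         # running average of squared gradients
    for _ in range(steps):
        x_look = x_adv + alpha * mu * g              # Nesterov lookahead point
        grad = grad_fn(x_look)                       # gradient at the lookahead
        s = beta * s + (1.0 - beta) * grad ** 2      # RMSProp accumulator
        g = mu * g + grad / (np.sqrt(s) + delta)     # momentum on scaled gradient
        x_adv = x_adv + alpha * np.sign(g)           # sign step that raises the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)     # stay inside the L-inf ball
        x_adv = np.clip(x_adv, -1.0, 1.0)            # stay a valid waveform
    return x_adv

# Toy usage: a short sine "utterance" and an arbitrary speaker label.
loss_fn, grad_fn = make_toy_model(n_speakers=10, n_samples=1600)
t = np.arange(1600) / 1600
x = 0.1 * np.sin(2 * np.pi * 5 * t)
y = 3
x_adv = nag_rmsprop_ifgsm(x, lambda v: grad_fn(v, y))
print("loss before:", loss_fn(x, y), "loss after:", loss_fn(x_adv, y))
print("SNR (dB):", snr_db(x, x_adv))
```

In a transfer setting, `x_adv` produced on the white-box model would then be fed to a separate black-box model. PESQ is omitted here because it requires a dedicated implementation of the ITU-T P.862 algorithm.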
| Original language | English |
|---|---|
| Pages (from-to) | 4386-4390 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2022-September |
| State | Published - 2022 |
| Externally published | Yes |
| Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Republic of Korea. Duration: 18 Sep 2022 → 22 Sep 2022 |
Keywords
- nesterov accelerated gradient
- root mean squared propagation
- speaker recognition
- transferable attack