NRI-FGSM: An Efficient Transferable Adversarial Attack Method for Speaker Recognition System

  • Hao Tan
  • Junjian Zhang
  • Huan Zhang
  • Le Wang*
  • Yaguan Qian
  • Zhaoquan Gu
  • *Corresponding author for this work
  • Guangzhou University
  • Peng Cheng Laboratory
  • Zhejiang University of Science and Technology

Research output: Contribution to journal › Conference article › peer-review

Abstract

Deep neural networks (DNNs), though widely used in speaker recognition systems (SRS), are vulnerable to adversarial attacks that are hard for humans to detect. How vulnerable black-box models are to such attacks is crucial to the robustness of SRS, especially for recent models such as x-vector and ECAPA. State-of-the-art transferable adversarial attack methods first generate adversarial audio on a white-box SRS and then use that audio to attack a black-box SRS. However, these methods often achieve lower success rates on SRS than in the image processing domain. To improve attack performance on SRS, we propose an efficient Nesterov-accelerated, RMSProp-optimized Iterative Fast Gradient Sign Method (NRI-FGSM), which integrates the Nesterov Accelerated Gradient method and the Root Mean Squared Propagation optimization method with an adaptive step size. In extensive experiments on both closed-set speaker recognition (CSR) and open-set speaker recognition (OSR) tasks, our method achieves higher attack success rates than other methods, 97.8% for CSR and 61.9% for OSR, while keeping the perturbation lower as measured by signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ). Notably, this work is the first to attack the ECAPA SRS model successfully.
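The abstract names the ingredients of the attack but not its exact update rule, so the following is only a rough sketch of how a Nesterov look-ahead, an RMSProp-style running average of squared gradients, and an iterative sign step might be combined. Everything specific here is an assumption for illustration: the toy linear softmax "speaker classifier" `W`, the function names `loss_and_grad`, `nri_fgsm_sketch`, and `snr_db`, the hyperparameters `eps`, `steps`, `mu`, and `beta`, and the order of the updates. It is not the authors' implementation. The SNR helper uses the standard definition, 10·log10(signal power / perturbation power).

```python
import numpy as np


def loss_and_grad(x, W, y):
    """Cross-entropy of a toy linear softmax classifier and its gradient w.r.t. x."""
    logits = W @ x
    logits = logits - logits.max()              # stabilise the softmax
    p = np.exp(logits) / np.exp(logits).sum()
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return -np.log(p[y] + 1e-12), W.T @ (p - onehot)


def nri_fgsm_sketch(x, W, y, eps=0.05, steps=20, mu=0.9, beta=0.9, delta=1e-8):
    """Hypothetical untargeted attack: Nesterov look-ahead + RMSProp scaling + sign step."""
    alpha = eps / steps                         # per-iteration step size (assumed)
    x_adv = x.copy()
    m = np.zeros_like(x)                        # momentum accumulator
    v = np.zeros_like(x)                        # running mean of squared gradients
    for _ in range(steps):
        x_nes = x_adv + alpha * mu * m          # Nesterov look-ahead point
        _, g = loss_and_grad(x_nes, W, y)
        g = g / (np.abs(g).sum() + 1e-12)       # L1-normalise the gradient
        v = beta * v + (1.0 - beta) * g ** 2    # RMSProp second-moment estimate
        m = mu * m + g / (np.sqrt(v) + delta)   # adaptively scaled momentum
        x_adv = x_adv + alpha * np.sign(m)      # sign step that increases the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay inside the L-infinity ball
        x_adv = np.clip(x_adv, -1.0, 1.0)       # keep a valid waveform range
    return x_adv


def snr_db(x, x_adv):
    """Signal-to-noise ratio in dB between the clean signal and the perturbation."""
    noise_power = np.sum((x_adv - x) ** 2)
    if noise_power == 0.0:
        return float("inf")
    return 10.0 * np.log10(np.sum(x ** 2) / noise_power)


# Toy usage: random weights stand in for a white-box model, random samples for audio.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))
x = rng.uniform(-0.5, 0.5, size=64)
y = int(np.argmax(W @ x))                       # treat the clean prediction as the label
x_adv = nri_fgsm_sketch(x, W, y)
print("loss before:", loss_and_grad(x, W, y)[0])
print("loss after: ", loss_and_grad(x_adv, W, y)[0])
print("SNR (dB):   ", snr_db(x, x_adv))
```

In a transfer setting, `x_adv` would be crafted on the white-box model and then submitted to a separate black-box model. The sketch covers only the crafting step, and PESQ is omitted because it requires a dedicated implementation.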

Original language: English
Pages (from-to): 4386-4390
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
State: Published - 2022
Externally published: Yes
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sep 2022 to 22 Sep 2022

Keywords

  • Nesterov accelerated gradient
  • root mean squared propagation
  • speaker recognition
  • transferable attack
