Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal, Conference article, peer-review

Abstract

Speaker adaptation has been widely studied as a way to address the mismatch between training and test conditions in end-to-end automatic speech recognition (ASR). A key challenge of speaker adaptation is the lack of sufficient annotated target-speaker data. Because the training set is usually large-scale and covers many speakers, some of its utterances are likely to have voice characteristics similar to those of the target speaker, and these similar utterances can naturally serve as a supplement to the target-speaker data during adaptation. We therefore propose a personality-aware training (PAT) framework to adapt a pre-trained ASR model to the target speaker. In PAT, the small amount of target-speaker data serves as anchors, and the loss of each training sample is re-weighted according to the voice-characteristic similarity between the anchors and that sample, where the similarity is derived from a speaker or prosody embedding extractor. Experiments on the KeSpeech and MagicData corpora show that, compared with the unadapted system, the proposed method achieves relative character error rate (CER) reductions of 6.35% and 11.86% with only 10 minutes of pseudo-label and true-label adaptation data, respectively.
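The loss re-weighting step described in the abstract can be illustrated with a small sketch. The abstract does not say how similarity scores are turned into weights, so every choice below is an assumption rather than the authors' method: cosine similarity between embeddings, averaging over all anchors, and a temperature-scaled softmax rescaled so the weights average to 1. The names `similarity_weights`, `reweighted_loss`, and `temperature` are hypothetical, and the vectors stand in for outputs of a speaker or prosody embedding extractor.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two (assumed non-zero) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def similarity_weights(
    anchors: List[Sequence[float]],
    train: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not from the paper
) -> List[float]:
    """Hypothetical PAT-style weights, one per training utterance.

    Assumptions (not from the paper): similarity to the target speaker is
    the mean cosine similarity to all anchor embeddings, and the weights are
    a temperature-scaled softmax rescaled so that they average to 1.
    """
    sims = [sum(cosine(t, a) for a in anchors) / len(anchors) for t in train]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(sims)
    exps = [math.exp((s - m) / temperature) for s in sims]
    z = sum(exps)
    n = len(train)
    # Rescale so the weights sum to n (mean weight of 1).
    return [n * e / z for e in exps]


def reweighted_loss(losses: List[float], weights: List[float]) -> float:
    """Weighted mean of per-utterance ASR losses (e.g. CTC or attention loss)."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.9, 0.1]]  # toy target-speaker embeddings
    train = [[1.0, 0.05], [0.0, 1.0]]   # toy training-utterance embeddings
    w = similarity_weights(anchors, train)
    # The utterance closer to the anchors receives the larger weight.
    print(w)
    print(reweighted_loss([2.0, 2.0], w))
```

With equal per-utterance losses the weights only redistribute emphasis, so the weighted mean equals the unweighted one; rescaling the weights to mean 1 is one simple way to keep the loss scale comparable to ordinary training.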

Original language: English
Pages (from-to): 1249-1253
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
State: Published - 2023
Externally published: Yes
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023

Keywords

  • Personality-aware
  • personalization
  • speaker adaptation
  • speech recognition
