
A KL divergence and DNN approach to cross-lingual TTS

  • Harbin Institute of Technology
  • Microsoft USA

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We propose a Kullback-Leibler divergence (KLD) and deep neural net (DNN) based approach to cross-lingual TTS (CL-TTS) training. A speaker-independent DNN (SI-DNN) ASR is used to equalize the speaker difference between a source speaker in L1 and a reference speaker in L2. Two speaker-dependent GMM-HMM parametric TTS systems are first trained in the respective languages. The senone sets of the two TTS systems are matched in the SI-DNN ASR in terms of the KLD between their output posterior distributions. The minimum KLD criterion is used to transform the senones in the source speaker's TTS (L1) to the corresponding "closest" senones in the target language (L2). The new CL-TTS thus trained has been shown to achieve high speaker similarity to the source speaker in L1 while high intelligibility and naturalness are preserved. For a source speaker's untranscribed recordings, say, conversational speech, a frame mapping, instead of "senone mapping", is also proposed to achieve high, though slightly inferior, CL-TTS performance.
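
The minimum-KLD matching step described in the abstract can be illustrated with a minimal sketch. It assumes each senone is summarized by a single discrete posterior distribution over the SI-DNN output classes, and it uses the plain (asymmetric) KLD from source to target; the abstract does not say whether the paper uses a symmetric variant. The names `kld`, `map_senones`, and the toy senone labels are hypothetical and are not taken from the paper.

```python
import math


def kld(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same classes.

    eps is an assumed smoothing constant that guards against log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def map_senones(source, target):
    """Map each source senone to the target senone with minimum KLD.

    source, target: dicts of senone id -> posterior distribution (list of floats).
    """
    return {
        s_id: min(target, key=lambda t_id: kld(p, target[t_id]))
        for s_id, p in source.items()
    }


# Toy posteriors over three hypothetical SI-DNN output classes.
source = {"s_a": [0.7, 0.2, 0.1], "s_b": [0.1, 0.1, 0.8]}
target = {"t_x": [0.6, 0.3, 0.1], "t_y": [0.2, 0.1, 0.7]}

mapping = map_senones(source, target)
print(mapping)
```

In this toy case each source senone is paired with the target senone whose posterior has the most similar shape. A real system would estimate the posteriors from SI-DNN outputs rather than hand-written lists.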

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5515-5519
Number of pages: 5
ISBN (Electronic): 9781479999880
State: Published - 18 May 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 to 25/03/16

Keywords

  • Kullback-Leibler divergence
  • cross-lingual
  • deep neural networks
  • speech synthesis
