TY - GEN
T1 - The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023
AU - Wang, Minghan
AU - Li, Yinglu
AU - Guo, Jiaxin
AU - Li, Zongyao
AU - Shang, Hengchao
AU - Wei, Daimeng
AU - Su, Chang
AU - Zhang, Min
AU - Tao, Shimin
AU - Yang, Hao
N1 - Publisher Copyright:
© IWSLT 2023. All rights reserved.
PY - 2023
Y1 - 2023
N2 - This paper describes our work on the IWSLT 2023 Speech-to-Speech task. Our proposed cascaded system consists of an ensemble of Conformer- and S2T-Transformer-based ASR models, a Transformer-based MT model, and a diffusion-based TTS model. Our primary focus in this competition was to investigate the modeling ability of the diffusion model for TTS in high-resource scenarios and the role of TTS in the overall S2S task. To this end, we proposed DTS, an end-to-end diffusion-based TTS model that takes raw text as input and generates a waveform by iteratively denoising pure Gaussian noise. Compared to previous TTS models, the speech generated by DTS is more natural and performs better in code-switching scenarios. As the training process is end-to-end, it is relatively straightforward. Our experiments demonstrate that DTS outperforms other TTS models on the GigaS2S benchmark and brings a positive gain to the entire S2S system.
AB - This paper describes our work on the IWSLT 2023 Speech-to-Speech task. Our proposed cascaded system consists of an ensemble of Conformer- and S2T-Transformer-based ASR models, a Transformer-based MT model, and a diffusion-based TTS model. Our primary focus in this competition was to investigate the modeling ability of the diffusion model for TTS in high-resource scenarios and the role of TTS in the overall S2S task. To this end, we proposed DTS, an end-to-end diffusion-based TTS model that takes raw text as input and generates a waveform by iteratively denoising pure Gaussian noise. Compared to previous TTS models, the speech generated by DTS is more natural and performs better in code-switching scenarios. As the training process is end-to-end, it is relatively straightforward. Our experiments demonstrate that DTS outperforms other TTS models on the GigaS2S benchmark and brings a positive gain to the entire S2S system.
UR - https://www.scopus.com/pages/publications/85174970557
M3 - Conference contribution
AN - SCOPUS:85174970557
T3 - 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference
SP - 277
EP - 282
BT - 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference
A2 - Salesky, Elizabeth
A2 - Federico, Marcello
A2 - Carpuat, Marine
PB - Association for Computational Linguistics
T2 - 20th International Conference on Spoken Language Translation, IWSLT 2023
Y2 - 13 July 2023 through 14 July 2023
ER -