TY - GEN
T1 - CTC-based Non-autoregressive Textless Speech-to-Speech Translation
AU - Fang, Qingkai
AU - Ma, Zhengrui
AU - Zhou, Yan
AU - Zhang, Min
AU - Feng, Yang
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investigate the performance of CTC-based NAR models in S2ST, as these models have shown impressive results in machine translation. Experimental results demonstrate that by combining pretraining, knowledge distillation, and advanced NAR training techniques such as glancing training and non-monotonic latent alignments, CTC-based NAR models achieve translation quality comparable to the AR model, while preserving up to 26.81× decoding speedup.
AB - Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investigate the performance of CTC-based NAR models in S2ST, as these models have shown impressive results in machine translation. Experimental results demonstrate that by combining pretraining, knowledge distillation, and advanced NAR training techniques such as glancing training and non-monotonic latent alignments, CTC-based NAR models achieve translation quality comparable to the AR model, while preserving up to 26.81× decoding speedup.
UR - https://www.scopus.com/pages/publications/85204474457
U2 - 10.18653/v1/2024.findings-acl.543
DO - 10.18653/v1/2024.findings-acl.543
M3 - 会议稿件
AN - SCOPUS:85204474457
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 9155
EP - 9161
BT - The 62nd Annual Meeting of the Association for Computational Linguistics
A2 - Ku, Lun-Wei
A2 - Martins, Andre
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Y2 - 11 August 2024 through 16 August 2024
ER -