Skip to main navigation Skip to search Skip to main content

CTC-based Non-autoregressive Textless Speech-to-Speech Translation

  • Qingkai Fang
  • , Zhengrui Ma
  • , Yan Zhou
  • , Min Zhang
  • , Yang Feng*
  • *Corresponding author for this work
  • CAS - Institute of Computing Technology
  • University of Chinese Academy of Sciences
  • Soochow University
  • Chinese Academy of Sciences

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investigate the performance of CTC-based NAR models in S2ST, as these models have shown impressive results in machine translation. Experimental results demonstrate that by combining pretraining, knowledge distillation, and advanced NAR training techniques such as glancing training and non-monotonic latent alignments, CTC-based NAR models achieve translation quality comparable to the AR model, while preserving up to 26.81× decoding speedup.

Original languageEnglish
Title of host publicationThe 62nd Annual Meeting of the Association for Computational Linguistics
Subtitle of host publicationFindings of the Association for Computational Linguistics, ACL 2024
EditorsLun-Wei Ku, Andre Martins, Vivek Srikumar
PublisherAssociation for Computational Linguistics (ACL)
Pages9155-9161
Number of pages7
ISBN (Electronic)9798891760998
DOIs
StatePublished - 2024
Externally publishedYes
EventFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 11 Aug 202416 Aug 2024

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

ConferenceFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/TerritoryThailand
CityHybrid, Bangkok
Period11/08/2416/08/24

Fingerprint

Dive into the research topics of 'CTC-based Non-autoregressive Textless Speech-to-Speech Translation'. Together they form a unique fingerprint.

Cite this