Abstract
Although the advantages of large models have been widely demonstrated in speech recognition, small models are still required in several applications because of limited computational resources or training data, and their recognition accuracy remains a challenge. This paper proposes a knowledge distillation method for the pruned RNN-T structure that improves the generalization ability of small models by leveraging information from large models: the small model shares the large model's pruning bounds as well as its decoder and connector structures, and multiple losses are fused for distillation. On the Chinese speech dataset Aishell-1, the small model distilled from a pretrained large model outperforms a directly trained model of the same size, with a relative reduction of 30.4% in Character Error Rate (CER), validating the effectiveness of the proposed knowledge distillation method.
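The abstract does not say which losses are fused or how they are weighted. As a rough illustration of the general idea only, the sketch below combines a hard-label cross-entropy term with a temperature-softened KL term between teacher and student outputs for a single output distribution. The function names, the weight `alpha`, the `temperature`, and the T² scaling are all assumptions borrowed from conventional knowledge distillation, not the paper's formulation, and a real pruned RNN-T loss is computed over a lattice rather than a single distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-probability of the reference label (hard-label loss)."""
    return -math.log(probs[target_index] + eps)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def fused_distillation_loss(student_logits, teacher_logits, target_index,
                            alpha=0.5, temperature=2.0):
    """Hypothetical fusion: weighted sum of a hard-label and a soft-label term.

    alpha and temperature are illustrative hyperparameters (assumptions).
    The temperature**2 factor is the usual convention for keeping the soft
    term on a comparable scale; the paper may use something different.
    """
    hard = cross_entropy(softmax(student_logits), target_index)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft
```

When the student's logits equal the teacher's, the soft term vanishes and only the weighted hard-label term remains; any mismatch with the teacher adds a positive penalty.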
| Original language | English |
|---|---|
| Pages (from-to) | 3608-3612 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| DOIs | |
| State | Published - 2025 |
| Externally published | Yes |
| Event | 26th Interspeech Conference 2025 - Rotterdam, Netherlands; Duration: 17 Aug 2025 → 21 Aug 2025 |
Keywords
- ASR
- Pruned RNN-T
- knowledge distillation
Fingerprint
Dive into the research topics of 'Knowledge Distillation Method for Pruned RNN-T Models via Pruning Bounds Sharing and Losses Confusion'. Together they form a unique fingerprint.