Abstract
This paper proposes a new data selection algorithm, Confirmation Based Self-Learning (CBSL), for semi-supervised incremental learning of large vocabulary continuous speech recognition (LVCSR) systems. CBSL first selects a sentence-level training corpus by computing a confidence measure, and then applies a confirmation criterion to select a word-level corpus, avoiding any further confidence measure calculation. The results show that the proposed algorithm improves acoustic model training, raising the recognition correctness rate by up to 4.42% compared with traditional single-level and double-level confidence measure based data selection algorithms. In addition, in view of how high and low confidence measure data are distributed, both kinds of data are used for system training; adding the low confidence measure data yields a 1.41% increase in correctness rate.
| Original language | English |
|---|---|
| Pages (from-to) | 754-759 |
| Number of pages | 6 |
| Journal | Procedia Engineering |
| Volume | 29 |
| State | Published - 2012 |
| Externally published | Yes |
| Event | 2012 International Workshop on Information and Electronics Engineering, IWIEE 2012 - Harbin, China. Duration: 10 Mar 2012 → 11 Mar 2012 |
Keywords
- Confirmation Based Self-Learning
- Data selection
- High and low confidence measure
- Semi-supervised incremental learning
Fingerprint
Dive into the research topics of 'Confirmation Based Self-Learning algorithm in LVCSR's semi-supervised incremental learning'. Together they form a unique fingerprint.