TY - GEN
T1 - Large Vocabulary Continuous Speech Recognition with Deep Recurrent Network
AU - Wu, Pengfei
AU - Wang, Mingjiang
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10/23
Y1 - 2020/10/23
AB - Speech recognition aims to make a machine understand what people say, that is, to recognize speech content accurately in various environments so that the machine can act on the speaker's intent. In this paper, a feature extraction algorithm for a speech data set is designed. The data set used is thchs30, which contains 13388 voice files. The fbank features of the speech are fed into a recurrent neural network for training. Training is end-to-end, and the decoding result is the corresponding syllable in the dictionary. When the initials and finals of syllables are used as training labels, the accuracy is about 70%. After changing the mapping between speech sequences and Pinyin labels, about 1209 Pinyin syllables are sorted out and the speech features are trained with Pinyin labels; the accuracy is then about 80%.
KW - connectionist temporal classifier
KW - deep neural network
KW - recurrent neural network
KW - speech recognition
UR - https://www.scopus.com/pages/publications/85101093005
U2 - 10.1109/ICSIP49896.2020.9339455
DO - 10.1109/ICSIP49896.2020.9339455
M3 - Conference contribution
AN - SCOPUS:85101093005
T3 - 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
SP - 794
EP - 798
BT - 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020
Y2 - 23 October 2020 through 25 October 2020
ER -