
Convolutional Grid Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Grid Long Short-Term Memory (Grid-LSTM), which consists of three steps (two-dimensional grid splitting, local feature projection, and grid sequence modeling), has been widely used in Automatic Speech Recognition (ASR) tasks because of its strong time-frequency modeling ability. However, the network requires heavy computing time. An analysis of its process shows that the cause lies in the last step, where two cross-working LSTMs are employed to model the time-frequency features in the grid. We therefore speed up the Grid-LSTM by using a smaller grid and propose two enhanced Grid-LSTM models, Convolutional Grid-LSTM (ConvGrid-LSTM) and Multichannel ConvGrid-LSTM (MCConvGrid-LSTM), which reduce the grid size along the two dimensions of the Grid-LSTM, respectively. Along the frequency axis, we use a large frequency stride and prevent the resulting performance loss by embedding a CNN in the Grid-LSTM. Along the time axis, we model several adjacent frames together using the multichannel processing ability of the CNN. Our method achieves (formula presented) relative reduction of training time and (formula presented) relative reduction of Word Error Rate (WER) for a character level End-to-End ASR task.
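The link between grid size and computing cost can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: it assumes the frequency axis is split by a sliding window (so the block count follows the usual strided-window formula) and that grouping adjacent frames divides the number of time steps. The function names and all numeric values (80 frequency bins, block size 8, strides 2 and 8, 4 frames per step) are hypothetical and are not taken from the paper.

```python
def num_freq_blocks(num_bins: int, block_size: int, stride: int) -> int:
    """Count the frequency blocks produced by sliding a window along the frequency axis."""
    if num_bins < block_size:
        return 0
    return (num_bins - block_size) // stride + 1


def grid_cells(num_frames: int, num_bins: int, block_size: int,
               freq_stride: int, frames_per_step: int = 1) -> int:
    """Count the grid cells the sequence-modeling step visits: time steps x frequency blocks."""
    # Ceiling division: a final, partially filled group of frames still costs one time step.
    time_steps = -(-num_frames // frames_per_step)
    return time_steps * num_freq_blocks(num_bins, block_size, freq_stride)


# Hypothetical settings, chosen only to illustrate the effect of each change.
baseline = grid_cells(num_frames=1000, num_bins=80, block_size=8, freq_stride=2)
large_stride = grid_cells(num_frames=1000, num_bins=80, block_size=8, freq_stride=8)
stacked = grid_cells(num_frames=1000, num_bins=80, block_size=8,
                     freq_stride=8, frames_per_step=4)

print(baseline, large_stride, stacked)
```

Under these assumed settings, a larger frequency stride shrinks the grid along one axis and grouping adjacent frames shrinks it along the other, which is the intuition behind the two proposed models.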

Original language: English
Title of host publication: Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
Editors: Tom Gedeon, Kok Wai Wong, Minho Lee
Publisher: Springer
Pages: 718-726
Number of pages: 9
ISBN (Print): 9783030368012
State: Published - 2019
Externally published: Yes
Event: 26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia
Duration: 12 Dec 2019 to 15 Dec 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1143 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 26th International Conference on Neural Information Processing, ICONIP 2019
Country/Territory: Australia
City: Sydney
Period: 12/12/19 to 15/12/19

Keywords

  • Automatic Speech Recognition
  • Convolutional Neural Network
  • Grid-LSTM
