Exploring attention mechanisms based on summary information for end-to-end automatic speech recognition

  • Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Recent studies have confirmed that attention mechanisms with a location constraint strategy help reduce the misrecognition caused by incorrect alignments in attention-based end-to-end automatic speech recognition (E2E ASR) systems. The main advantage of these mechanisms is that they account for the monotonicity of the alignment by employing a location constraint vector. In most such attention mechanisms, this vector is obtained directly from historical attention scores. However, when a historical attention score is inaccurate, the resulting vector can become an additional source of interference, and that interference continues to affect subsequent attention scoring. To address this problem, we obtain a reasonable location constraint vector from the matching relationship between the historical output information and the summary information, where the summary information includes content and temporal information about the speech sequence. We further propose an enhanced location constrained attention mechanism, the summary constrained (SC) attention mechanism, which generates the vector with a matching relationship-based neural network. We use a summary subspace embedding learned by a linear subspace projection to represent the summary information. Furthermore, considering the complementarity of the SC and typical location constrained attention mechanisms, a fused attention mechanism is used to generate a more reasonable vector by combining the two mechanisms. E2E ASR systems based on the SC and fused attention mechanisms were evaluated on the Switchboard conversational telephone speech recognition task. The experimental results show that our mechanisms achieved relative reductions of 10.6% and 16.7% in word error rate compared with the baseline system.
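To make the idea of a location constraint vector concrete, the following is a minimal sketch of a typical location constrained attention step, in which the vector is derived from the previous step's attention weights. It is not the SC or fused mechanism proposed in the paper. The function names, the scalar content scores, the fixed convolution kernel, and the `location_weight` parameter are all illustrative assumptions; real systems learn these quantities and usually combine them inside a nonlinearity.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def location_features(prev_attention, kernel):
    """Zero-padded 1-D filter over the previous attention weights.

    The output plays the role of the location constraint vector:
    one value per encoder frame, derived only from historical
    attention scores. The kernel here is fixed (an assumption);
    in practice it would be learned.
    """
    half = len(kernel) // 2
    n = len(prev_attention)
    features = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t + k - half
            if 0 <= j < n:
                acc += w * prev_attention[j]
        features.append(acc)
    return features


def location_constrained_attention(content_scores, prev_attention,
                                   kernel, location_weight=1.0):
    """Combine content-based scores with the location constraint vector.

    Simplification (assumption): the two terms are added as scalars
    per frame before the softmax.
    """
    loc = location_features(prev_attention, kernel)
    energies = [c + location_weight * l
                for c, l in zip(content_scores, loc)]
    return softmax(energies)


# Hypothetical example: flat content scores, previous attention on frame 2.
# The kernel [1, 0, 0] copies each frame's previous weight to the next frame,
# which favours a monotonic, forward-moving alignment.
content = [0.0] * 5
prev = [0.0, 0.0, 1.0, 0.0, 0.0]
weights = location_constrained_attention(content, prev, [1.0, 0.0, 0.0],
                                         location_weight=5.0)
print(weights)
```

Because the location term depends only on `prev_attention`, a wrong previous alignment is carried directly into the next step's energies. This is the interference the abstract describes, and it is what deriving the vector from summary information is intended to mitigate.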

Original language: English
Pages (from-to): 514-524
Number of pages: 11
Journal: Neurocomputing
Volume: 465
State: Published - 20 Nov 2021

Keywords

  • End-to-end automatic speech recognition
  • Location constrained attention mechanism
  • Summary constrained attention mechanism
  • Summary information
  • Summary subspace embedding
