TY - GEN
T1 - Speech emotion recognition using multi-granularity feature fusion through auditory cognitive mechanism
AU - Xu, Cong
AU - Li, Haifeng
AU - Bo, Hongjian
AU - Ma, Lin
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - This paper addresses three problems in discrete speech emotion recognition: single-granularity feature extraction, loss of temporal information, and inefficient use of frame-level features. First, a preliminary cognitive mechanism of auditory emotion is explored through cognitive experiments. A multi-granularity fusion feature extraction method for discrete emotional speech signals, inspired by this mechanism, is then proposed. The method extracts features at three granularities: short-term dynamic features at frame granularity, dynamic features at segment granularity, and long-term static features at global granularity. Finally, an LSTM network classifies emotions based on the long-term and short-term characteristics of the fused features. Experiments on the discrete emotion dataset CHEAVD (CASIA Chinese Emotional Audio-Visual Database), released by the Institute of Automation, Chinese Academy of Sciences, show an improvement in recognition rate, with MAP increasing by 6.48%.
KW - Auditory cognitive mechanism
KW - CNN-LSTM
KW - Multi-granularity feature fusion
KW - Speech emotion recognition
UR - https://www.scopus.com/pages/publications/85068214307
U2 - 10.1007/978-3-030-23407-2_10
DO - 10.1007/978-3-030-23407-2_10
M3 - Conference contribution
AN - SCOPUS:85068214307
SN - 9783030234065
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 117
EP - 131
BT - Cognitive Computing – ICCC 2019 - 3rd International Conference, Held as Part of the Services Conference Federation, SCF 2019, Proceedings
A2 - Xu, Ruifeng
A2 - Wang, Jianzong
A2 - Zhang, Liang-Jie
PB - Springer Verlag
T2 - 3rd International Conference on Cognitive Computing, ICCC 2019, held as part of the Services Conference Federation, SCF 2019
Y2 - 25 June 2019 through 30 June 2019
ER -