TY - GEN
T1 - Multi-view common space learning for emotion recognition in the wild
AU - Wu, Jianlong
AU - Lin, Zhouchen
AU - Zha, Hongbin
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/10/31
Y1 - 2016/10/31
AB - It is a very challenging task to recognize emotion in the wild. Recently, combining information from various views or modalities has attracted increasing attention. Cross-modality features and features extracted by different methods are regarded as multi-view information about a sample. In this paper, we propose a method to analyse multi-view features of emotion samples and automatically recognize the expression as part of the fourth Emotion Recognition in the Wild Challenge (EmotiW 2016). In our method, we first extract multi-view features such as BoF, CNN, LBP-TOP and audio features for each expression sample. Then we learn the corresponding projection matrices to map the multi-view features into a common subspace. In the meantime, we impose ℓ2,1-norm penalties on the projection matrices for feature selection. We apply both this method and PLSR to emotion recognition. We conduct experiments on both the AFEW and HAPPEI datasets and achieve superior performance. The best recognition accuracy of our method is 55.31% on the AFEW dataset for video-based emotion recognition in the wild. The minimum RMSE for group happiness intensity recognition is 0.9525 on the HAPPEI dataset. Both results are much better than the challenge baselines.
KW - Common space learning
KW - Emotion recognition
KW - EmotiW 2016 challenge
KW - Multi-view learning
UR - https://www.scopus.com/pages/publications/85016626752
DO - 10.1145/2993148.2997631
M3 - Conference contribution
AN - SCOPUS:85016626752
T3 - ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
SP - 464
EP - 471
BT - ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
A2 - Pelachaud, Catherine
A2 - Nakano, Yukiko I.
A2 - Nishida, Toyoaki
A2 - Busso, Carlos
A2 - Morency, Louis-Philippe
A2 - André, Elisabeth
PB - Association for Computing Machinery, Inc
T2 - 18th ACM International Conference on Multimodal Interaction, ICMI 2016
Y2 - 12 November 2016 through 16 November 2016
ER -