TY - GEN
T1 - From seed discovery to deep reconstruction
T2 - 24th ACM Multimedia Conference, MM 2016
AU - Zhang, Yanhao
AU - Qin, Lei
AU - Huang, Qingming
AU - Yang, Kuiyuan
AU - Zhang, Jun
AU - Yao, Hongxun
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/1
Y1 - 2016/10/1
N2 - Although saliency prediction in crowds has recently been recognized as an essential task for video analysis, it has not yet been comprehensively explored. The challenge is that eye fixations in crowded scenes are inherently "distinct" and "multimodal", which differs from those in regular scenes. Existing saliency prediction schemes, however, typically rely on hand-designed features with a shallow learning paradigm, which neglects the underlying characteristics of crowded scenes. In this paper, we propose a saliency prediction model dedicated to crowd videos with two novelties: 1) distinct units are discovered using a deep representation learned by a Stacked Denoising Auto-Encoder (SDAE), considering perceptual properties of crowd saliency; 2) contrast-based saliency is measured through deep reconstruction errors in a second SDAE trained on all units excluding the distinct units. The two stages are integrated into a unified model for online processing of crowd saliency. Extensive evaluations on two crowd video benchmark datasets demonstrate that our approach can effectively explore the crowd saliency mechanism with two-stage SDAEs and achieve significantly better results than state-of-the-art methods, with robustness to parameters.
AB - Although saliency prediction in crowds has recently been recognized as an essential task for video analysis, it has not yet been comprehensively explored. The challenge is that eye fixations in crowded scenes are inherently "distinct" and "multimodal", which differs from those in regular scenes. Existing saliency prediction schemes, however, typically rely on hand-designed features with a shallow learning paradigm, which neglects the underlying characteristics of crowded scenes. In this paper, we propose a saliency prediction model dedicated to crowd videos with two novelties: 1) distinct units are discovered using a deep representation learned by a Stacked Denoising Auto-Encoder (SDAE), considering perceptual properties of crowd saliency; 2) contrast-based saliency is measured through deep reconstruction errors in a second SDAE trained on all units excluding the distinct units. The two stages are integrated into a unified model for online processing of crowd saliency. Extensive evaluations on two crowd video benchmark datasets demonstrate that our approach can effectively explore the crowd saliency mechanism with two-stage SDAEs and achieve significantly better results than state-of-the-art methods, with robustness to parameters.
KW - Crowd saliency
KW - Deep Auto Encoders
KW - Reconstruction errors
UR - https://www.scopus.com/pages/publications/84994632149
U2 - 10.1145/2964284.2967185
DO - 10.1145/2964284.2967185
M3 - Conference contribution
AN - SCOPUS:84994632149
T3 - MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
SP - 72
EP - 76
BT - MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
Y2 - 15 October 2016 through 19 October 2016
ER -