Abstract
An important problem in speech recognition, and in activity recognition more generally, is to develop analyses that are invariant to execution rate. We introduce a theoretical framework that provides a parametrization-invariant metric for comparing parametrized paths on Riemannian manifolds. Treating instances of activities as parametrized paths on a Riemannian manifold of covariance matrices, we apply this framework to the problem of visual speech recognition from image sequences. We represent each sequence as a path on the space of covariance matrices, with each covariance matrix capturing the spatial variability of visual features in a frame, and perform simultaneous pairwise temporal alignment and comparison of paths. This removes temporal variability and helps provide a robust metric for visual speech classification. We evaluated this idea on the OuluVS database, where temporal alignment improves the rank-1 nearest-neighbor classification rate from 32% to 57%.
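To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds one covariance matrix per frame and compares two covariance paths while discounting execution rate. It swaps in two simpler techniques than the Riemannian framework the abstract describes: the log-Euclidean distance between covariance matrices, and dynamic time warping (DTW) for temporal alignment. Note that DTW is a dissimilarity, not a true metric. All function names (`frame_covariance`, `spd_log`, `dtw_distance`) and the ridge parameter `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def frame_covariance(features, eps=1e-6):
    """Covariance descriptor of one frame.

    features: array of shape (n_points, d) with d >= 2, one row per
    spatial location. A small ridge (eps, an assumption of this sketch)
    keeps the matrix strictly positive definite.
    """
    c = np.cov(features, rowvar=False)
    return c + eps * np.eye(c.shape[0])

def spd_log(c):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

def dtw_distance(path_a, path_b):
    """DTW cost between two sequences of SPD matrices.

    The per-frame cost is the log-Euclidean distance, i.e. the Frobenius
    norm of the difference of matrix logarithms. This is a stand-in for
    the parametrization-invariant metric described in the abstract.
    """
    logs_a = [spd_log(c) for c in path_a]
    logs_b = [spd_log(c) for c in path_b]
    n, m = len(logs_a), len(logs_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(logs_a[i - 1] - logs_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Usage: the same "utterance" executed at half speed (each frame repeated).
rng = np.random.default_rng(0)
path = [frame_covariance(s * rng.normal(size=(50, 3))) for s in (1.0, 2.0, 3.0)]
slow = [c for c in path for _ in range(2)]
print(dtw_distance(path, slow))
```

Because alignment absorbs the change in speed, the slowed-down path should sit at essentially zero distance from the original, whereas a frame-by-frame comparison would not even be defined for sequences of different lengths. A nearest-neighbor classifier would then assign a test sequence the label of the training sequence with the smallest such distance.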
| Original language | English |
|---|---|
| State | Published - 2013 |
| Externally published | Yes |
| Event | 2013 4th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG 2013 - Jodhpur, Rajasthan, India; Duration: 18 Dec 2013 → 21 Dec 2013 |
Conference
| Conference | 2013 4th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG 2013 |
|---|---|
| Country/Territory | India |
| City | Jodhpur, Rajasthan |
| Period | 18/12/13 → 21/12/13 |
Title
Rate-invariant comparisons of covariance paths for visual speech recognition