Source Information
Abstract
English
In this paper, we address the problem of human action recognition by using covariance matrices as local spatio-temporal (ST) descriptors, computed over local ST features extracted densely from action videos. Unlike traditional methods that use gradient-based features and optical flow-based features separately, we use the covariance matrix to fuse the two types of feature. Covariance matrices are Symmetric Positive Definite (SPD) matrices, which form a special type of Riemannian manifold. To measure the distance between SPD matrices while avoiding the computation of the geodesic distance between them, the covariance features are transformed into log-Euclidean covariance matrices (LECM) by the matrix logarithm operation. After the LECM features are encoded with the Locality-constrained Linear Coding (LLC) method, a spatial pyramid is used to partition the video frames so that the ST-LECM features carry position information, and an average-pooling-on-absolute-value function is applied over each sub-frame. Finally, a non-linear support vector machine is used as the classifier. Experiments on public human action datasets show that the proposed method improves recognition accuracy in comparison to several state-of-the-art methods.
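The log-Euclidean step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `eps` regularization term, the sqrt(2) off-diagonal scaling, and the use of NumPy are all assumptions made for illustration. The sketch computes the covariance of a set of local feature vectors, takes its matrix logarithm through an eigendecomposition, and flattens the result into a Euclidean vector.

```python
import numpy as np


def log_euclidean_covariance(features, eps=1e-6):
    """Map local features (n_samples x d) to a log-Euclidean covariance matrix.

    `eps` is an assumed regularizer that keeps the covariance strictly
    positive definite so that the logarithm of every eigenvalue exists.
    """
    cov = np.cov(features, rowvar=False)        # d x d sample covariance
    cov = cov + eps * np.eye(cov.shape[0])      # make sure it is SPD
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    # Matrix logarithm of an SPD matrix: U diag(log(lambda)) U^T
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def vectorize_symmetric(mat):
    """Flatten the upper triangle of a symmetric matrix into a vector.

    Off-diagonal entries are scaled by sqrt(2) (an assumed convention) so the
    Euclidean norm of the vector equals the Frobenius norm of the matrix.
    """
    rows, cols = np.triu_indices(mat.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return mat[rows, cols] * scale
```

With this mapping, the distance between two covariance descriptors reduces to the ordinary Euclidean distance between their vectors, which is what lets standard coding and pooling methods operate on them without geodesic computations.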
Table of Contents
1. Introduction
2. Spatio-Temporal Log-Euclidean Covariance Matrix (ST-LECM)
2.1. Log-Euclidean Framework on SPD Matrices
2.2. Spatio-Temporal Log-Euclidean Covariance Matrix (ST-LECM)
2.3. Encoding ST-LECM Features by LLC Method
3. The System Framework
4. Experiments
4.1. KTH Action Datasets
4.2. ADL Datasets
5. Conclusion
References
