Poselet Key-Framing: A Model for Human Activity Recognition

CVPR 2013 · Michalis Raptis, Leonid Sigal

In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative keyframes: collections of partial key-poses of the actor(s) that depict key states in the action sequence. We cast the learning of keyframes in a max-margin discriminative framework, treating keyframes as latent variables. This allows us to jointly learn a set of the most discriminative keyframes together with the local temporal context between them. Keyframes are encoded using a spatially localizable, poselet-like representation with HoG and BoW components learned from weak annotations; we rely on a structured SVM formulation to align our components and to mine hard negatives, which improves localization performance. The resulting model supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an online streaming setting.
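The abstract treats keyframe positions as latent variables linked by local temporal context. As a rough illustration only, the sketch below assumes a simple chain model: keyframe k placed at frame t earns an appearance score `unary[k, t]` (a stand-in for the HoG/BoW poselet responses), and consecutive keyframes pay a quadratic penalty for deviating from an expected gap. The best latent placement is then found by dynamic programming. The function name `infer_keyframes`, the quadratic penalty, and the parameters `mu` and `lam` are hypothetical choices for this example, not the authors' formulation.

```python
import numpy as np


def infer_keyframes(unary, mu, lam):
    """Find the best ordered placement of K keyframes in a T-frame sequence.

    Hypothetical chain model (an assumption, not the paper's exact model):
      score = sum_k unary[k, t_k] - lam * sum_k (t_k - t_{k-1} - mu[k-1])**2
    with t_0 < t_1 < ... < t_{K-1}.

    unary: (K, T) array, appearance score of keyframe k at frame t.
    mu:    (K-1,) array, hypothetical expected gap between keyframes k-1 and k.
    lam:   weight of the temporal-context penalty.
    Returns (best_score, list_of_frame_indices).
    """
    K, T = unary.shape
    if K > T:
        raise ValueError("need at least as many frames as keyframes")

    score = np.full((K, T), -np.inf)   # best partial score ending with keyframe k at t
    back = np.zeros((K, T), dtype=int)  # argmax predecessor position for backtracking
    score[0] = unary[0]

    for k in range(1, K):
        for t in range(1, T):
            prev = np.arange(t)          # keyframe k-1 must come strictly earlier
            gap = t - prev
            cand = score[k - 1, :t] - lam * (gap - mu[k - 1]) ** 2
            j = int(np.argmax(cand))
            score[k, t] = cand[j] + unary[k, t]
            back[k, t] = j

    # Backtrack from the best position of the last keyframe.
    t_last = int(np.argmax(score[K - 1]))
    best = float(score[K - 1, t_last])
    path = [t_last]
    for k in range(K - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    path.reverse()
    return best, path


if __name__ == "__main__":
    # Toy example: two keyframes, five frames, strong responses at frames 1 and 3.
    u = np.zeros((2, 5))
    u[0, 1] = 5.0
    u[1, 3] = 5.0
    print(infer_keyframes(u, np.array([2.0]), lam=1.0))
```

In this toy setup only the K chosen frames contribute appearance scores, so frames between keyframes do not enter the score at all; this offers some intuition for why a sparse keyframe model can tolerate dropped frames, though the paper's actual inference and learning procedure may differ.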

Results from the Paper

Task | Dataset | Model | Metric Name | Metric Value | Rank
Human Interaction Recognition | UT-Interaction | Raptis et al. | Accuracy | 93.30 | #3

