CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

CVPR 2017  ·  Tianmin Shu, Sinisa Todorovic, Song-Chun Zhu ·

This work is about recognizing human activities occurring in videos at distinct semantic levels, including individual actions, interactions, and group activities. The recognition is realized using a two-level hierarchy of Long Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture, which can be trained end-to-end. In comparison with existing architectures of LSTMs, we make two key contributions giving the name to our approach as Confidence-Energy Recurrent Network -- CERN. First, instead of using the common softmax layer for prediction, we specify a novel energy layer (EL) for estimating the energy of our predictions. Second, rather than finding the common minimum-energy class assignment, which may be numerically unstable under uncertainty, we specify that the EL additionally computes the p-values of the solutions, and in this way estimates the most confident energy minimum. The evaluation on the Collective Activity and Volleyball datasets demonstrates: (i) advantages of our two contributions relative to the common softmax and energy-minimization formulations and (ii) a superior performance relative to the state-of-the-art approaches.

PDF Abstract CVPR 2017 PDF CVPR 2017 Abstract

Datasets


Task Dataset Model Metric Name Metric Value Global Rank Result Benchmark
Group Activity Recognition Volleyball Shu et al. Accuracy 83.6 # 11

Methods