Action Understanding with Multiple Classes of Actors

Despite rapid progress, existing work on action understanding focuses strictly on one type of action agent, which we call an actor: a human adult. This ignores the diversity of actions performed by other actors. To overcome this narrow viewpoint, our paper marks the first effort in the computer vision community to jointly consider algorithmic understanding of various types of actors undergoing various actions. To begin with, we collect a large annotated Actor-Action Dataset (A2D) that consists of 3782 short videos and 31 temporally untrimmed long videos. We formulate the general actor-action understanding problem and instantiate it at various granularities: video-level single- and multiple-label actor-action recognition, and pixel-level actor-action segmentation. We propose and examine a comprehensive set of graphical models that consider the various types of interplay among actors and actions. Our findings provide conclusive evidence that joint modeling of actor and action improves performance over modeling each of them independently, and that further improvement can be obtained by considering the multi-scale nature of video understanding. Hence, our paper establishes the value of explicitly considering various actors in comprehensive action understanding and provides a dataset and a benchmark for later works exploring this new problem.
