Objects2action: Classifying and localizing actions without any video example

The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches, we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.
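
The transfer step described above, scoring an unseen action as a convex combination of its semantic affinities to a large vocabulary of objects, with affinities taken from a skip-gram word embedding, multi-word labels handled by averaging word vectors, and only the most responsive objects retained per action, can be sketched roughly as follows. This is a minimal illustration under those assumptions, not the authors' implementation; the function names (embed_label, affinity_matrix, score_actions), the cosine affinity, and the sparsification rule are illustrative choices.

```python
# Sketch of zero-shot action scoring via object-to-action transfer,
# in the spirit of objects2action. Assumes pre-trained word vectors and
# per-video object probabilities (e.g. from an object CNN) are given.

import numpy as np

def embed_label(label, word_vectors, dim):
    """Average the word vectors of a (possibly multi-word) label."""
    words = [w for w in label.lower().split() if w in word_vectors]
    if not words:
        return np.zeros(dim)
    return np.mean([word_vectors[w] for w in words], axis=0)

def affinity_matrix(actions, objects, word_vectors, dim, top_objects=None):
    """Cosine affinity between action and object labels, optionally
    sparsified to the top-T most responsive objects per action."""
    A = np.stack([embed_label(a, word_vectors, dim) for a in actions])
    O = np.stack([embed_label(o, word_vectors, dim) for o in objects])
    A /= np.linalg.norm(A, axis=1, keepdims=True) + 1e-12
    O /= np.linalg.norm(O, axis=1, keepdims=True) + 1e-12
    G = A @ O.T                                   # (num_actions, num_objects)
    if top_objects is not None:
        for row in G:                             # keep only the strongest objects per action
            thresh = np.sort(row)[-top_objects]
            row[row < thresh] = 0.0
    # Normalize rows so each action score is a convex combination over objects.
    G = np.maximum(G, 0.0)
    G /= G.sum(axis=1, keepdims=True) + 1e-12
    return G

def score_actions(object_probs, G):
    """Transfer a video's object probabilities to unseen-action scores."""
    return G @ object_probs                       # (num_actions,)
```

At test time the predicted action would simply be the argmax of score_actions over the unseen labels; for spatio-temporal localization the same scoring could be applied to object encodings of video tubes rather than whole clips, though the details of that step are specific to the paper.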

PDF · Abstract (ICCV 2015)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Zero-Shot Action Recognition | HMDB51 | O2A | Top-1 Accuracy | 15.6 | #25
Zero-Shot Action Recognition | UCF101 | O2A | Top-1 Accuracy | 30.3 | #21

Methods


No methods listed for this paper.