Asynchronous Temporal Fields for Action Recognition

Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities that includes objects, actions, and intentions, where the potentials are predicted by a deep network. End-to-end training of such structured models is a challenging endeavor: for inference and learning we need to construct mini-batches consisting of whole videos, leading to mini-batches with only a few videos. This causes high correlation between data points, leading to a breakdown of the backprop algorithm. To address this challenge, we present an asynchronous variational inference method that allows efficient end-to-end training. Our method achieves a classification mAP of 22.4% on the Charades benchmark, outperforming the state-of-the-art (17.2% mAP), and offers equal gains on the task of temporal localization.
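The core idea is structured prediction over time: a deep network produces per-frame potentials, and a fully-connected temporal CRF couples all frames so that beliefs about any part of the video inform the rest. As a rough illustration only, and not the authors' formulation (which also models objects and intentions, and keeps the pairwise messages asynchronously so that training can use small mini-batches), the sketch below runs plain mean-field inference over per-frame action labels; all function names, shapes, and the shared label-compatibility matrix are assumptions.

```python
import numpy as np

# Illustrative mean-field inference for a fully-connected temporal CRF
# over T frames and K action classes. Unary potentials are assumed to
# come from a deep network; here they are random placeholders.

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, pairwise, n_iters=10):
    """unary: (T, K) per-frame potentials; pairwise: (K, K) shared
    temporal compatibility matrix. Returns Q: (T, K) per-frame marginals."""
    Q = softmax(unary)                        # initialize from unaries
    for _ in range(n_iters):
        # Message into frame t aggregates the beliefs of all other frames;
        # subtracting Q removes each frame's own contribution.
        total = Q.sum(axis=0, keepdims=True)  # (1, K)
        msg = (total - Q) @ pairwise.T        # (T, K)
        Q = softmax(unary + msg)
    return Q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K = 50, 10
    unary = rng.normal(size=(T, K))           # placeholder network outputs
    pairwise = rng.normal(scale=0.1, size=(K, K))
    Q = mean_field(unary, pairwise)
    print(Q.shape, Q.sum(axis=1)[:3])         # rows are valid distributions
```

In this toy version every update touches the whole video at once; the asynchronous scheme described in the abstract avoids exactly that requirement, which is what makes end-to-end training with small, decorrelated mini-batches feasible.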


Datasets

Charades
Results from the Paper


| Task                  | Dataset  | Model             | Metric | Value | Global Rank |
|-----------------------|----------|-------------------|--------|-------|-------------|
| Action Detection      | Charades | Sigurdsson et al. | mAP    | 9.6   | #16         |
| Action Classification | Charades | Asyn-TF           | mAP    | 22.4  | #45         |

Methods


No methods listed for this paper.