TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	Sports-1M	DeepVideo’s Slow Fusion	Clip Hit@1	41.9	# 5
Action Recognition	Sports-1M	DeepVideo’s Slow Fusion	Video hit@1	60.9	# 9
Action Recognition	Sports-1M	DeepVideo’s Slow Fusion	Video hit@5	80.2	# 9
Action Recognition	UCF101	Slow Fusion + Finetune top 3 layers	3-fold Accuracy	65.4	# 83

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/large-scale-video-classification-with-1/action-recognition-in-videos-on-sports-1m)](https://paperswithcode.com/sota/action-recognition-in-videos-on-sports-1m?p=large-scale-video-classification-with-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/large-scale-video-classification-with-1/action-recognition-in-videos-on-ucf101)](https://paperswithcode.com/sota/action-recognition-in-videos-on-ucf101?p=large-scale-video-classification-with-1)`

Large-Scale Video Classification with Convolutional Neural Networks

2014 IEEE Conference on Computer Vision and Pattern Recognition · Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei

Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).

Code

lRomul/ball-action-spotting

Tasks

Action Recognition

Classification

General Classification

Skeleton Based Action Recognition

Video Classification

Datasets

Introduced in the Paper:

Sports-1M

Used in the Paper:

ImageNet

UCF101

Results from the Paper

Ranked #9 on Action Recognition on Sports-1M

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Action Recognition	Sports-1M	DeepVideo’s Slow Fusion	Clip Hit@1	41.9	# 5
Action Recognition	Sports-1M	DeepVideo’s Slow Fusion	Video hit@1	60.9	# 9
Action Recognition	Sports-1M	DeepVideo’s Slow Fusion	Video hit@5	80.2	# 9
Action Recognition	UCF101	Slow Fusion + Finetune top 3 layers	3-fold Accuracy	65.4	# 83
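
The Hit@k metrics in the table can be illustrated with a small sketch. It assumes Hit@k means the fraction of test samples with at least one ground-truth label among the k highest-scoring classes, and that a video-level score is obtained by averaging the scores of that video's clips. The function names `hit_at_k` and `average_clip_scores` and the toy data are hypothetical and do not come from the paper's code.

```python
from typing import List, Sequence, Set


def hit_at_k(scores: Sequence[Sequence[float]],
             labels: Sequence[Set[int]],
             k: int) -> float:
    """Fraction of samples with a true label among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, truth in zip(scores, labels):
        # Class indices ordered from highest to lowest score, cut to k entries.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if truth.intersection(top_k):
            hits += 1
    return hits / len(scores)


def average_clip_scores(clip_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-clip class scores into one video-level score vector (assumed aggregation)."""
    n = len(clip_scores)
    return [sum(column) / n for column in zip(*clip_scores)]


# Toy data: three samples, three classes; the last sample carries two labels.
toy_scores = [[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]]
toy_labels = [{1}, {1}, {0, 2}]

clip_hit1 = hit_at_k(toy_scores, toy_labels, k=1)  # two of three samples hit
clip_hit2 = hit_at_k(toy_scores, toy_labels, k=2)  # all three samples hit

# Video-level evaluation: average the clips of one video, then score it.
video_scores = average_clip_scores([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
video_hit1 = hit_at_k([video_scores], [{1}], k=1)
```

Under these assumptions, the clip-level and video-level numbers differ only in what counts as a sample: a single clip in the first case, and the averaged prediction over a video's clips in the second.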
