Video Understanding

14 papers with code · Computer Vision
Subtask of Video

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018 tensorflow/models

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations.

ACTION LOCALIZATION ACTION RECOGNITION VIDEO UNDERSTANDING

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

30 Mar 2017 chihyaoma/Activity-Recognition-with-CNN-and-RNN

Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) a temporal segment RNN and 2) an Inception-style Temporal-ConvNet. We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve overall performance.

ACTION CLASSIFICATION ACTIVITY RECOGNITION VIDEO CLASSIFICATION VIDEO UNDERSTANDING

Learnable pooling with Context Gating for video classification

21 Jun 2017 antoine77340/Youtube-8M-WILLOW

Current methods for video analysis often extract frame-level features using pre-trained convolutional neural networks (CNNs). In particular, we evaluate our method on the large-scale multi-modal YouTube-8M v2 dataset and outperform all other methods in the YouTube-8M Large-Scale Video Understanding challenge.

VIDEO CLASSIFICATION VIDEO UNDERSTANDING
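The Context Gating unit referenced above is an elementwise sigmoid gate that reweights a feature vector using the features themselves as context. A minimal numpy sketch of that idea is below; the function name and the plain matrix parameterization are illustrative assumptions, not the paper's exact layer.

```python
import numpy as np

def context_gating(x, W, b):
    """Reweight features x with a learned elementwise sigmoid gate.

    x: (batch, d) feature vectors
    W: (d, d) gate weights, b: (d,) gate bias
    Returns sigmoid(x @ W + b) * x, so each dimension is
    suppressed or emphasized depending on the full feature vector.
    """
    gate = 1.0 / (1.0 + np.exp(-(x @ W + b)))  # values in (0, 1)
    return gate * x
```

With zero weights the gate is 0.5 everywhere, so the output is simply the input halved; training moves the gate toward emphasizing informative dimensions.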

ECO: Efficient Convolutional Network for Online Video Understanding

ECCV 2018 mzolfaghari/ECO-efficient-video-understanding

The state of the art in video understanding suffers from two problems: (1) the major part of reasoning is performed locally in the video and therefore misses important relationships within actions that span several seconds. In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.

ACTION CLASSIFICATION VIDEO CAPTIONING VIDEO UNDERSTANDING

The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge

16 Jun 2017 wangheda/youtube-8m

This article describes the final solution of team monkeytyping, who finished in second place in the YouTube-8M video understanding challenge. The dataset used in this challenge is a large-scale benchmark for multi-label video classification.

VIDEO CLASSIFICATION VIDEO UNDERSTANDING

Temporal Tessellation: A Unified Approach for Video Analysis

ICCV 2017 dot27/temporal-tessellation

We present a general approach to video understanding, inspired by semantic transfer techniques that have been successfully used for 2D image analysis. A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, after which the reference semantics can be transferred to the test video.

ACTION DETECTION VIDEO CAPTIONING VIDEO SUMMARIZATION VIDEO UNDERSTANDING
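The semantic-transfer step described above can be reduced to a nearest-neighbor match in an embedding space: each test clip borrows the label of its closest reference clip. The sketch below is a simplified assumption of that mechanism (the paper additionally enforces temporal coherence across clips, which is omitted here); the function name is hypothetical.

```python
import numpy as np

def transfer_semantics(test_clips, ref_clips, ref_labels):
    """Assign each test clip the label of its most similar reference clip.

    test_clips: (m, d) clip embeddings for the test video
    ref_clips:  (n, d) clip embeddings with known semantics
    ref_labels: length-n list of labels for the reference clips
    """
    # cosine similarity between every test clip and every reference clip
    t = test_clips / np.linalg.norm(test_clips, axis=1, keepdims=True)
    r = ref_clips / np.linalg.norm(ref_clips, axis=1, keepdims=True)
    sim = t @ r.T                      # (m, n) similarity matrix
    nearest = sim.argmax(axis=1)       # best reference clip per test clip
    return [ref_labels[i] for i in nearest]
```

Replacing the per-clip argmax with a Viterbi-style path over the similarity matrix would recover the temporal-coherence aspect the paper emphasizes.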

Learnable Pooling Methods for Video Classification

1 Oct 2018 pomonam/LearnablePoolingMethods

Rather than using ensembles of existing architectures, we provide an insight on creating new architectures. We demonstrate our solutions in the 2nd YouTube-8M Video Understanding Challenge, using frame-level video and audio descriptors.

VIDEO CLASSIFICATION VIDEO UNDERSTANDING

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

HLT 2018 chitwansaharia/HACAModel

Existing multi-modal fusion methods have shown encouraging results in video understanding. Furthermore, for the first time, we validate the superior performance of the deep audio features on the video captioning task.

VIDEO CAPTIONING VIDEO UNDERSTANDING

Pooled Motion Features for First-Person Videos

CVPR 2015 mryoo/pooled_time_series

In this paper, we present a new feature representation for first-person videos. We also confirm that our feature representation has superior performance to existing state-of-the-art features like local spatio-temporal features and Improved Trajectory Features (originally developed for 3rd-person videos) when handling first-person videos.

ACTIVITY RECOGNITION TIME SERIES VIDEO UNDERSTANDING

Temporal Shift Module for Efficient Video Understanding

20 Nov 2018 PaParaZz1/TemporalShiftModule

The explosive growth in online video streaming gives rise to challenges in efficiently extracting spatiotemporal information for video understanding. Specifically, it can achieve the performance of a 3D CNN while maintaining 2D-CNN complexity.

VIDEO UNDERSTANDING
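The core temporal shift operation behind TSM is simple: a fraction of the channels is shifted one frame forward in time, another fraction one frame backward, and the rest are left in place, so an ordinary 2D convolution afterwards mixes information across neighboring frames at zero extra FLOPs. A minimal numpy sketch, assuming an offline (bidirectional) shift and a `fold_div` split as in the reference implementation:

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the time axis, zero-padding the ends.

    x: (N, T, C, H, W) activations
    fold_div: 2/fold_div of the channels are shifted (half forward, half back)
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                 # pull from next frame
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # pull from previous frame
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]            # remaining channels unchanged
    return out
```

Because the shift itself has no parameters, inserting it before each 2D convolution is what lets the model trade temporal modeling for essentially no added compute.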