Video Understanding

95 papers with code • 0 benchmarks • 25 datasets

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Latest papers without code

CogME: A Novel Evaluation Metric for Video Understanding Intelligence

no code yet • 21 Jul 2021

Then we propose a top-down evaluation system for VideoQA, based on the cognitive process of humans and story elements: Cognitive Modules for Evaluation (CogME).

Question Answering • Video Question Answering • +1

Spatio-Temporal Context for Action Detection

no code yet • 29 Jun 2021

Research in action detection has grown in recent years, as it plays a key role in video understanding.

Action Detection • Video Understanding

An Image Classifier Can Suffice For Video Understanding

no code yet • 26 Jun 2021

We propose a new perspective on video understanding by casting the video recognition problem as an image recognition task.

Action Recognition • Video Recognition • +1

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

no code yet • 21 Jun 2021

In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.

Action Classification • Image Classification • +3

Long-Short Temporal Contrastive Learning of Video Transformers

no code yet • 17 Jun 2021

Our approach, named Long-Short Temporal Contrastive Learning (LSTCL), enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.

Action Recognition • Contrastive Learning • +1

$C^3$: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues

no code yet • 16 Jun 2021

Video-grounded dialogue systems aim to integrate video understanding and dialogue understanding to generate responses that are relevant to both the dialogue and video context.

Contrastive Learning • Dialogue Understanding • +1

Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition

no code yet • 9 Jun 2021

In this paper, we present empirical results for training a stronger video vision transformer on the EPIC-KITCHENS-100 Action Recognition dataset.

Action Recognition • Point Cloud Classification • +1

Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking

no code yet • CVPR 2021

In this paper, we propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame, and hence may serve as a robust estimation even in challenging scenarios including occlusion.

Multi-Person Pose Estimation • Multi-Person Pose Estimation and Tracking • +1

Transformed ROIs for Capturing Visual Transformations in Videos

no code yet • 6 Jun 2021

Modeling the visual changes that an action brings to a scene is critical for video understanding.

Action Recognition • Video Understanding

A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP

no code yet • 31 May 2021

In this paper, we rely on the network architecture of a convolutional spiking neural network trained with STDP, and we test the performance of this network when challenged with action recognition tasks.

Action Recognition • Video Understanding