Action Understanding
32 papers with code • 1 benchmark • 4 datasets
Most implemented papers
Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning
Moreover, combining these two paradigms in a naive manner leaves the synergy between them untapped and can lead to interference in training.
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors
Visual features are of vital importance for human action understanding in videos.
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.
Detailed 2D-3D Joint Representation for Human-Object Interaction
In light of these, we propose a detailed 2D-3D joint representation learning method.
LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence.
Online Spatiotemporal Action Detection and Prediction via Causal Representations
In this thesis, we focus on video action understanding problems from an online and real-time processing point of view.
Video Action Understanding
Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding.
Temporal Relational Modeling with Self-Supervision for Action Segmentation
The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos.
Win-Fail Action Recognition
We introduce a first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: "General Stunts," "Internet Wins-Fails," "Trick Shots," and "Party Games."
Home Action Genome: Cooperative Compositional Action Understanding
However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.