Action Parsing
2 papers with code • 0 benchmarks • 2 datasets
Action parsing is the task of, given a video or still image, assigning each frame or image a label describing the action in that frame or image.
Benchmarks
These leaderboards are used to track progress in Action Parsing
Latest papers with no code
Action parsing using context features
We propose an action parsing algorithm to parse a video sequence containing an unknown number of actions into its action segments.
Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework
Therefore, researchers start to focus on a new task, Part-level Action Parsing (PAP), which aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing
Despite of dramatic progresses in the area of video classification research, a severe problem faced by the community is that the detailed understanding of human actions is ignored.
A Baseline Framework for Part-level Action Parsing and Action Recognition
This technical report introduces our 2nd place solution to Kinetics-TPS Track on Part-level Action Parsing in ICCV DeeperAction Workshop 2021.
SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation
However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset.
Intra- and Inter-Action Understanding via Temporal Action Parsing
Current methods for action recognition primarily rely on deep convolutional networks to derive feature embeddings of visual and motion features.
IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles
We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory.
DAP3D-Net: Where, What and How Actions Occur in Videos?
Action parsing in videos with complex scenes is an interesting but challenging task in computer vision.
Action Recognition by Hierarchical Mid-level Action Elements
Realistic videos of human actions exhibit rich spatiotemporal structures at multiple levels of granularity: an action can always be decomposed into multiple finer-grained elements in both space and time.
An Expressive Deep Model for Human Action Parsing from A Single Image
This paper aims at one newly raising task in vision and multimedia research: recognizing human actions from still images.