Action Segmentation
72 papers with code • 9 benchmarks • 16 datasets
Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to divide a temporally untrimmed video into segments along the time axis and to label each segment with one of a set of pre-defined action labels. The results of Action Segmentation can then be used as input to various applications, such as video-to-text and action localization.
Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
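As a rough illustration of the task output described above, the sketch below collapses a sequence of per-frame action labels into labeled temporal segments. The function name `frames_to_segments`, the label strings, and the half-open frame-index convention are assumptions made for this example; they are not taken from TricorNet or any of the papers listed on this page.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_segments(frame_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive (an assumed convention).
    """
    segments = []
    start = 0
    # groupby yields one group per run of identical consecutive labels.
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for an 8-frame untrimmed video.
frame_labels = ["pour", "pour", "pour", "stir", "stir", "pour", "drink", "drink"]

for label, start, end in frames_to_segments(frame_labels):
    print(f"{label}: frames [{start}, {end})")
```

Note that the single isolated "pour" frame at index 5 becomes its own short segment. Spurious fragments of this kind are what the over-segmentation errors mentioned in some of the abstracts on this page refer to.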
Most implemented papers
Temporal Unet: Sample Level Human Action Recognition using WiFi
In this task, every WiFi distortion sample in the whole series should be categorized into one action, which is a critical technique in precise action localization, continuous action segmentation, and real-time action recognition.
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation
In this paper, we propose an approach to representing high-order information for temporal action segmentation via a simple yet effective bilinear form.
Weakly Supervised Energy-Based Learning for Action Segmentation
This paper is about labeling video frames with action classes under weak supervision in training, where we have access to a temporal ordering of actions, but their start and end frames in training videos are unknown.
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
Despite the recent progress of fully-supervised action segmentation techniques, the performance is still not fully satisfactory.
SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation
In addition, the network estimates the action labels for each frame.
Learning to Segment Actions from Observation and Narration
We apply a generative segmental model of task structure, guided by narration, to action segmentation in video.
MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation
Despite the capabilities of these approaches in capturing temporal dependencies, their predictions suffer from over-segmentation errors.
Online Spatiotemporal Action Detection and Prediction via Causal Representations
In this thesis, we focus on video action understanding problems from an online and real-time processing point of view.
ActBERT: Learning Global-Local Video-Text Representations
In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data.
Alleviating Class-wise Gradient Imbalance for Pulmonary Airway Segmentation
Due to the small size and scattered spatial distribution of peripheral bronchi, this is hampered by severe class imbalance between foreground and background regions, which makes it challenging for CNN-based methods to parse distal small airways.