TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Segmentation	50 Salads	DiffAct	F1@10%	90.1	# 3
Action Segmentation	50 Salads	DiffAct	Edit	85.0	# 3
Action Segmentation	50 Salads	DiffAct	Acc	88.9	# 3
Action Segmentation	50 Salads	DiffAct	F1@25%	89.2	# 3
Action Segmentation	50 Salads	DiffAct	F1@50%	83.7	# 3
Action Segmentation	Breakfast	DiffAct	F1@10%	80.3	# 3
Action Segmentation	Breakfast	DiffAct	F1@50%	64.6	# 3
Action Segmentation	Breakfast	DiffAct	Acc	76.4	# 3
Action Segmentation	Breakfast	DiffAct	Edit	78.4	# 3
Action Segmentation	Breakfast	DiffAct	F1@25%	75.9	# 2
Action Segmentation	GTEA	DiffAct	F1@10%	92.5	# 6
Action Segmentation	GTEA	DiffAct	F1@50%	84.7	# 2
Action Segmentation	GTEA	DiffAct	Acc	82.2	# 3
Action Segmentation	GTEA	DiffAct	Edit	89.6	# 6
Action Segmentation	GTEA	DiffAct	F1@25%	91.5	# 5
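
The metric names in the table follow the conventions commonly used for temporal action segmentation: Acc is frame-wise accuracy, Edit is a segmental edit score, and F1@k is a segmental F1 score at an IoU threshold of k. The sketch below is a minimal illustration under those assumed definitions, not the evaluation code behind the numbers above; the function names (`segments`, `frame_accuracy`, `edit_score`, `f1_at_k`) are hypothetical, and background-class handling is omitted.

```python
def segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs


def frame_accuracy(pred, gt):
    """Acc: percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """Edit: 100 * (1 - normalized Levenshtein distance between segment label sequences)."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i]
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


def f1_at_k(pred, gt, threshold):
    """F1@k: a predicted segment is a true positive if its best same-label
    ground-truth segment has IoU >= threshold and is not already matched."""
    pred_segs, gt_segs = segments(pred), segments(gt)
    matched = [False] * len(gt_segs)
    tp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (gl, gs, ge) in enumerate(gt_segs):
            if gl != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
    fp = len(pred_segs) - tp
    fn = len(gt_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 100.0 * 2.0 * precision * recall / (precision + recall)


# Toy sequences (made up for illustration): one boundary is off by a frame.
gt = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
for k in (0.10, 0.25, 0.50):
    print(f"F1@{int(k * 100)}%: {f1_at_k(pred, gt, k):.1f}")
print(f"Edit: {edit_score(pred, gt):.1f}  Acc: {frame_accuracy(pred, gt):.1f}")
```

Under these definitions Acc tolerates short spurious segments, since they cost only a few frames, while Edit and F1@k penalize each extra segment. This is why the three are usually reported together.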

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/diffusion-action-segmentation/action-segmentation-on-gtea-1)](https://paperswithcode.com/sota/action-segmentation-on-gtea-1?p=diffusion-action-segmentation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/diffusion-action-segmentation/action-segmentation-on-50-salads-1)](https://paperswithcode.com/sota/action-segmentation-on-50-salads-1?p=diffusion-action-segmentation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/diffusion-action-segmentation/action-segmentation-on-breakfast-1)](https://paperswithcode.com/sota/action-segmentation-on-breakfast-1?p=diffusion-action-segmentation)`

Diffusion Action Segmentation

ICCV 2023 · Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu

Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. We propose a novel framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. In this framework, action predictions are iteratively generated from random noise with input video features as conditions. To enhance the modeling of three striking characteristics of human actions, including the position prior, the boundary ambiguity, and the relational dependency, we devise a unified masking strategy for the conditioning inputs in our framework. Extensive experiments on three benchmark datasets, i.e., GTEA, 50Salads, and Breakfast, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action segmentation.
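
The generation procedure described in the abstract, where frame-wise action predictions start as random noise and are refined step by step while conditioned on video features, can be sketched as a deterministic DDIM-style sampling loop. This is a toy illustration under stated assumptions, not the authors' implementation: the cosine schedule, the step count, and all names (`cosine_alpha_bar`, `toy_denoiser`, `sample_actions`) are hypothetical, and `toy_denoiser` stands in for a trained network by treating the conditioning features as per-class scores.

```python
import math
import random


def cosine_alpha_bar(step, num_steps):
    """Cumulative signal level: 1.0 at step 0 (clean), about 0 at the last step (assumed schedule)."""
    return math.cos(0.5 * math.pi * step / num_steps) ** 2


def toy_denoiser(noisy, step, features):
    """Hypothetical stand-in for the trained model: predicts a clean one-hot
    action sequence from the conditioning features and ignores the noisy input."""
    clean = []
    for frame in features:
        best = max(range(len(frame)), key=frame.__getitem__)
        clean.append([1.0 if c == best else 0.0 for c in range(len(frame))])
    return clean


def sample_actions(features, num_classes, num_steps=8, denoiser=toy_denoiser, seed=0):
    """Iteratively refine a (frames x classes) array from Gaussian noise to labels."""
    rng = random.Random(seed)
    # Start from pure noise, one row of class scores per video frame.
    x = [[rng.gauss(0.0, 1.0) for _ in range(num_classes)] for _ in features]
    for step in range(num_steps, 0, -1):
        ab_t = cosine_alpha_bar(step, num_steps)
        ab_prev = cosine_alpha_bar(step - 1, num_steps)
        # The model predicts the clean sequence, conditioned on the video features.
        x0_hat = denoiser(x, step, features)
        new_x = []
        for row, row0 in zip(x, x0_hat):
            new_row = []
            for v, v0 in zip(row, row0):
                # Noise implied by the current sample and the clean prediction.
                eps = (v - math.sqrt(ab_t) * v0) / math.sqrt(1.0 - ab_t)
                # Move to the previous, less noisy step.
                new_row.append(math.sqrt(ab_prev) * v0 + math.sqrt(1.0 - ab_prev) * eps)
            new_x.append(new_row)
        x = new_x
    # After the last step the sample holds the clean prediction; take the arg-max class.
    return [max(range(num_classes), key=row.__getitem__) for row in x]


# Made-up conditioning features for four frames and three action classes.
features = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.7, 0.2], [0.0, 0.1, 0.9]]
print(sample_actions(features, num_classes=3))
```

Each pass of the loop plays the role of one refinement stage in a multi-stage model, which is the shared "iterative refinement" spirit the abstract refers to. The masking strategy for the conditioning inputs is not modeled in this sketch.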

Code

No code implementations yet.

Tasks

Action Segmentation

Denoising

Position

Segmentation

Datasets

Breakfast

GTEA

50 Salads

Results from the Paper

Ranked #2 on Action Segmentation on GTEA

Methods

Diffusion
