TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Segmentation	50Salads	EUT	F1@10%	89.2	# 1
Action Segmentation	50Salads	EUT	F1@25%	87.5	# 1
Action Segmentation	50Salads	EUT	F1@50%	81	# 1
Action Segmentation	50Salads	EUT	Edit	82.9	# 1
Action Segmentation	50Salads	EUT	Acc	87.4	# 1
Action Segmentation	50 Salads	EUT	F1@10%	89.2	# 5
Action Segmentation	50 Salads	EUT	Edit	82.9	# 8
Action Segmentation	50 Salads	EUT	Acc	87.4	# 7
Action Segmentation	50 Salads	EUT	F1@25%	87.5	# 8
Action Segmentation	50 Salads	EUT	F1@50%	81	# 8
Action Segmentation	Breakfast	EUT	F1@10%	76.2	# 10
Action Segmentation	Breakfast	EUT	F1@50%	59.8	# 8
Action Segmentation	Breakfast	EUT	Acc	75	# 7
Action Segmentation	Breakfast	EUT	Edit	74.6	# 13
Action Segmentation	Breakfast	EUT	F1@25%	71.8	# 8
Action Segmentation	GTEA	EUT	F1@10%	88.2	# 20
Action Segmentation	GTEA	EUT	F1@50%	74	# 21
Action Segmentation	GTEA	EUT	Acc	77	# 21
Action Segmentation	GTEA	EUT	Edit	83.9	# 18
Action Segmentation	GTEA	EUT	F1@25%	87.2	# 16

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-u-transformer-with-boundary-aware/action-segmentation-on-50salads)](https://paperswithcode.com/sota/action-segmentation-on-50salads?p=efficient-u-transformer-with-boundary-aware)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-u-transformer-with-boundary-aware/action-segmentation-on-50-salads-1)](https://paperswithcode.com/sota/action-segmentation-on-50-salads-1?p=efficient-u-transformer-with-boundary-aware)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-u-transformer-with-boundary-aware/action-segmentation-on-breakfast-1)](https://paperswithcode.com/sota/action-segmentation-on-breakfast-1?p=efficient-u-transformer-with-boundary-aware)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-u-transformer-with-boundary-aware/action-segmentation-on-gtea-1)](https://paperswithcode.com/sota/action-segmentation-on-gtea-1?p=efficient-u-transformer-with-boundary-aware)`

Do we really need temporal convolutions in action segmentation?

26 May 2022 · Dazhao Du, Bing Su, Yu Li, Zhongang Qi, Lingyu Si, Ying Shan ·

Action classification has made great progress, but segmenting and recognizing actions from long untrimmed videos remains a challenging problem. Most state-of-the-art methods focus on designing temporal convolution-based models, but the inflexibility of temporal convolutions and the difficulties in modeling long-term temporal dependencies restrict the potential of these models. Transformer-based models with adaptable and sequence modeling capabilities have recently been used in various tasks. However, the lack of inductive bias and the inefficiency of handling long video sequences limit the application of Transformer in action segmentation. In this paper, we design a pure Transformer-based model without temporal convolutions by incorporating temporal sampling, called Temporal U-Transformer (TUT). The U-Transformer architecture reduces complexity while introducing an inductive bias that adjacent frames are more likely to belong to the same class, but the introduction of coarse resolutions results in the misclassification of boundaries. We observe that the similarity distribution between a boundary frame and its neighboring frames depends on whether the boundary frame is the start or end of an action segment. Therefore, we further propose a boundary-aware loss based on the distribution of similarity scores between frames from attention modules to enhance the ability to recognize boundaries. Extensive experiments show the effectiveness of our model.

PDF Abstract

Code

Add Remove Mark official

ddz16/TUT official

Tasks

Add Remove

Action Classification

Action Segmentation

Inductive Bias

Datasets

Breakfast

GTEA 50 Salads

Results from the Paper

Edit

Ranked #1 on Action Segmentation on 50Salads

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Action Segmentation	50Salads	EUT	F1@10%	89.2	# 1	Compare
			F1@25%	87.5	# 1	Compare
			F1@50%	81	# 1	Compare
			Edit	82.9	# 1	Compare
			Acc	87.4	# 1	Compare
Action Segmentation	50 Salads	EUT	F1@10%	89.2	# 5	Compare
			Edit	82.9	# 8	Compare
			Acc	87.4	# 7	Compare
			F1@25%	87.5	# 8	Compare
			F1@50%	81	# 8	Compare
Action Segmentation	Breakfast	EUT	F1@10%	76.2	# 10	Compare
			F1@50%	59.8	# 8	Compare
			Acc	75	# 7	Compare
			Edit	74.6	# 13	Compare
			F1@25%	71.8	# 8	Compare
Action Segmentation	GTEA	EUT	F1@10%	88.2	# 20	Compare
			F1@50%	74	# 21	Compare
			Acc	77	# 21	Compare
			Edit	83.9	# 18	Compare
			F1@25%	87.2	# 16	Compare

Methods

Add Remove

Absolute Position Encodings • Adam • BPE • Concatenated Skip Connection • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Max Pooling • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • U-Net

Edit Social Preview

Do we really need temporal convolutions in action segmentation?

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove