Self-Mutual Distillation Learning for Continuous Sign Language Recognition

ICCV 2021 · Aiming Hao, Yuecong Min, Xilin Chen

In recent years, deep learning has moved video-based Continuous Sign Language Recognition (CSLR) significantly forward. A typical CSLR network combines a visual module, which captures spatial and short-term temporal information, with a contextual module, which models long-term temporal information, and the whole network is trained with the Connectionist Temporal Classification (CTC) loss. However, because gradients must propagate back through the contextual module via the chain rule, the visual module is hard to adjust and struggles to learn well-optimized visual features. As a result, the contextual module is pushed to optimize contextual information alone rather than to balance effective visual and contextual information. In this paper, we propose a Self-Mutual Knowledge Distillation (SMKD) method, which enforces the visual and contextual modules to focus on short-term and long-term information, respectively, while enhancing the discriminative power of both modules simultaneously. Specifically, the visual and contextual modules share the weights of their corresponding classifiers and are trained with the CTC loss simultaneously. Moreover, the spike phenomenon is common when training with the CTC loss: although it helps select a few key frames of a gloss, it discards the remaining frames of that gloss and causes the visual features to saturate at an early training stage. We therefore develop a gloss segmentation method to relieve the spike phenomenon and reduce saturation in the visual module. We conduct experiments on two CSLR benchmarks, PHOENIX14 and PHOENIX14-T, and the results demonstrate the effectiveness of SMKD.
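To make the shared-classifier idea concrete, below is a minimal PyTorch sketch of the core training signal described in the abstract: a visual module and a contextual module whose outputs are scored by one shared gloss classifier, each trained with its own CTC loss. All module choices, names, and shapes here (the linear "visual" encoder, the BiLSTM contextual module, blank index 0) are illustrative assumptions, not the authors' implementation, and the distillation and gloss-segmentation components of SMKD are omitted.

```python
import torch
import torch.nn as nn

class SMKDSketch(nn.Module):
    """Illustrative sketch: visual and contextual modules share one gloss classifier."""

    def __init__(self, num_glosses, feat_dim=512, hidden_dim=512):
        super().__init__()
        # Visual module stand-in: per-frame features (a real system would use a
        # 2D-CNN plus short-term temporal convolutions).
        self.visual = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        # Contextual module stand-in: long-term temporal modelling with a BiLSTM.
        self.context = nn.LSTM(hidden_dim, hidden_dim // 2,
                               bidirectional=True, batch_first=True)
        # Shared classifier: the same weights score both feature sequences.
        self.classifier = nn.Linear(hidden_dim, num_glosses + 1)  # +1 for CTC blank

    def forward(self, frames):               # frames: (batch, time, feat_dim)
        v = self.visual(frames)               # short-term visual features
        c, _ = self.context(v)                # long-term contextual features
        return self.classifier(v), self.classifier(c)  # two gloss logit sequences

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def smkd_loss(model, frames, targets, frame_lens, target_lens):
    """Sum of the CTC losses on the visual and contextual outputs."""
    vis_logits, ctx_logits = model(frames)
    # nn.CTCLoss expects log-probabilities of shape (time, batch, classes).
    log_v = vis_logits.log_softmax(-1).transpose(0, 1)
    log_c = ctx_logits.log_softmax(-1).transpose(0, 1)
    return (ctc(log_v, targets, frame_lens, target_lens) +
            ctc(log_c, targets, frame_lens, target_lens))
```

Because the classifier weights are shared, gradients from the CTC loss on the visual features update the same decision boundary used by the contextual features, which is the mechanism the abstract credits for strengthening both modules at once.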

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Sign Language Recognition | RWTH-PHOENIX-Weather 2014 | SMKD | Word Error Rate (WER) | 20.5 | #6 |
| Sign Language Recognition | RWTH-PHOENIX-Weather 2014 T | SMKD | Word Error Rate (WER) | 22.4 | #5 |
