Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition

Indoor action recognition plays an important role in modern society, for example in intelligent healthcare for large mobile cabin hospitals. With the wide adoption of depth sensors such as the Kinect, multimodal information combining skeleton and RGB modalities offers a promising way to improve performance. However, existing methods either focus on a single data modality or fail to take full advantage of multiple modalities. In this paper, we propose a Teacher-Student Multimodal Fusion (TSMF) model that fuses the skeleton and RGB modalities at the model level for indoor action recognition. In our TSMF, a teacher network transfers the structural knowledge of the skeleton modality to a student network for the RGB modality. Extensive experiments on two benchmark datasets, NTU RGB+D and PKU-MMD, show that the proposed TSMF consistently outperforms state-of-the-art single-modal and multimodal methods. The results also indicate that our TSMF not only improves the accuracy of the student network but also significantly improves the ensemble accuracy.
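
The page does not include code, but the abstract's idea of transferring structural knowledge from a skeleton teacher to an RGB student can be sketched with a generic distillation-style setup: a frozen skeleton network supervises an RGB network through soft labels and feature matching, and the two streams are ensembled at test time. This is a minimal illustration only; the class names, feature dimensions, and loss weights below are assumptions and not the authors' TSMF implementation.

```python
# Minimal sketch of skeleton->RGB teacher-student transfer (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkeletonTeacher(nn.Module):
    """Stand-in for a skeleton-based network (a real teacher would be a graph conv model)."""
    def __init__(self, in_dim=75, num_classes=60):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, 256))
        self.head = nn.Linear(256, num_classes)

    def forward(self, skel):                  # skel: (batch, in_dim) pooled joint features
        feat = self.encoder(skel)
        return self.head(feat), feat

class RGBStudent(nn.Module):
    """Stand-in for an RGB network (a real student would use a video CNN backbone)."""
    def __init__(self, in_dim=2048, num_classes=60):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, 256))
        self.head = nn.Linear(256, num_classes)

    def forward(self, rgb):                   # rgb: (batch, in_dim) pooled frame features
        feat = self.encoder(rgb)
        return self.head(feat), feat

def student_loss(s_logits, s_feat, t_logits, t_feat, labels, T=4.0, alpha=0.5, beta=0.1):
    """Cross-entropy on labels plus soft-label and feature-matching terms from the teacher."""
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T ** 2
    feat_match = F.mse_loss(s_feat, t_feat)
    return ce + alpha * kd + beta * feat_match

# Training step: the teacher is assumed to be trained on skeletons first and then frozen.
teacher, student = SkeletonTeacher(), RGBStudent()
teacher.eval()
optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

skel = torch.randn(8, 75)        # dummy skeleton features
rgb = torch.randn(8, 2048)       # dummy RGB features
labels = torch.randint(0, 60, (8,))

with torch.no_grad():
    t_logits, t_feat = teacher(skel)
s_logits, s_feat = student(rgb)
loss = student_loss(s_logits, s_feat, t_logits, t_feat, labels)
loss.backward()
optimizer.step()

# At test time the two modalities can be ensembled by summing softmax scores,
# which is one way the abstract's "ensemble accuracy" could be computed.
scores = F.softmax(t_logits, dim=1) + F.softmax(s_logits, dim=1)
```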

Results from the Paper


Ranked #2 on Action Recognition In Videos on PKU-MMD (using extra training data)

Task | Dataset | Model | Metric | Value | Global Rank | Uses Extra Training Data
Action Recognition | NTU RGB+D | TSMF (RGB + Pose) | Accuracy (CS) | 92.5 | #14 | —
Action Recognition | NTU RGB+D | TSMF (RGB + Pose) | Accuracy (CV) | 97.4 | #12 | —
Action Recognition In Videos | PKU-MMD | TSMF | X-Sub | 95.8 | #2 | Yes
Action Recognition In Videos | PKU-MMD | TSMF | X-View | 97.8 | #2 | Yes

Methods


No methods listed for this paper.