Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hamlet-a-hierarchical-multimodal-attention-1/multimodal-activity-recognition-on-ucsd-mit)](https://paperswithcode.com/sota/multimodal-activity-recognition-on-ucsd-mit?p=hamlet-a-hierarchical-multimodal-attention-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hamlet-a-hierarchical-multimodal-attention-1/multimodal-activity-recognition-on-ut-kinect)](https://paperswithcode.com/sota/multimodal-activity-recognition-on-ut-kinect?p=hamlet-a-hierarchical-multimodal-attention-1)`

HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm

International Conference on Intelligent Robots and Systems (IROS) 2020 · Md Mofijul Islam, Tariq Iqbal

To fluently collaborate with people, robots need the ability to recognize human activities accurately. Although modern robots are equipped with various sensors, robust human activity recognition (HAR) remains a challenging task for robots due to difficulties related to multimodal data fusion. To address these challenges, in this work, we introduce a deep neural network-based multimodal HAR algorithm, HAMLET. HAMLET incorporates a hierarchical architecture, where the lower layer encodes spatio-temporal features from unimodal data by adopting a multi-head self-attention mechanism. We develop a novel multimodal attention mechanism for disentangling and fusing the salient unimodal features to compute the multimodal features in the upper layer. Finally, the multimodal features are used in a fully connected neural network to recognize human activities. We evaluated our algorithm by comparing its performance to several state-of-the-art activity recognition algorithms on three human activity datasets. The results suggest that HAMLET outperformed all other evaluated baselines across all datasets and metrics tested, with the highest top-1 accuracy of 95.12% and 97.45% on the UTD-MHAD [1] and the UT-Kinect [2] datasets respectively, and F1-score of 81.52% on the UCSD-MIT [3] dataset. We further visualize the unimodal and multimodal attention maps, which provide us with a tool to interpret the impact of attention mechanisms concerning HAR.
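The two-level idea in the abstract (attend over time within each modality, then attend across modalities) can be illustrated with a small sketch. This is not the authors' implementation: it uses a single attention head with no learned projections, and the function names, the `query` vector, and the toy "rgb" and "skeleton" inputs are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention_pool(frames):
    """Lower layer (simplified): scaled dot-product self-attention over
    time for one modality, followed by mean pooling.
    Assumption: one head, identity query/key/value projections."""
    d = len(frames[0])
    scale = math.sqrt(d)
    attended = []
    for q in frames:
        weights = softmax([dot(q, k) / scale for k in frames])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)]
        )
    return [sum(row[i] for row in attended) / len(attended) for i in range(d)]

def fuse_modalities(unimodal, query):
    """Upper layer (simplified): attention-weighted sum of unimodal features.
    `query` stands in for a learned vector and is hypothetical."""
    names = list(unimodal)
    weights = softmax([dot(unimodal[n], query) for n in names])
    d = len(query)
    fused = [
        sum(w * unimodal[n][i] for w, n in zip(weights, names)) for i in range(d)
    ]
    return fused, dict(zip(names, weights))

# Toy per-frame features for two hypothetical modalities.
rgb = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
skeleton = [[0.0, 1.0], [0.1, 0.9]]

features = {
    "rgb": self_attention_pool(rgb),
    "skeleton": self_attention_pool(skeleton),
}
fused, modality_weights = fuse_modalities(features, query=[1.0, 0.0])
print(modality_weights)
```

In a full model the fused vector would feed a fully connected classifier, and the per-modality weights are the kind of quantity an attention map would visualize.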

Tasks

Activity Recognition

Human Activity Recognition

Datasets

UCSD-MIT Human Motion

UT-Kinect

UTD-MHAD

Results from the Paper

Ranked #1 on Multimodal Activity Recognition on UCSD-MIT Human Motion

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Multimodal Activity Recognition	UCSD-MIT Human Motion	HAMLET	F1-score	81.52	# 1
Multimodal Activity Recognition	UT-Kinect	HAMLET	Accuracy (CS)	97.56	# 1
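
For readers unfamiliar with the two metric types in the table, the following sketch shows the arithmetic behind top-1 accuracy and a macro-averaged F1-score. The averaging scheme is an assumption (the page does not say which one the paper uses), the labels are made-up toy data, and the sketch covers only the metric computation, not the evaluation protocol.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels, purely illustrative.
y_true = ["walk", "walk", "sit", "sit", "wave", "wave"]
y_pred = ["walk", "sit", "sit", "sit", "wave", "walk"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

On imbalanced activity classes the two numbers can diverge noticeably, which is one reason a benchmark may report F1 rather than accuracy.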
