MuMu: Cooperative Multitask Learning-based Guided Multimodal Fusion

AAAI 2022  ·  Md Mofijul Islam, Tariq Iqbal ·

Multimodal sensors (visual, non-visual, and wearable) can provide complementary information to develop robust perception systems for recognizing activities accurately. However, it is challenging to extract robust multimodal representations due to the heterogeneous characteristics of data from multimodal sensors and disparate human activities, especially in the presence of noisy and misaligned sensor data. In this work, we propose a cooperative multitask learning-based guided multimodal fusion approach, MuMu, to extract robust multimodal representations for human activity recognition (HAR). MuMu employs an auxiliary task learning approach to extract features specific to each set of activities with shared characteristics (activity-group). MuMu then utilizes activity-group-specific features to direct our proposed Guided Multimodal Fusion Approach (GM-Fusion) for extracting complementary multimodal representations, designed as the target task. We evaluated MuMu by comparing its performance to state-of-the-art multimodal HAR approaches on three activity datasets. Our extensive experimental results suggest that MuMu outperforms all the evaluated approaches across all three datasets. Additionally, the ablation study suggests that MuMu significantly outperforms the baseline models (p<0.05), which do not use our guided multimodal fusion. Finally, the robust performance of MuMu on noisy and misaligned sensor data posits that our approach is suitable for HAR in real-world settings.



Results from the Paper

Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Multimodal Activity Recognition MMAct MuMu F1-Score (Cross-Subject) 76.28 # 2
F1-Score (Cross-Session) 87.50 # 1


No methods listed for this paper. Add relevant methods here