$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild

7 Feb 2020 · Yuan-Hang Zhang, Rulin Huang, Jiabei Zeng, Shiguang Shan, Xilin Chen

This report describes the multi-modal multi-task ($M^3$T) approach underlying our submission to the valence-arousal estimation track of the Affective Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020. In the proposed $M^3$T framework, we fuse visual features from videos with acoustic features from the audio tracks to estimate the valence and arousal...
