1 code implementation • Sensors 2021 • Ha Thi Phuong Thao, B T Balamurali, Gemma Roig, Dorien Herremans
The models that use all visual, audio, and text features simultaneously as inputs performed better than those using features extracted from each modality separately.
1 code implementation • 21 Oct 2020 • Ha Thi Phuong Thao, Balamurali B. T., Dorien Herremans, Gemma Roig
In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet.
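The architecture of AttendAffectNet is not detailed in this excerpt; as a rough illustration only, a minimal NumPy sketch of scaled dot-product self-attention applied across per-modality feature vectors (visual, audio, text) could look like the following. All names, dimensions, and the mean-pooling fusion step are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of feature vectors.

    X: (n_modalities, d) matrix, one row per modality embedding.
    Returns attended features of the same shape.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (n, n) similarity scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
d = 8  # hypothetical feature dimension
# Hypothetical pre-extracted features: one vector each for visual, audio, text.
X = rng.standard_normal((3, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

attended = self_attention(X, Wq, Wk, Wv)
fused = attended.mean(axis=0)  # pool across modalities before a prediction head
print(fused.shape)  # (8,)
```

Self-attention lets each modality's representation be re-weighted by its relevance to the others before the fused vector is passed to an emotion-prediction head; the pooling and projection choices here are placeholders.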
1 code implementation • 16 Sep 2019 • Ha Thi Phuong Thao, Dorien Herremans, Gemma Roig
Interestingly, we also observe that optical flow features are more informative than RGB features in videos, and overall, models using audio features predict evoked emotions more accurately than those based on video features.