Video Emotion Recognition
9 papers with code • 2 benchmarks • 5 datasets
Most implemented papers
MIMAMO Net: Integrating Micro- and Macro-motion for Video Emotion Recognition
Spatial-temporal feature learning is of vital importance for video emotion recognition.
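The paper's core idea is to fuse fast, frame-level micro-motion cues with slower, clip-level macro-motion context. Below is a minimal two-stream sketch of that idea in PyTorch; the GRU aggregator, layer sizes, and seven-class output are illustrative assumptions, not the authors' exact MIMAMO Net.

```python
# Hypothetical two-stream micro/macro fusion head (sketch only; the GRU,
# feature dimensions, and classifier are assumptions, not the paper's design).
import torch
import torch.nn as nn

class MicroMacroFusion(nn.Module):
    def __init__(self, micro_dim=256, macro_dim=512, hidden=256, n_classes=7):
        super().__init__()
        # micro stream: per-frame motion features (e.g., phase differences)
        self.micro_rnn = nn.GRU(micro_dim, hidden, batch_first=True)
        # macro stream: clip-level appearance features from a pretrained CNN
        self.macro_fc = nn.Linear(macro_dim, hidden)
        self.classifier = nn.Linear(hidden * 2, n_classes)

    def forward(self, micro_seq, macro_feat):
        # micro_seq: (B, T, micro_dim); macro_feat: (B, macro_dim)
        _, h = self.micro_rnn(micro_seq)          # temporal summary of micro-motion
        fused = torch.cat([h[-1], self.macro_fc(macro_feat)], dim=-1)
        return self.classifier(fused)

logits = MicroMacroFusion()(torch.randn(2, 16, 256), torch.randn(2, 512))
```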
Emotion Recognition in Audio and Video Using Deep Neural Networks
Humans are able to comprehend information from multiple domains, e.g., speech, text, and visuals.
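A common baseline for combining such modalities is late fusion of per-modality predictions. The sketch below simply averages audio and video logits; the embedding dimensions and the averaging rule are assumptions, not necessarily this paper's architecture.

```python
# Minimal late-fusion sketch for audio + video emotion classification
# (a generic baseline; dimensions and the averaging rule are assumptions).
import torch
import torch.nn as nn

audio_head = nn.Linear(128, 7)   # logits from audio embeddings
video_head = nn.Linear(512, 7)   # logits from video embeddings

audio_emb = torch.randn(4, 128)
video_emb = torch.randn(4, 512)
fused_logits = (audio_head(audio_emb) + video_head(video_emb)) / 2
pred = fused_logits.argmax(dim=-1)
```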
Technical Report for Valence-Arousal Estimation on Affwild2 Dataset
In this work, we describe our method for tackling the valence-arousal estimation challenge from the ABAW FG-2020 Competition.
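Valence-arousal models in the ABAW challenges are typically trained and evaluated with the Concordance Correlation Coefficient (CCC). A generic CCC loss looks like the sketch below; the exact variant used in this report may differ.

```python
# Generic Concordance Correlation Coefficient (CCC) loss for continuous
# valence/arousal regression; minimizing 1 - CCC is the common objective.
import torch

def ccc_loss(pred, target, eps=1e-8):
    # pred, target: (N,) continuous valence or arousal values
    pred_mean, target_mean = pred.mean(), target.mean()
    pred_var = pred.var(unbiased=False)
    target_var = target.var(unbiased=False)
    cov = ((pred - pred_mean) * (target - target_mean)).mean()
    ccc = 2 * cov / (pred_var + target_var + (pred_mean - target_mean) ** 2 + eps)
    return 1 - ccc

loss = ccc_loss(torch.randn(100), torch.randn(100))
```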
FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference
On today's social networks, more and more people prefer to express their emotions in videos through text, speech, and rich facial expressions.
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network
Automatically predicting the emotions of user-generated videos (UGVs) has received increasing interest recently.
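The temporal-erasing idea is to suppress the most discriminative segment so the model must also attend to weaker emotional evidence elsewhere in the clip. A hypothetical sketch of that step, where the segment saliency scores and the hard-erase rule are assumptions rather than the paper's exact network:

```python
# Hypothetical temporal erasing: zero out the most salient clip segment
# and let the classifier re-score the remaining frames (sketch only).
import torch

def erase_top_segment(features, saliency):
    # features: (B, T, D) segment features; saliency: (B, T) attention scores
    top = saliency.argmax(dim=1)                      # most discriminative segment
    mask = torch.ones_like(saliency)
    mask[torch.arange(features.size(0)), top] = 0.0   # erase it
    return features * mask.unsqueeze(-1)

feats = torch.randn(2, 8, 256)
sal = torch.rand(2, 8)
erased = erase_top_segment(feats, sal)
```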
Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline
The prevailing use of short-form videos (SVs) to spread emotions makes video emotion analysis (VEA) of SVs a necessity.
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
In comparison to state-of-the-art multimodal supervised learning models for dynamic emotion recognition, MultiMAE-DER enhances the weighted average recall (WAR) by 4.41% on the RAVDESS dataset and by 2.06% on CREMA-D.
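WAR weights each class's recall by its frequency (for single-label classification this equals overall accuracy), while unweighted average recall (UAR) treats all classes equally; both are standard metrics on RAVDESS and CREMA-D. They can be computed with scikit-learn:

```python
# Weighted vs. unweighted average recall with scikit-learn.
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
war = recall_score(y_true, y_pred, average="weighted")  # class-frequency weighted
uar = recall_score(y_true, y_pred, average="macro")     # equal weight per class
print(f"WAR={war:.3f}  UAR={uar:.3f}")
```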
MTCAE-DFER: Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition
This paper expands the cascaded network branch of the autoencoder-based multi-task learning (MTL) framework for dynamic facial expression recognition, namely Multi-Task Cascaded Autoencoder for Dynamic Facial Expression Recognition (MTCAE-DFER).
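In a cascaded multi-task design, each task head can consume the shared representation together with the previous head's output. The sketch below shows that wiring generically; the task ordering (face box, then landmarks, then expression), the dimensions, and the plain linear heads are assumptions, not MTCAE-DFER's actual modules.

```python
# Generic cascaded multi-task heads (sketch only; task list, dimensions,
# and cascade wiring are assumptions, not the paper's exact design).
import torch
import torch.nn as nn

class CascadedHeads(nn.Module):
    def __init__(self, feat_dim=768, n_expr=7):
        super().__init__()
        self.det_head = nn.Linear(feat_dim, 4)              # face box
        self.lmk_head = nn.Linear(feat_dim + 4, 10)         # 5 landmarks
        self.expr_head = nn.Linear(feat_dim + 10, n_expr)   # expression

    def forward(self, shared):
        box = self.det_head(shared)
        lmk = self.lmk_head(torch.cat([shared, box], -1))
        expr = self.expr_head(torch.cat([shared, lmk], -1))
        return box, lmk, expr

box, lmk, expr = CascadedHeads()(torch.randn(2, 768))
```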
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
Audiovisual emotion recognition (AVER) aims to infer human emotions from nonverbal visual-audio (VA) cues, offering modality-complementary and language-agnostic advantages.
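A common way to learn such modality-complementary representations is to align paired audio and visual embeddings with a symmetric contrastive (InfoNCE) objective. The sketch below illustrates only that generic objective; it does not reproduce VAEmo's actual losses or its knowledge-injection stage.

```python
# Symmetric InfoNCE alignment of paired audio/visual clip embeddings
# (generic AVER pretraining objective, not VAEmo's exact formulation).
import torch
import torch.nn.functional as F

def av_contrastive_loss(audio, visual, temperature=0.07):
    # audio, visual: (B, D) embeddings of the same B clips, row-aligned
    audio = F.normalize(audio, dim=-1)
    visual = F.normalize(visual, dim=-1)
    logits = audio @ visual.t() / temperature    # (B, B) similarity matrix
    labels = torch.arange(audio.size(0))         # matching pairs on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

loss = av_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```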