In particular, we 1) propose a bifurcated backbone strategy (BBS) to split the multi-level features into teacher and student features, and 2) utilize a depth-enhanced module (DEM) to excavate informative parts of depth cues from the channel and spatial views.
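The channel-and-spatial refinement idea behind the DEM can be sketched roughly as follows. This is a minimal illustration of attention over a depth feature map from the two views, not the paper's exact module; the nested-list tensor layout and the sequential channel-then-spatial order are assumptions.

```python
# Illustrative DEM-style refinement of a depth feature map,
# given as C x H x W nested lists. Not the paper's exact design.

def channel_attention(feat):
    """Weight each channel by its normalized global-average activation."""
    C = len(feat)
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    total = sum(means) or 1.0
    weights = [m / total for m in means]
    return [[[v * weights[c] for v in row] for row in feat[c]] for c in range(C)]

def spatial_attention(feat):
    """Weight each spatial position by its cross-channel mean activation."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    smap = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
            for i in range(H)]
    peak = max(max(row) for row in smap) or 1.0
    return [[[feat[c][i][j] * smap[i][j] / peak for j in range(W)]
             for i in range(H)] for c in range(C)]

def dem(depth_feat):
    # Channel view first, then spatial view, as a sequential refinement.
    return spatial_attention(channel_attention(depth_feat))
```

Informative channels and salient positions receive larger weights, which is the sense in which the module "excavates" useful depth cues.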
Automatically predicting the emotions of user-generated videos (UGVs) has recently attracted increasing interest.
Ranked #3 on Video Emotion Recognition on Ekman6
Pre-training of deep convolutional neural networks (DCNNs) plays a crucial role in the field of visual sentiment analysis (VSA).
The distribution is generated from the latest data stored in the memory bank, which can adaptively model the differences in semantic similarity between sarcastic and non-sarcastic data.
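A memory bank of this kind can be sketched as a fixed-capacity buffer of recent embeddings per class, from which a similarity distribution is computed on the fly. The class names, capacity, and cosine-similarity choice here are illustrative assumptions, not the paper's specification.

```python
# Hedged sketch: a fixed-size memory bank of recent embeddings used to
# estimate a similarity distribution between a new sample and stored
# sarcastic / non-sarcastic examples.
from collections import deque

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return num / (na * nb) if na and nb else 0.0

class MemoryBank:
    def __init__(self, capacity=512):
        # deque(maxlen=...) keeps only the latest entries per class.
        self.bank = {True: deque(maxlen=capacity), False: deque(maxlen=capacity)}

    def store(self, embedding, sarcastic):
        self.bank[sarcastic].append(embedding)

    def similarity_distribution(self, embedding):
        """Mean similarity to the latest sarcastic vs. non-sarcastic entries."""
        out = {}
        for label, entries in self.bank.items():
            sims = [cosine(embedding, e) for e in entries]
            out[label] = sum(sims) / len(sims) if sims else 0.0
        return out
```

Because old entries are evicted as new ones arrive, the estimated distribution tracks the latest data, which is what makes it adaptive.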
In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER).
Images can convey rich semantics and induce various emotions in viewers.
To reduce annotation labor associated with object detection, an increasing number of studies focus on transferring the learned knowledge from a labeled source domain to another unlabeled target domain.
Recently, extensive research efforts have been dedicated to understanding the emotions of images.
First, we generate an adapted domain to align the source and target domains on the pixel-level by improving CycleGAN with a multi-scale structured cycle-consistency loss.
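The multi-scale part of such a cycle-consistency loss can be sketched as an L1 reconstruction penalty summed over progressively downsampled copies of the image, so coarse structure is constrained as well as per-pixel detail. The 2x average-pool downsampling and the number of scales are assumptions for illustration.

```python
# Illustrative multi-scale cycle-consistency loss between an input x and
# its reconstruction G(F(x)), on 2D grayscale images as nested lists.

def downsample(img):
    """2x2 average pooling (drops a trailing odd row/column)."""
    H, W = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, W, 2)] for i in range(0, H, 2)]

def l1(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def multiscale_cycle_loss(x, x_rec, scales=3):
    loss = 0.0
    for _ in range(scales):
        loss += l1(x, x_rec)
        if len(x) < 2 or len(x[0]) < 2:
            break  # nothing left to downsample
        x, x_rec = downsample(x), downsample(x_rec)
    return loss
```

Identical input and reconstruction give zero loss at every scale; structural mismatches survive downsampling and are penalized again at coarser resolutions.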
C-CycleGAN transfers source samples at instance-level to an intermediate domain that is closer to the target domain with sentiment semantics preserved and without losing discriminative features.
In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
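The retrieval step of such emotion-based matching can be illustrated as nearest-neighbour search in the 2D VA space: an upstream model (assumed here) maps each image and music clip to a (valence, arousal) point, and the closest clip is returned. The library entries below are made-up examples.

```python
# Toy sketch of emotion-based image-to-music matching in the continuous
# valence-arousal (VA) space; VA points come from an assumed upstream model.

def va_distance(p, q):
    """Euclidean distance between two (valence, arousal) points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def match_music(image_va, music_library):
    """Return the name of the clip whose VA point is closest to the image's."""
    return min(music_library, key=lambda name: va_distance(image_va, music_library[name]))
```

For example, an image predicted as mildly positive and calm would retrieve a clip in the same VA region rather than a tense, negative one.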
In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS).
Ranked #2 on RGB-D Salient Object Detection on RGBD135
Emotion recognition in user-generated videos plays an important role in human-centered computing.
Ranked #4 on Video Emotion Recognition on Ekman6
In the second step, we integrate the local edge information and global location information to obtain the salient edge features.
In this paper, we propose a unified framework to simultaneously discover the number of clusters and group the data points into them using subspace clustering.
In this work, we first systematically study the built-in gap between the web and standard datasets, i.e., the different data distributions between the two kinds of data.
The recent years have witnessed significant growth in constructing robust generative models to capture informative distributions of natural data.
The second branch utilizes both the holistic and localized information by coupling the sentiment map with deep features for robust classification.
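The coupling of the sentiment map with deep features can be sketched as a two-branch descriptor: a localized branch reweights features by the map, and a holistic branch pools them globally; the two are then concatenated for classification. The fusion-by-concatenation choice and the averaging details here are illustrative assumptions.

```python
# Hedged sketch of coupling a sentiment map with deep features.

def couple_with_sentiment_map(feat, smap):
    """feat: C x H x W nested lists; smap: H x W weights in [0, 1]."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Localized branch: sentiment-weighted average per channel.
    localized = [sum(feat[c][i][j] * smap[i][j]
                     for i in range(H) for j in range(W)) / (H * W)
                 for c in range(C)]
    # Holistic branch: plain global average per channel.
    holistic = [sum(feat[c][i][j] for i in range(H) for j in range(W)) / (H * W)
                for c in range(C)]
    return holistic + localized  # concatenated descriptor of length 2C
```

The holistic half preserves global context while the localized half emphasizes emotionally salient regions, which is the intended robustness benefit of the coupling.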
Accordingly, we design six medical representations considering different criteria for the recognition of skin lesions, and construct a diagnosis system for clinical skin disease images.