In particular, we 1) propose a bifurcated backbone strategy (BBS) to split the multi-level features into teacher and student features, and 2) utilize a depth-enhanced module (DEM) to excavate informative parts of depth cues from the channel and spatial views.
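As a rough illustration of what "channel and spatial views" could mean for the DEM, the NumPy sketch below applies a channel attention step followed by a spatial attention step to a depth feature map. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the max-pooling choice, the two-layer weights `w1` and `w2`, and the omission of any learned convolution in the spatial step are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight each channel of feat (shape C x H x W).

    Assumption: channel weights come from global max pooling followed by a
    two-layer perceptron (w1: C_r x C, w2: C x C_r) and a sigmoid.
    """
    pooled = feat.max(axis=(1, 2))           # (C,) one value per channel
    hidden = np.maximum(w1 @ pooled, 0.0)    # (C_r,) ReLU bottleneck
    weights = sigmoid(w2 @ hidden)           # (C,) weights in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Reweight each spatial location of feat (shape C x H x W).

    Assumption: the spatial map is a sigmoid over the channel-wise maximum;
    a learned convolution would normally sit in between and is omitted here.
    """
    pooled = feat.max(axis=0)                # (H, W) one value per location
    return feat * sigmoid(pooled)[None, :, :]


def depth_enhance(depth_feat, w1, w2):
    """Hypothetical DEM-style block: channel view first, then spatial view."""
    return spatial_attention(channel_attention(depth_feat, w1, w2))
```

Because both attention maps lie in (0, 1), this sketch can only attenuate depth responses; it keeps the feature shape unchanged, so the result could be fused with RGB features of the same size.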
In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER).
To reduce annotation labor associated with object detection, an increasing number of studies focus on transferring the learned knowledge from a labeled source domain to another unlabeled target domain.
Images can convey rich semantics and induce various emotions in viewers.
Recently, extensive research efforts have been dedicated to understanding the emotions of images.
First, we generate an adapted domain to align the source and target domains at the pixel level by improving CycleGAN with a multi-scale structured cycle-consistency loss.
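To make the idea of a multi-scale cycle-consistency term concrete, the sketch below compares an image with its cycle reconstruction at several resolutions. It is a simplified stand-in, not the loss from the excerpt above: it uses a plain L1 difference instead of a structural similarity measure, and the function names, the 2x2 average pooling, and the `num_scales` parameter are assumptions made for illustration.

```python
import numpy as np


def downsample2x(img):
    """Halve the resolution of a 2-D image (even H and W) by 2x2 average pooling."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def multiscale_cycle_loss(original, reconstructed, num_scales=3):
    """Average L1 distance between an image and its cycle reconstruction.

    `reconstructed` stands for the source image mapped to the target domain
    and back again. H and W must be divisible by 2 ** (num_scales - 1).
    """
    a, b = original, reconstructed
    total = 0.0
    for scale in range(num_scales):
        if scale > 0:
            # Move to the next coarser scale before comparing.
            a, b = downsample2x(a), downsample2x(b)
        total += np.abs(a - b).mean()
    return total / num_scales
```

Comparing at coarser scales penalizes reconstructions that drift in overall layout, while the finest scale penalizes pixel-level differences.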
C-CycleGAN transfers source samples at the instance level to an intermediate domain that is closer to the target domain, preserving sentiment semantics without losing discriminative features.
In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS).
Ranked #1 on RGB-D Salient Object Detection on NJU2K
Emotion recognition in user-generated videos plays an important role in human-centered computing.
In the second step, we integrate the local edge information and global location information to obtain the salient edge features.
In this paper, we propose a unified framework to simultaneously discover the number of clusters and group the data points into them using subspace clustering.
In this work, we first systematically study the built-in gap between the web and standard datasets, i.e., the different data distributions of the two kinds of data.
Recent years have witnessed significant growth in constructing robust generative models to capture informative distributions of natural data.
The second branch utilizes both the holistic and localized information by coupling the sentiment map with deep features for robust classification.
Accordingly, we design six medical representations considering different criteria for the recognition of skin lesions, and construct a diagnosis system for clinical skin disease images.