Eye gaze analysis is an important research problem in the fields of Computer Vision and Human-Computer Interaction.
In this paper, we propose a novel hybrid message passing neural network with performance-driven structures (HMP-PS), which combines complementary message passing methods and captures more possible structures in a Bayesian manner.
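As a rough sketch of the general idea of combining complementary message passing schemes (not the authors' exact HMP-PS design; the two message functions and the node-wise gating below are assumptions for illustration), a layer might mix two kinds of messages with a learned weight:

```python
import torch
import torch.nn as nn

class HybridMessagePassing(nn.Module):
    """Toy hybrid message passing layer: two complementary message
    functions are combined with a learned soft gate (illustrative only)."""

    def __init__(self, dim):
        super().__init__()
        self.msg_a = nn.Linear(dim, dim)   # e.g., linear, sum-style messages
        self.msg_b = nn.Sequential(        # e.g., nonlinear MLP messages
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Linear(dim, 1)      # node-wise mixing weight
        self.update = nn.GRUCell(dim, dim) # GRU-style node update

    def forward(self, h, adj):
        # h: (N, dim) node features, adj: (N, N) adjacency matrix
        m_a = adj @ self.msg_a(h)          # aggregate type-A messages
        m_b = adj @ self.msg_b(h)          # aggregate type-B messages
        alpha = torch.sigmoid(self.gate(h))
        m = alpha * m_a + (1 - alpha) * m_b
        return self.update(m, h)

# usage: h = torch.randn(12, 64); adj = (torch.rand(12, 12) > 0.7).float()
# layer = HybridMessagePassing(64); h_new = layer(h, adj)
```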
Second, we introduce a probabilistic graph convolution that allows graph convolution to be performed on the distribution of Bayesian network structures to extract AU structural features.
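A minimal sketch of one way graph convolution can operate on edge-existence probabilities rather than a fixed binary adjacency matrix (the parameterization below is an assumption, not the paper's exact formulation):

```python
import torch
import torch.nn as nn

class ProbabilisticGraphConv(nn.Module):
    """Graph convolution over an edge-probability matrix (illustrative).

    Instead of a fixed binary adjacency, each edge i->j carries a
    probability p_ij; convolution uses the expected adjacency."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_prob):
        # x: (N, in_dim) per-AU features, edge_prob: (N, N) values in [0, 1]
        adj = edge_prob + torch.eye(edge_prob.size(0))      # add self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # row-normalize
        return torch.relu((adj / deg) @ self.proj(x))

# usage (hypothetical sizes): x = torch.randn(12, 32)
# edge_prob = torch.rand(12, 12); layer = ProbabilisticGraphConv(32, 32)
# au_features = layer(x, edge_prob)
```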
This paper proposes to systematically capture their dependencies and incorporate them into a deep learning framework for joint facial expression recognition and action unit detection.
Existing deep learning based facial landmark detection methods have achieved excellent performance.
Knowledge graph completion (also known as relation prediction) is the task of inferring missing facts given existing ones.
In all pairs of commodity indexes, we find increased co-movements in extreme situations, a stronger dependence between energy and other commodity markets at lower tails, and a 'V-type' local dependence for the energy-metal pairs.
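For context, dependence at lower tails is usually quantified by the copula's lower tail dependence coefficient, $\lambda_L = \lim_{q \to 0^{+}} \Pr(U \le q \mid V \le q)$ with $U = F_X(X)$ and $V = F_Y(Y)$ the probability-integral transforms of the two index returns; a positive $\lambda_L$ means the two markets tend to experience extreme losses together. (This is the standard definition, not a result specific to this paper.)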
Affective computing (AC) of these data can help us understand human behaviors and enable a wide range of applications.
Artificial imaging systems are introduced to select and prescriptively generate medical image data in a knowledge-driven manner that exploits medical domain knowledge.
Facial action unit (AU) intensity estimation plays an important role in affective computing and human-computer interaction.
To alleviate this issue, we propose a knowledge-driven method for jointly learning multiple AU classifiers without any AU annotation by leveraging prior probabilities on AUs, including expression-independent and expression-dependent AU probabilities.
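A minimal sketch of how expression-dependent AU priors could supervise AU classifiers without AU labels (the loss form, shapes, and names below are assumptions for illustration, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def prior_matching_loss(au_probs, expr_labels, prior_table):
    """Illustrative weak-supervision loss: push the batch-average predicted
    AU probabilities, per expression class, toward the prior P(AU=1 | expr).

    au_probs:    (B, n_au) sigmoid outputs of the AU classifiers
    expr_labels: (B,) expression class indices
    prior_table: (n_expr, n_au) expression-dependent AU prior probabilities
    """
    loss = 0.0
    for e in expr_labels.unique():
        mask = expr_labels == e
        pred_mean = au_probs[mask].mean(dim=0)   # empirical P(AU | expr=e)
        target = prior_table[e]                  # prior P(AU | expr=e)
        loss = loss + F.mse_loss(pred_mean, target)
    return loss

# usage (hypothetical shapes): au_probs = torch.rand(32, 12)
# expr_labels = torch.randint(0, 6, (32,)); prior_table = torch.rand(6, 12)
# l = prior_matching_loss(au_probs, expr_labels, prior_table)
```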
The majority of methods directly apply supervised learning techniques to AU intensity estimation, while few methods exploit unlabeled samples to improve performance.
An efficient learning method is then proposed for this model.
The major difficulty of learning and inference with deep directed models with many latent variables is the intractable inference due to the dependencies among the latent variables and the exponential number of latent variable configurations.
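Concretely, with $K$ binary latent variables $\mathbf{h}$, exact inference requires the marginal $p(\mathbf{x}) = \sum_{\mathbf{h} \in \{0,1\}^{K}} p(\mathbf{x} \mid \mathbf{h})\, p(\mathbf{h})$, a sum over $2^{K}$ configurations that cannot in general be factorized when the latent variables depend on one another.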
After that, a joint representation is extracted from the top layers of the two deep networks, and thus captures the high order dependencies between visual modality and audio modality.
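A minimal sketch of forming a joint representation from the top layers of two modality-specific networks (layer sizes and the concatenation-based fusion are assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Toy late-fusion model: top-layer features from a visual network and
    an audio network are concatenated and passed through a joint layer that
    can capture cross-modal dependencies (illustrative only)."""

    def __init__(self, vis_dim=512, aud_dim=128, joint_dim=256, n_classes=8):
        super().__init__()
        self.joint = nn.Sequential(
            nn.Linear(vis_dim + aud_dim, joint_dim), nn.ReLU())
        self.classifier = nn.Linear(joint_dim, n_classes)

    def forward(self, vis_feat, aud_feat):
        # vis_feat: (B, vis_dim) from the visual net's top layer
        # aud_feat: (B, aud_dim) from the audio net's top layer
        joint = self.joint(torch.cat([vis_feat, aud_feat], dim=1))
        return self.classifier(joint), joint

# usage: model = AudioVisualFusion()
# logits, joint_repr = model(torch.randn(4, 512), torch.randn(4, 128))
```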
Facial landmark detection, head pose estimation, and facial deformation analysis are typical facial behavior analysis tasks in computer vision.
Experimental results demonstrate that the intertwined relationships of facial action units and face shapes boost the performances of both facial action unit recognition and facial landmark detection.
Furthermore, we propose to exploit the target domain knowledge and incorporate such prior knowledge as a constraint during transfer learning to ensure that the transferred data satisfies certain properties of the target domain.
In this work, we propose a unified robust cascade regression framework that can handle both images with severe occlusion and images with large head poses.
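For background (a standard cascade regression formulation, not necessarily the exact model used in this work), an initial shape estimate $S^{0}$ is refined stage by stage as $S^{t} = S^{t-1} + R^{t}\,\phi(I, S^{t-1})$, where $\phi(I, S^{t-1})$ are shape-indexed features extracted from image $I$ around the current shape and $R^{t}$ is the regressor learned for stage $t$.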
To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that can capture the relationship between frontal and non-frontal face shapes.
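One common parameterization of such a 3-way model (the general gated RBM form; the variant used here may differ) couples the frontal shape $\mathbf{x}$, the non-frontal shape $\mathbf{y}$, and hidden units $\mathbf{h}$ through a multiplicative energy $E(\mathbf{x}, \mathbf{y}, \mathbf{h}) = -\sum_{i,j,k} W_{ijk}\, x_i\, y_j\, h_k - \sum_i a_i x_i - \sum_j b_j y_j - \sum_k c_k h_k$, so that the hidden units gate the correspondence between the two shape representations.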
The corpus further includes derived features from 3D, 2D, and IR (infrared) sensors and baseline results for facial expression and action unit detection.
Then we apply structured feature selection to two applications: 1) we introduce a new method that enables STMB to scale up, and we show the competitive performance of our algorithms on large-scale image classification tasks.
These three levels of context provide crucial bottom-up, middle-level, and top-down information that can benefit the recognition task itself.
Modeling interactions of multiple co-occurring objects in a complex activity is becoming increasingly popular in the video domain.
As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks.
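A minimal sketch of how such constraints can be derived from tracks: faces within one track must belong to the same person, while faces from two tracks that co-occur in a frame cannot (the function name and data layout below are hypothetical):

```python
def pairwise_constraints(tracks):
    """Derive pairwise constraints from face tracks (illustrative heuristic).

    tracks: list of dicts like {"faces": [face_id, ...], "frames": {frame_id, ...}}
    Returns (must_link, cannot_link) sets of face-id pairs:
      - faces within one track belong to the same person (must-link)
      - faces from two temporally overlapping tracks are different people
        (cannot-link), since one person cannot appear twice in a frame
    """
    must_link, cannot_link = set(), set()
    for t in tracks:
        faces = t["faces"]
        for i in range(len(faces)):
            for j in range(i + 1, len(faces)):
                must_link.add((faces[i], faces[j]))
    for a in range(len(tracks)):
        for b in range(a + 1, len(tracks)):
            if tracks[a]["frames"] & tracks[b]["frames"]:   # temporal overlap
                for fa in tracks[a]["faces"]:
                    for fb in tracks[b]["faces"]:
                        cannot_link.add((fa, fb))
    return must_link, cannot_link

# usage (toy data):
# tracks = [{"faces": [0, 1], "frames": {1, 2, 3}},
#           {"faces": [2, 3], "frames": {2, 3, 4}}]
# ml, cl = pairwise_constraints(tracks)
```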
Spatial-temporal relations among facial muscles carry crucial information about facial expressions yet have not been thoroughly exploited.
In this work, we describe a new learning scheme for parametric learning, in which the target variables $\mathbf{y}$ can be modeled with a prior model $p(\mathbf{y})$ and the relations between data and target variables are estimated through $p(\mathbf{y})$ and a set of uncorresponded data $\mathbf{x}$ in training.
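One way to make this concrete (an illustrative reading, not necessarily the paper's exact objective): with a parametric predictor $f_{\theta}$, the uncorresponded inputs $\mathbf{x}$ induce a distribution $\hat{p}_{\theta}(\mathbf{y})$ over the predictions $f_{\theta}(\mathbf{x})$, and $\theta$ can be estimated by driving this induced distribution toward the prior, e.g. $\min_{\theta} D\big(\hat{p}_{\theta}(\mathbf{y}) \,\|\, p(\mathbf{y})\big)$ for a divergence $D$ such as the KL divergence.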