One challenge in semantic segmentation is handling object scale variation and leveraging context.
The proposed module learns cross-modality relationships between latent visual and language summarizations, which condense the visual regions and the question into a small number of latent representations and thereby avoid modeling uninformative individual region-word relations.
Although the model is trained using only RGB images, it also performs well when the background textures are changed and can reach 94% accuracy on the set of adversarial objects, outperforming current state-of-the-art methods.
Face hallucination is a generative task that super-resolves low-resolution facial images, while human perception of faces relies heavily on identity information.
Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data.
Face detection and alignment in unconstrained environments are challenging due to varied poses, illumination, and occlusions.
Ranked #9 on Face Detection on WIDER Face (Easy)
In this study, we show that landmark detection, or face alignment, is not a single, independent problem.
Ranked #8 on Unsupervised Facial Landmark Detection on MAFL