Based on this insight, we propose a guided calibration network, named GCNet, that explicitly leverages object shape and shading information for improved lighting estimation.
In this paper, we focus on improving the quality of the face in the reconstruction and propose a novel Jointly-aligned Implicit Face Function (JIFF) that combines the merits of the implicit-function-based approach and the model-based approach.
We introduce rectification blocks to rectify features extracted by a state-of-the-art recognition model, in both spatial and channel dimensions, to minimize the distance between a masked face and its mask-free counterpart in the rectified feature space.
To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel.
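The degradation process such blind SISR methods model is classically written as y = (x * k) downarrow s + n: blur by a kernel k, downsample by a scale factor s, and add noise. A minimal sketch of this forward model (the function name and list-of-lists representation are illustrative, not the paper's implementation):

```python
import random

def degrade(image, kernel, scale, noise_std, seed=0):
    """Apply the classical SISR degradation model:
    blur with kernel k, downsample by factor s, add Gaussian noise."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = kh // 2, kw // 2
    # Blur: 2-D convolution with zero padding.
    blurred = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - oh, x + j - ow
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            blurred[y][x] = acc
    # Downsample by the scale factor.
    down = [row[::scale] for row in blurred[::scale]]
    # Add i.i.d. Gaussian noise (seeded for reproducibility).
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in down]
```

Blurring a constant 8 x 8 image with a normalized 3 x 3 box kernel and downsampling by 2 yields a 4 x 4 image whose interior values stay at 1.0. Blind SISR is hard precisely because k and the noise level are unknown at test time.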
We present a simple setup that allows us to alter the incident light paths before light rays enter the object by immersing the object partially in a liquid, and develop a method for recovering the object surface through reconstructing and triangulating such incident light paths.
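The bending of an incident ray at the liquid interface is governed by Snell's law, which is the geometric building block for reconstructing such light paths. A hedged sketch of the standard vector form of refraction (not the paper's reconstruction pipeline):

```python
import math

def refract(d, n, n1, n2):
    """Refract unit direction d at a surface with unit normal n
    (pointing against d), going from refractive index n1 to n2.
    Returns the unit refracted direction, or None on total
    internal reflection."""
    eta = n1 / n2
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return [eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n)]
```

For a ray entering water (index ~1.33) at 45 degrees, the tangential component of the refracted direction equals sin(45 deg)/1.33, as Snell's law requires, and the returned direction is unit length.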
Secondly, we conduct more sophisticated alignment and temporal fusion in the feature space of the coarse HDR video to produce better reconstruction.
This paper addresses the problem of mirror surface reconstruction, and proposes a solution based on observing the reflections of a moving reference plane on the mirror surface.
Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low resolution faces (e.g., $16\times16$).
Compared with previous networks, the proposed PSFR-GAN makes full use of both semantic-space information (parsing maps) and pixel-space information (LQ images) from input pairs at different scales.
Ranked #3 on Blind Face Restoration on CelebA-Test
We utilize a fully convolutional neural network (FCNN) to create the content image, and propose a style transfer approach to introduce textures and shading based on a newly proposed pyramid column feature.
To deal with the uncalibrated scenario where light directions are unknown, we introduce a new convolutional network, named LCNet, to estimate light directions from input images.
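LCNet itself is a learned estimator, but the forward shading model it effectively inverts is Lambertian image formation, I = albedo * max(0, n . l). A minimal sketch of that forward model (function name illustrative):

```python
def render_lambertian(normal, albedo, lights):
    """Lambertian shading: one intensity per light direction,
    I = albedo * max(0, n . l). This is the image-formation model
    underlying calibrated photometric stereo."""
    return [albedo * max(0.0, sum(ni * li for ni, li in zip(normal, l)))
            for l in lights]
```

For a surface patch with normal (0, 0, 1) and unit albedo, a head-on light yields intensity 1.0 and a grazing light from (1, 0, 0) yields 0.0; uncalibrated photometric stereo must recover the light directions `lights` as well as the normals.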
To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features.
In this paper, we study the problem of weakly-supervised temporal grounding of a sentence in video.
In this paper, we formulate transparent object matting as a refractive flow estimation problem, and propose a deep learning framework, called TOM-Net, for learning the refractive flow.
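Once the refractive flow is estimated, a transparent object can be composited onto a new background by sampling the background at the flow-displaced location and applying an attenuation factor. A hedged sketch of this compositing step (nearest-neighbour sampling, hypothetical function name; the actual matting model may differ in detail):

```python
def composite(background, mask, flow, attenuation):
    """Composite a transparent object over a background: where the
    object mask is on, sample the background at the refraction-
    displaced location and scale by the attenuation; elsewhere the
    background shows through unchanged."""
    h, w = len(background), len(background[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dy, dx = flow[y][x]
                sy = min(max(y + dy, 0), h - 1)  # clamp to image bounds
                sx = min(max(x + dx, 0), w - 1)
                out[y][x] = attenuation[y][x] * background[sy][sx]
            else:
                out[y][x] = background[y][x]
    return out
```

On a 2 x 2 background `[[1, 2], [3, 4]]`, a single masked pixel with flow (0, 1) and attenuation 0.5 picks up half the value of its right-hand neighbour.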
In this paper, we address a novel task, namely weakly-supervised spatio-temporal grounding of a natural sentence in video.
We propose a novel scale-aware feature encoder (SAFE) that is designed specifically for encoding characters at different scales.
Instead of supervising the network with ground truth sketches, we first perform patch matching in feature space between the input photo and photos in a small reference set of photo-sketch pairs.
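Feature-space patch matching of this kind reduces to a nearest-neighbour search under a similarity measure; a minimal sketch using cosine similarity (the actual features and matching criterion in the method may differ):

```python
import math

def best_match(query, references):
    """Return the index of the reference feature vector with the
    highest cosine similarity to the query: a nearest-neighbour
    match in feature space."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return max(range(len(references)), key=lambda i: cos(query, references[i]))
```

For a query `[1, 0]` and references `[[0, 1], [2, 0.1], [-1, 0]]`, the second reference wins because it points in nearly the same direction, regardless of its larger magnitude.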
Ranked #1 on Face Sketch Synthesis on CUHK
Unlike previous work which employed a global spatial transformer network to rectify the entire distorted text image, we take an approach of detecting and rectifying individual characters.
This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category.
Unlike previous approaches which require tedious work to calibrate the camera, our method can recover both the camera intrinsics and extrinsics together with the mirror surface from reflections of the reference plane under at least three unknown distinct poses.
In this paper, we develop a fixed viewpoint approach for dense surface reconstruction of transparent objects based on refraction of light.