Leveraging the layout depth map as an intermediate representation, our proposed method outperforms existing methods for both panorama layout prediction and depth estimation.
Multimodal large-scale datasets for outdoor scenes are mostly designed for urban driving problems.
In this paper, we empirically show that existing image and sequence classifiers generalize poorly to new manipulation techniques.
In this paper, the argument is made that for true novel view synthesis of objects, where the object can be synthesized from any viewpoint, an explicit 3D shape representation is desired.
We investigate the use of photometric invariance and deep learning to compute intrinsic images (albedo and shading).
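As a minimal illustration of the decomposition such methods target (a standard Lambertian image formation model, not the authors' specific architecture), an image can be written as the per-pixel product of albedo and shading:

```python
import numpy as np

# Lambertian image formation: I = A * S (per pixel, per channel).
# Albedo A captures reflectance; shading S captures illumination and geometry.
rng = np.random.default_rng(0)
albedo = rng.uniform(0.2, 0.9, size=(4, 4, 3))   # reflectance in [0, 1]
shading = rng.uniform(0.1, 1.0, size=(4, 4, 1))  # grayscale shading map

image = albedo * shading  # observed image

# A predicted decomposition is consistent if it recomposes the input image.
recomposed = albedo * shading
```

Intrinsic decomposition methods invert this product: given `image`, they recover estimates of `albedo` and `shading` whose product reproduces the input.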
In this paper, we propose a weighting scheme based on the coefficient of variation and set the weights based on properties observed while training the model.
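A minimal sketch of coefficient-of-variation loss weighting, under the assumption that each loss term's weight is derived from the ratio of its standard deviation to its mean over recent training steps; the `cov_weights` helper is hypothetical and the paper's exact scheme may differ:

```python
import numpy as np

def cov_weights(loss_histories):
    """Weight each loss term by its coefficient of variation (std / mean),
    normalized so the weights sum to 1. Terms whose values fluctuate more
    relative to their magnitude receive larger weights."""
    cov = np.array([np.std(h) / np.mean(h) for h in loss_histories])
    return cov / cov.sum()

# Example: two loss terms observed over the last three training steps.
# The first varies more relative to its mean, so it gets a larger weight.
w = cov_weights([[1.0, 1.1, 0.9], [10.0, 10.01, 9.99]])
```

This makes the weights scale-invariant: a loss that is large in absolute value but nearly constant contributes little to the weighted objective.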
The experiments further show a significant performance improvement in kinship verification when training on the same dataset with more realistic distributions.
The aim is to distinguish strong photometric effects from reflectance variations.
To ensure the quality of the image reconstruction and disparity prediction, a combination of losses is used, including an L1 image reconstruction loss and a left-right disparity smoothness term.
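A simplified sketch of such a combined objective, assuming NumPy arrays and an unweighted smoothness term (published self-supervised depth methods typically use edge-aware smoothness weighted by image gradients; the weights here are illustrative, not the paper's values):

```python
import numpy as np

def l1_reconstruction_loss(img, recon):
    """Mean absolute error between the input image and its reconstruction."""
    return np.mean(np.abs(img - recon))

def disparity_smoothness_loss(disp):
    """Penalize large spatial gradients in the disparity map."""
    dx = np.abs(np.diff(disp, axis=1))
    dy = np.abs(np.diff(disp, axis=0))
    return np.mean(dx) + np.mean(dy)

def total_loss(img, recon, disp, w_recon=1.0, w_smooth=0.1):
    # Weighted sum of the individual terms.
    return (w_recon * l1_reconstruction_loss(img, recon)
            + w_smooth * disparity_smoothness_loss(disp))
```

A constant disparity map incurs zero smoothness penalty, so the smoothness term only discourages high-frequency noise in the prediction.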
Lies and deception are common phenomena in society, both in our private and professional lives.
In this paper, we provide a synthetic data generation methodology with fully controlled, multifaceted variations based on a new 3D face dataset (3DU-Face).
In this paper, we formulate the color constancy task as an image-to-image translation problem using GANs.
As of today, there hardly exist any large-scale datasets with dense optical flow of non-rigid motion from real-world imagery.
In this paper, we propose a pipeline to generate a 3D point cloud of an object from a single-view RGB image.
To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation.
Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they bring better cues for scene understanding problems.
On the other hand, recent research uses deep learning models as in-and-out black boxes and does not consider the well-established, traditional image formation process as the basis of its intrinsic learning process.
Experiments on the PASCAL VOC07 and VOC10 datasets show that the proposed method significantly outperforms single object detectors: DPM (8.4%), CN (6.8%) and EES (17.0%) on VOC07, and DPM (6.5%), CN (5.5%) and EES (16.2%) on VOC10.
These algorithms reduce the effect of lighting variations and weather conditions by exploiting the discriminant/invariant properties of different color representations.