Based on this insight, we propose a guided calibration network, named GCNet, that explicitly leverages object shape and shading information for improved lighting estimation.
We study conditional image repainting where a model is trained to generate visual content conditioned on user inputs, and composite the generated content seamlessly onto a user provided image while preserving the semantics of users' inputs.
no code implementations • 23 Jan 2022 • Tiejun Huang, Yajing Zheng, Zhaofei Yu, Rui Chen, Yuan Li, Ruiqin Xiong, Lei Ma, Junwei Zhao, Siwei Dong, Lin Zhu, Jianing Li, Shanshan Jia, Yihua Fu, Boxin Shi, Si Wu, Yonghong Tian
By treating vidar as spike trains in biological vision, we have further developed a spiking neural network-based machine vision system that combines the speed of the machine and the mechanism of biological vision, achieving high-speed object detection and tracking 1, 000x faster than human vision.
Haze, a common kind of bad weather caused by atmospheric scattering, decreases the visibility of scenes and degenerates the performance of computer vision algorithms.
Further, for training SCFlow, we synthesize two sets of optical flow data for the spiking camera, SPIkingly Flying Things and Photo-realistic High-speed Motion, denoted as SPIFT and PHM respectively, corresponding to random high-speed and well-designed scenes.
Based on this, we propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
This paper studies the problem of panoramic image reflection removal, aiming at reliving the content ambiguity between reflection and transmission scenes.
EventZoom is trained in a noise-to-noise fashion where the two ends of the network are unfiltered noisy events, enforcing noise-free event restoration.
To make the problem well-posed, existing MPS methods rely on restrictive assumptions, such as shape prior, surfaces having a monochromatic with uniform albedo.
Experimental results on analytically computed, synthetic, and real-world surfaces show that our method yields accurate and stable reconstruction for both orthographic and perspective normal maps.
Mimicking the sampling mechanism of the fovea, a retina-inspired camera, named spiking camera, is developed to record the external information with a sampling rate of 40, 000 Hz, and outputs asynchronous binary spike streams.
We propose DeRenderNet, a deep neural network to decompose the albedo and latent lighting, and render shape-(in)dependent shadings, given a single image of an outdoor urban scene, trained in a self-supervised manner.
We train a deep neural network to regress intrinsic cues with physically-based constraints and use them to conduct global and local lightings estimation.
To reconstruct high-resolution intensity images from event data, we propose EvIntSR-Net that converts event data to multiple latent intensity frames to achieve super-resolution on intensity images in this paper.
For all-pixel operation, we propose the Normal Regression Network to make efficient use of the intra-image spatial information for predicting a surface normal map with rich details.
A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR).
In this work, we extended the contextual encoding layer that was originally designed for 2D tasks to 3D Point Cloud scenarios.
We propose RIFE, a Real-time Intermediate Flow Estimation algorithm for Video Frame Interpolation (VFI).
Ranked #3 on Video Frame Interpolation on Vimeo90K
To deal with the uncalibrated scenario where light directions are unknown, we introduce a new convolutional network, named LCNet, to estimate light directions from input images.
A graph convolutional neural network is introduced to predict the performance of architectures based on the learned representations and their relation modeled by the graph.
To promote the capability of student generator, we include a student discriminator to measure the distances between real images, and images generated by student and teacher generators.
On one hand, massive trainable parameters significantly enhance the performance of these deep networks.
In contrast, it is more reasonable to treat the generated data as unlabeled, which could be positive or negative according to their quality.
To facilitate end-to-end training, we further develop a scenario context information extraction branch to extract context information from raw RGB video directly.
Ranked #57 on Skeleton Based Action Recognition on NTU RGB+D
From a single viewpoint, we use a set of photometric stereo images to identify surface points with the same distance to the camera.
The widely-used convolutions in deep neural networks are exactly cross-correlation to measure the similarity between input feature and convolution filters, which involves massive multiplications between float values.
When we take photos through glass windows or doors, the transmitted background scene is often blended with undesirable reflection.
We propose a differentiable sphere tracing algorithm to bridge the gap between inverse graphics methods and the recently proposed deep learning based implicit signed distance function.
Architectures in the population that share parameters within one SuperNet in the latest generation will be tuned over the training dataset with a few epochs.
no code implementations • 24 Jul 2019 • Shaodi You, Erqi Huang, Shuaizhe Liang, Yongrong Zheng, Yunxiang Li, Fan Wang, Sen Lin, Qiu Shen, Xun Cao, Diming Zhang, Yuanjiang Li, Yu Li, Ying Fu, Boxin Shi, Feng Lu, Yinqiang Zheng, Robby T. Tan
This document introduces the background and the usage of the Hyperspectral City Dataset and the benchmark.
Specifically, we exploit the unlabeled data to mimic the classification characteristics of giant networks, so that the original capacity can be preserved nicely.
This paper solves the Sparse Photometric stereo through Lighting Interpolation and Normal Estimation using a generative Network (SPLINE-Net).
Learning portable neural networks is very essential for computer vision for the purpose that pre-trained heavy deep models can be well applied on edge devices such as mobile phones and micro sensors.
This paper makes a first attempt to bring the Shape from Polarization (SfP) problem to the realm of deep learning.
Temporal Video Frame Synthesis (TVFS) aims at synthesizing novel frames at timestamps different from existing frames, which has wide applications in video codec, editing and analysis.
In this way, a portable student network with significantly fewer parameters can achieve a considerable accuracy which is comparable to that of teacher network.
As compared with traditional video retargeting, stereo video retargeting poses new challenges because stereo video contains the depth information of salient objects and its time dynamics.
Removing the undesired reflections from images taken through the glass is of broad application to various computer vision tasks.
Removing undesired reflections from a photo taken in front of a glass is of great importance for enhancing the efficiency of visual computing systems.
Recent developments in the field have enabled shape recovery techniques for surfaces of various types, but an effective solution to directly estimating the surface normal in the presence of highly specular reflectance remains elusive.
Radiometrically calibrating the images from Internet photo collections brings photometric analysis from lab data to big image data in the wild, but conventional calibration methods cannot be directly applied to such image data.
As prior knowledge of objects or object features helps us make relations for similar objects on attentional tasks, pre-trained deep convolutional neural networks (CNNs) can be used to detect salient objects on images regardless of the object class is in the network knowledge or not.
Recent progress on photometric stereo extends the technique to deal with general materials and unknown illumination conditions.
We propose a framework to overcome these key challenges, allowing the benefits of polarization to be used to enhance depth maps.