A detail-enhancing branch is proposed to reconstruct daylight-specific features from the domain-invariant representations in a residual manner, regularized by a ranking loss.
Event-based sensors, which respond when the change in pixel intensity exceeds a triggering threshold, can capture high-speed motion with microsecond accuracy.
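The triggering rule described above can be illustrated with a minimal sketch. It assumes the commonly used idealized model in which each pixel compares its current log intensity against a stored reference value; the function name `generate_events`, the default `threshold`, and the frame-based input are hypothetical choices for illustration, not taken from the paper.

```python
import math


def generate_events(frames, timestamps, threshold=0.2, eps=1e-6):
    """Simulate events from a sequence of intensity frames.

    Assumed idealized model (not from the source): a pixel fires when its
    log intensity differs from its stored reference by at least `threshold`.
    frames: list of 2D lists of non-negative intensities.
    Returns a list of (timestamp, x, y, polarity) tuples.
    """
    # Reference log intensity per pixel, initialized from the first frame.
    ref = [[math.log(v + eps) for v in row] for row in frames[0]]
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                current = math.log(v + eps)
                diff = current - ref[y][x]
                if abs(diff) >= threshold:
                    polarity = 1 if diff > 0 else -1
                    events.append((t, x, y, polarity))
                    # Reset the reference only for the pixel that fired.
                    ref[y][x] = current
    return events


if __name__ == "__main__":
    frames = [[[1.0, 1.0]], [[1.0, 2.0]], [[1.0, 2.1]]]
    print(generate_events(frames, [0, 1, 2]))
```

In this sketch a pixel that does not change enough stays silent, which is why the output is sparse compared with the input frames. A real sensor fires asynchronously per pixel rather than at frame boundaries, so the frame loop here is only a discretized approximation.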
Low-dose CT has been a key diagnostic imaging modality to reduce the potential risk of radiation overdose to patient health.
In this paper, we propose an efficient deep neural network for image denoising based on pixel-wise classification.
To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset called the Perceptual Image Processing ALgorithms (PIPAL) dataset.
Automatically selecting exposure bracketing (images exposed differently) is important to obtain a high dynamic range image by using multi-exposure fusion.
Recovering a sharp video sequence from a motion-blurred image is highly ill-posed due to the significant loss of motion information in the blurring process.
Ranked #15 on Image Deblurring on GoPro (using extra training data)
In this work, we show that the coupled EMA teacher causes a performance bottleneck.
We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution.
To overcome the limitation of separate optical flow estimation, we propose a Spatio-Temporal Filter Adaptive Network (STFAN) for the alignment and deblurring in a unified framework.
Ranked #4 on Deblurring on DVD (using extra training data)
Stereo cameras are now more commonly adopted in emerging devices such as dual-lens smartphones and unmanned aerial vehicles.
Our method is evaluated on both realistic and synthetic stereo image pairs, and produces superior results compared to calibrated rectification or other self-rectification approaches.
Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.
We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, and image deraining).
This structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node messages through the graph to explore the interaction between persons of interest and the contextual objects.
The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).
Ranked #6 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)
These problems usually involve the estimation of two components of the target signals: structures and details.
Artistic style transfer can be thought of as a process that generates different versions of abstraction of the original image.
Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild.
Ranked #17 on 3D Human Pose Estimation on MPI-INF-3DHP
By feeding real stereo pairs of different domains to stereo models pre-trained with synthetic data, we see that: i) a pre-trained model does not generalize well to the new domain, producing artifacts at boundaries and ill-posed regions; however, ii) feeding an up-sampled stereo pair leads to a disparity map with extra details.
The resulting model outperforms all previous monocular depth estimation methods, as well as the stereo block matching method, on the challenging KITTI dataset while using only a small amount of real training data.
Ranked #16 on Monocular Depth Estimation on KITTI Eigen split (using extra training data)
Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency, to handle severe image quality degradation (e.g., motion blur and occlusion), and to capture the temporal correlation among video frames.
Ranked #2 on Pose Estimation on J-HMDB
In this paper, we introduce a bilinear composition loss function to address the problem of image dehazing.
Recent advances in visual tracking have shown that deep Convolutional Neural Networks (CNNs) trained for image classification can be strong feature extractors for discriminative trackers.
In this paper, we propose a novel single-stage, end-to-end trainable object detection network to overcome this limitation.
This task requires not only collective perception over both visual and auditory signals but also robustness to severe quality degradations and unconstrained content variations.
Automatic speaker naming is the problem of localizing as well as identifying each speaking character in a TV/movie/live show video.