We also show that our proposed model expresses strong generative modeling capability on ImageNet, demonstrating the superior potential of MaxViT blocks as a universal vision module.
Ranked #14 on Image Classification on ImageNet
In this work, we present a multi-axis MLP-based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks.
Ranked #1 on Deblurring on RealBlur-J (using extra training data)
Unlike existing techniques, we train a stochastic sampler that refines the output of a deterministic predictor and is capable of producing a diverse set of plausible reconstructions for a given input.
Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality.
Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.
Our algorithm augments video sequences with patch-craft frames and feeds them to a CNN.
Ranked #2 on Video Denoising on DAVIS sigma20
We showcase our proposed method with a novel denoiser architecture that achieves the reformed denoising goal and produces vivid and diverse outcomes in immoderate noise levels.
In this paper we propose the largest image compression quality dataset to date with human perceptual preferences, enabling the use of deep learning, and we develop a full reference perceptual quality assessment metric for lossy image compression that outperforms the existing state-of-the-art methods.
The first mobile camera phone was sold only 20 years ago, when taking pictures with one's phone was an oddity, and sharing pictures online was unheard of.
More explicitly, we show that in imaging applications such as denoising, super-resolution, demosaicing, deblurring and JPEG artifact removal, the proposed learning loss outperforms the current state-of-the-art on reference-based perceptual losses.
The proposed method estimates and removes mild blur from a 12MP image on a modern mobile phone in a fraction of a second.
Leveraging these realistic synthetic DP images, we introduce a recurrent convolutional network (RCN) architecture that improves deblurring results and is suitable for use with single-frame and multi-frame data (e.g., video) captured by DP sensors.
no code implementations • 10 Oct 2020 • Qifei Wang, Junjie Ke, Joshua Greaves, Grace Chu, Gabriel Bender, Luciano Sbaiz, Alec Go, Andrew Howard, Feng Yang, Ming-Hsuan Yang, Jeff Gilbert, Peyman Milanfar
This approach effectively reduces the total number of parameters and FLOPs, encouraging positive knowledge transfer while mitigating negative interference across domains.
That is to say, instead of generating an arbitrary image as a sample from the manifold of natural images, we propose to sample images from a particular "subspace" of natural images, directed by a low-resolution image from the same subspace.
We propose a realistic training data generation model for commercial satellite imagery products, which includes not only the imaging process on satellites but also the post-process on the ground.
We present a framework for interactive design of new image stylizations using a wide range of predefined filter blocks.
In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image and modifying its content to fit the given bits.
Watermarking is the process of embedding information into an image that can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image.
This work proposes a novel lightweight learnable architecture for image denoising and combines supervised and unsupervised training: the former aims for a universal denoiser, while the latter adapts it to the incoming image.
In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images.
Inverse problems in imaging are extensively studied, with a variety of strategies, tools, and theory that have been accumulated over the years.
Ranked #7 on Image Super-Resolution on Set14 - 8x upscaling
In this work, we broadly connect kernel-based filtering (e.g., approaches such as the bilateral filter and nonlocal means, among many others) with general variational formulations of Bayesian regularized least squares, and the related concept of proximal operators.
In parallel to this manual design, we propose a novel procedural approach that automatically assembles sequences of filters for innovative results.
The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters.
Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications such as evaluating image capture pipelines, storage techniques and sharing media.
Ranked #4 on Aesthetics Quality Assessment on AVA
As opposed to the $P^3$ method, we offer Regularization by Denoising (RED): using the denoising engine in defining the regularization of the inverse problem.
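A minimal sketch of the RED idea, under stated assumptions: the regularizer is taken in the form rho(x) = 0.5 * x^T (x - f(x)), whose gradient is x - f(x) when the denoiser f meets the conditions required by RED. Here f is a hypothetical stand-in (a circular [1/4, 1/2, 1/4] moving average, which is linear and symmetric), the degradation operator is the identity (plain denoising), and the names `smooth`, `red_objective`, and `red_steepest_descent` are illustrative only, not the authors' implementation.

```python
import numpy as np

def smooth(x):
    """Stand-in denoiser f(x): circular [1/4, 1/2, 1/4] moving average (an assumption)."""
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)

def red_objective(x, y, lam):
    """Data term plus RED regularizer: 0.5*||x - y||^2 + (lam/2) * x^T (x - f(x))."""
    return 0.5 * np.sum((x - y) ** 2) + 0.5 * lam * np.dot(x, x - smooth(x))

def red_steepest_descent(y, lam=2.0, n_iter=50):
    """Gradient descent on the objective above, using grad rho(x) = x - f(x)."""
    # For this linear filter the gradient is (1 + lam)-Lipschitz, so this step is safe.
    step = 1.0 / (1.0 + lam)
    x = y.copy()
    history = [red_objective(x, y, lam)]
    for _ in range(n_iter):
        grad = (x - y) + lam * (x - smooth(x))
        x = x - step * grad
        history.append(red_objective(x, y, lam))
    return x, history

# Toy usage: a noisy sine wave as the measured signal.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
restored, history = red_steepest_descent(noisy)
```

The denoiser is only ever called as a black box inside the gradient, which is the point of the formulation: a stronger denoiser could be swapped in for `smooth` without changing the iteration, provided it meets the conditions under which the gradient expression holds.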
Pedestrian detection in thermal infrared images poses unique challenges because of the low resolution and noisy nature of the image.
Recent work on this problem adopting convolutional neural networks (CNNs) ignited a renewed interest in this field, due to the very impressive results obtained.
Our approach additionally includes an extremely efficient way to produce an image that is significantly sharper than the input blurry one, without introducing artifacts such as halos and noise amplification.