In this paper, we circumvent this issue by presenting a joint framework for both unbounded recommendation of camera view and image composition (i.e., UNIC).
To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets.
To aggregate long-term sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability.
To empower the framework for eliminating diverse degradations, we devise a Sequence-wise Adaptive Degradation Estimator (SADE) to estimate degradation features for the input corrupted video.
In this work, we introduce a novel Referring Diffusional segmentor (Ref-Diff) for this task, which leverages the fine-grained multi-modal information from generative models.
In this paper, we propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
The extension makes it possible to back-project the informative features, obtained by fusing features from both modalities, into their native modalities to reconstruct the multiple masked inputs.
Concretely, FedCSD introduces a class prototype similarity distillation to align the local logits with the refined global logits that are weighted by the similarity between local logits and the global prototype.
In this paper, we focus on a particular setting of learning adaptive prompts on the fly for each test sample from an unseen new domain, which is known as test-time prompt tuning (TPT).
Since the training data for the target model in a data-free black-box attack is not available, most recent schemes utilize GANs to generate data for training the substitute model.
Our ICLR work advocated for enhancing transferability in adversarial examples by incorporating a Bayesian formulation into model parameters, which effectively emulates the ensemble of infinitely many deep neural networks, while, in this paper, we introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters.
With the two cascaded subtasks, DRM-IR first dynamically models the task-specific degradation based on a reference image pair and further restores the image with the collected degradation statistics.
The main challenge of BurstSR is to effectively combine the complementary information from input frames, while existing methods still struggle with it.
Thus, we further propose an indicator to measure the local similarity of models, called robustness similitude.
In this paper, we propose a Self-supervised learning framework for Dual reversed RS distortions Correction (SelfDRSC), where a DRSC network can be learned to generate a high framerate GS video only based on dual RS images with reversed distortions.
Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from an input semantic map.
Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterpart still lags behind due to the excessive training cost of temporal modeling.
In particular, the proposed method, named intermediate-level perturbation decay (ILPD), encourages the intermediate-level perturbation to be in an effective adversarial direction and to possess a great magnitude simultaneously.
In this work, we investigate performing semantic segmentation solely through the training on image-sentence pairs.
Federated Magnetic Resonance Imaging (MRI) reconstruction enables multiple hospitals to collaborate distributedly without aggregating local data, thereby protecting patient privacy.
And we extend the blind-spot network to a blind-neighborhood network (BNN) for providing supervision on flat areas.
Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.
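For readers unfamiliar with the metric, the following is a minimal sketch of how PSNR is conventionally computed from the mean squared error. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    `peak` is the maximum possible pixel value (assumed 255 for 8-bit data).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR is logarithmic in the MSE, a gain of 1.5 dB corresponds to roughly 29% lower mean squared error at the same peak value.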
To restrict the generative space of StyleGAN so that it obeys the structure of characters yet remains flexible in handling different font styles, we store the discrete features for each character in a codebook.
To improve the open-world segmentation ability, we incorporate omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pair data) into training, thereby achieving better segmentation accuracy.
In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to take customized concepts in image generation.
In this paper, by contrast, we opt for diversity in substitute models and advocate attacking a Bayesian model to achieve desirable transferability.
Although deep neural networks have achieved astonishing performance in many vision tasks, existing learning-based methods are far inferior to the physical model-based solutions in extreme low-light sensor noise modeling.
Ranked #1 on Image Denoising on ELD SonyA7S2 x200
Language-guided image inpainting aims to fill the defective regions of an image under the guidance of text while keeping the non-defective regions unchanged.
The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image.
We blend the semantic layouts of source head and source body, and then inpaint the transition region by the semantic layout generator, achieving a coarse-grained head swapping.
On the other hand, alignment algorithms in existing VSR methods perform poorly for real-world videos, leading to unsatisfactory results.
To tackle the first challenge, we propose a Degradation Relationship Index (DRI), defined as the mean drop rate difference in the validation loss between two models that are trained using the anchor degradation and the mixture of the anchor and auxiliary degradations, respectively.
First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between deblurred and ground-truth images.
Nonetheless, visual data (e.g., images) is by default a prerequisite for learning prompts in existing methods.
By learning in a collaborative manner, the deblurring and denoising tasks in our method can benefit each other.
1 code implementation • 20 Oct 2022 • Marcos V. Conde, Radu Timofte, Yibin Huang, Jingyang Peng, Chang Chen, Cheng Li, Eduardo Pérez-Pellitero, Fenglong Song, Furui Bai, Shuai Liu, Chaoyu Feng, Xiaotao Wang, Lei Lei, Yu Zhu, Chenghua Li, Yingying Jiang, Yong A, Peisong Wang, Cong Leng, Jian Cheng, Xiaoyu Liu, Zhicun Yin, Zhilu Zhang, Junyi Li, Ming Liu, WangMeng Zuo, Jun Jiang, Jinha Kim, Yue Zhang, Beiji Zou, Zhikai Zong, Xiaoxiao Liu, Juan Marín Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Furkan Kınlı, Barış Özcan, Furkan Kıraç, Li Leyi, SM Nadim Uddin, Dipon Kumar Ghosh, Yong Ju Jung
Cameras capture sensor RAW images and transform them into pleasant RGB images, suitable for the human eyes, using their integrated Image Signal Processor (ISP).
Generally, it is a challenging and intractable task to improve the photo-realistic performance of blind restoration and adaptively handle the generic and specific restoration scenarios with a single unified model.
Given a class label, the language model is used to generate a full description of a scene with a target object, and the text-to-image model is deployed to generate a photo-realistic image.
For better effectiveness, we divide prompts into two groups: 1) a shared prompt for the whole long-tailed dataset to learn general features and to adapt a pretrained model to the target domain; and 2) group-specific prompts to gather group-specific features for samples that have similar features and to empower the pretrained model with discrimination ability.
Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=100) (using extra training data)
Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors.
To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification.
Ranked #3 on Training-free 3D Point Cloud Classification on ScanObjectNN (using extra training data)
To prevent loss of original information, a multi-level enhancement mechanism guides a CNN to achieve a symmetric architecture for promoting expressive ability of HGSRCNN.
In this paper, we propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) via three stages, i.e., a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs) and a residual block (RB).
In various real-world image enhancement applications, the degradations are always non-uniform or non-homogeneous and diverse, which challenges most deep networks with fixed parameters during the inference phase.
Ranked #7 on Image Dehazing on SOTS Indoor
The compact instance stream effectively improves the segmentation accuracy of the unseen pixels, while fusing two streams with the adaptive routing map leads to an overall performance boost.
Generally, given pseudo ground-truths generated from the well-trained WSOD network, we propose a two-module iterative training algorithm to refine pseudo labels and progressively supervise a better object detector.
Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality.
Contrastive learning (CL) has recently been applied to adversarial learning tasks.
In this paper, we present a novel one-shot generative domain adaptation method, i.e., DiFa, for diverse generation and faithful adaptation.
In particular, the style encoder predicts the target style representation of an input image, which serves as the conditional information in the RetouchNet for retouching, while the TSFlow maps the style representation vector into a Gaussian distribution in the forward pass.
However, the heads are not uniformly covered by the sampling points in the deformable convolution, resulting in loss of head information.
In particular, we take well-trained CBDNet, NBNet, HINet, Uformer and GMSNet as the denoiser pool, and adopt a U-Net to predict pixel-wise weighting maps to fuse these denoisers.
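The fusion step can be illustrated with a small sketch that combines several denoiser outputs using per-pixel weights. This is not the authors' implementation: the softmax normalization across denoisers, the function name `fuse_denoisers`, and the array layout `(K, H, W)` are all assumptions chosen for illustration, and the weight logits stand in for whatever the U-Net would predict.

```python
import numpy as np


def fuse_denoisers(outputs, weight_logits):
    """Fuse K denoised images with pixel-wise weights.

    outputs:       array of shape (K, H, W), one denoised image per denoiser.
    weight_logits: array of shape (K, H, W), unnormalized per-pixel scores
                   (hypothetical stand-in for the predicted weighting maps).
    """
    outputs = np.asarray(outputs, dtype=np.float64)
    logits = np.asarray(weight_logits, dtype=np.float64)
    # Softmax over the denoiser axis so the weights at each pixel sum to 1
    # (an assumed normalization; the source only states pixel-wise weighting).
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * outputs).sum(axis=0)
```

With equal logits the result reduces to a plain average of the denoisers, while a strongly dominant logit at a pixel selects a single denoiser there.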
In this paper, we present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture by fully fusing deep and wide channel features to extract more accurate low-frequency information in terms of correlations of different channels in single image super-resolution (SISR).
The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted great attention in the machine learning community.
Second, we present popular architectures for GANs in big and small samples for image applications.
Generally, DPANet is an encoder-decoder with skip-connections, where two branches with shared parameters in the encoder are employed to extract and align deep features from left and right views, and one decoder is adopted to fuse aligned features for predicting the sharp image.
2 code implementations • 20 Apr 2022 • Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yunjin Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, WangMeng Zuo, Pavel Ostyakov, Vyal Dmitry, Shakarim Soltanayev, Chervontsev Sergey, Zhussip Magauiya, Xueyi Zou, Youliang Yan, Pablo Navarrete Michelini, Yunhua Lu, Diankai Zhang, Shaoli Liu, Si Gao, Biao Wu, Chengjian Zheng, Xiaofeng Zhang, Kaidi Lu, Ning Wang, Thuong Nguyen Canh, Thong Bach, Qing Wang, Xiaopeng Sun, Haoyu Ma, Shijie Zhao, Junlin Li, Liangbin Xie, Shuwei Shi, Yujiu Yang, Xintao Wang, Jinjin Gu, Chao Dong, Xiaodi Shi, Chunmei Nian, Dong Jiang, Jucai Lin, Zhihuai Xie, Mao Ye, Dengyan Luo, Liuhan Peng, Shengjie Chen, Qian Wang, Xin Liu, Boyang Liang, Hang Dong, Yuhao Huang, Kai Chen, Xingbei Guo, Yujing Sun, Huilei Wu, Pengxu Wei, Yulin Huang, Junying Chen, Ik Hyun Lee, Sunder Ali Khowaja, Jiseok Yoon
This challenge includes three tracks.
Particularly, by treating all labeled data as positive samples, PU learning is leveraged to identify negative samples (i.e., outliers) from unlabeled data.
However, BiRNN is intrinsically offline because it uses backward recurrent modules to propagate from the last frame to the current frame, which causes high latency and large memory consumption.
Combining these two new components, for the first time, we show that logit mimicking can outperform feature imitation and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.
Meanwhile, diverse testing sets are also provided with different types of reflection and scenes.
Obviously, a fine-grained part-level semantic layout will benefit object details generation, and it can be roughly inferred from an object's shape.
This paper substantially extends our work published at ECCV, in which an intermediate-level attack was proposed to improve the transferability of some baseline adversarial examples.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
For this purpose, we take the telephoto image instead of an additional high-resolution image as the supervision information and select a center patch from it as the reference to super-resolve the corresponding short-focus image patch.
Language-guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged.
Existing unpaired low-light image enhancement approaches prefer to employ the two-way GAN framework, in which two CNN generators are deployed for enhancement and degradation separately.
We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.
In crowd counting, because labelling is laborious, it is perceived as intractable to collect a new large-scale dataset that has plentiful images with large diversity in density, scene, etc.
To diminish the effect of color inconsistency in image alignment, we introduce a global color mapping (GCM) module to generate an initial sRGB image from the input raw image, which keeps the spatial location of the pixels unchanged, and the target sRGB image is utilized to guide GCM in converting the color towards it.
It simply encourages the variation of output caused by perturbations on different latent dimensions to be orthogonal, and the Jacobian with respect to the input is calculated to represent this variation.
A local patch network (LPNet) with global attention is proposed in this paper to detect small targets by jointly considering the global and local properties of infrared small target images.
In this paper, we defend the problem setting for improving localization performance by leveraging the bounding box regression knowledge from a well-annotated auxiliary dataset.
One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect.
In this paper, we propose a novel face recognition method, named VirFace, to effectively apply the unlabeled shallow data for face recognition.
To achieve scalable compression with the error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks.
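The idea of deriving the quantized residual's probability model from the original one can be sketched as follows. The sketch assumes integer residuals and the uniform quantizer with bin width 2τ+1 that is standard in near-lossless coding; the function names and the discrete `pmf`/`support` representation are hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize_residual(r, tau):
    """Map integer residuals to bin indices so that |r - q * (2*tau + 1)| <= tau."""
    r = np.asarray(r, dtype=np.int64)
    step = 2 * tau + 1
    return np.sign(r) * ((np.abs(r) + tau) // step)


def quantized_pmf(pmf, support, tau):
    """Derive the PMF of the quantized residual from the PMF of the original.

    pmf:     probabilities of the original residual values in `support`.
    support: integer residual values, same length as `pmf`.
    Each quantized bin simply accumulates the mass of the residuals it covers,
    so one learned model serves every error bound tau.
    """
    pmf = np.asarray(pmf, dtype=np.float64)
    q = quantize_residual(support, tau)
    q_values = np.unique(q)
    q_pmf = np.array([pmf[q == v].sum() for v in q_values])
    return q_values, q_pmf
```

Setting `tau = 0` leaves the probability model unchanged, which corresponds to the lossless case.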
In this paper, we present an edge-guided learnable bidirectional attention map (Edge-LBAM) for improving image inpainting of irregular holes with several distinct merits.
In the first stage, we predict the target semantic parsing maps to eliminate the difficulties of pose transfer and further benefit the latter translation of per-region appearance style.
In this paper, we propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.
In this paper, we advocate adversarial training for improving the generalization ability to both unseen facial forgeries and unseen image/video qualities.
To begin with, the pre-trained denoiser is used to generate the pseudo clean images for the test images.
For learning pseudo mask generator from the auxiliary dataset, we present a bi-level optimization formulation.
Previous KD methods for object detection mostly focus on imitating deep features within the imitation regions instead of mimicking classification logits, due to the inefficiency of logit mimicking in distilling localization information and its trivial improvement.
In this work, we propose a Self Sparse Generative Adversarial Network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem.
The resulting lower bound problem has the merit that it can be efficiently solved via linear assignment and low dimensional convex quadratic programming.
EFM can also be easily incorporated into existing deblurring networks, allowing event-driven deblurring to benefit from state-of-the-art deblurring methods.
To be specific, the reflection layer is estimated first because it is generally much simpler and relatively easier to estimate.
1 code implementation • 10 Nov 2020 • Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, WangMeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek, Magauiya Zhussip, Yeskendir Koishekenov, Hwechul Cho Ye, Xin Liu, Xueying Hu, Jun Jiang, Jinwei Gu, Kai Li, Pengliang Tan, Bingxin Hou
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results.
To cope with this issue, this paper presents a multi-level wavelet residual network (MWRN) architecture as well as a progressive training (PTMWRN) scheme to improve image denoising performance.
no code implementations • 25 Sep 2020 • Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, WangMeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Tongtong Zhao, Shanshan Zhao, Yoseob Han, Byung-Hoon Kim, JaeHyun Baek, HaoNing Wu, Dejia Xu, Bo Zhou, Wei Guan, Xiaobo Li, Chen Ye, Hao Li, Yukai Shi, Zhijing Yang, Xiaojun Yang, Haoyu Zhong, Xin Li, Xin Jin, Yaojun Wu, Yingxue Pang, Sen Liu, Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Marie-Paule Cani, Wan-Chi Siu, Yuanbo Zhou, Rao Muhammad Umer, Christian Micheloni, Xiaofeng Cong, Rajat Gupta, Keon-Hee Ahn, Jun-Hyuk Kim, Jun-Ho Choi, Jong-Seok Lee, Feras Almasri, Thomas Vandamme, Olivier Debeir
This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020.
Existing visual object tracking usually learns a bounding-box based template to match the targets across frames, which cannot accurately learn a pixel-wise representation, thereby being limited in handling severe appearance variations.
Recent works on plug-and-play image restoration have shown that a denoiser can implicitly serve as the image prior for model-based methods to solve many inverse problems.
As for knowledge distillation, we first apply the learned noise models to clean images to synthesize a paired set of training images, and use the real noisy images and the corresponding denoising results in the first stage to form another paired set.
Despite recent advances in deep learning-based face frontalization methods, photo-realistic and illumination preserving frontal face synthesis is still challenging due to large pose and illumination discrepancy during training.
Learning an SR model with a conventional pixel-wise loss is usually dominated by flat regions and edges, and fails to infer realistic details of complex textures.
Next, with the degraded input, we match and select the most similar component features from their corresponding dictionaries and transfer the high-quality details to the input via the proposed dictionary feature transfer (DFT) block.
The enhancement block gathers and fuses the global and local features to provide complementary information for the latter network.
Finally, the IRB uses coarse high-frequency features from the RB to learn more accurate SR features and construct an SR image.
Specifically, we dynamically construct a cross-scale graph by searching k-nearest neighboring patches in the downsampled LR image for each query patch in the LR image.
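A minimal sketch of the cross-scale neighbor search is given below. It is only an illustration under several assumptions that the excerpt does not state: a single-channel image, 2x average-pooling as the downsampling, dense overlapping patches, and squared Euclidean distance in pixel space rather than in a learned feature space. All function names are hypothetical.

```python
import numpy as np


def extract_patches(img, size):
    """Return all overlapping size x size patches of a 2D image as flat rows."""
    h, w = img.shape
    patches = [img[i:i + size, j:j + size].ravel()
               for i in range(h - size + 1)
               for j in range(w - size + 1)]
    return np.stack(patches)


def downsample2(img):
    """2x downsampling by average pooling (an assumed choice of downsampler)."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2
    return img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))


def cross_scale_knn(lr, patch_size=3, k=2):
    """For each query patch in the LR image, find the indices of its k nearest
    patches in the downsampled LR image; these pairs form the graph edges."""
    queries = extract_patches(lr, patch_size)
    keys = extract_patches(downsample2(lr), patch_size)
    # Pairwise squared Euclidean distances, shape (num_queries, num_keys).
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]
```

Each row of the returned index array lists the neighbors of one query patch, so the result can be read directly as an adjacency list of the cross-scale graph.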
For flexible non-blind image denoising, existing deep networks usually take both noisy image and noise level map as the input to handle various noise levels with a single model.
In this work, we propose a Dark and Bright Channel Priors embedded Network (DBCPeNet) to plug the channel priors into a neural network for effective dynamic scene deblurring.
Ranked #25 on Image Deblurring on GoPro (using extra training data)
To address this issue, we propose a non-local operation for context modeling by employing the global similarity within the context.
1 code implementation • 8 May 2020 • Abdelrahman Abdelhamed, Mahmoud Afifi, Radu Timofte, Michael S. Brown, Yue Cao, Zhilu Zhang, WangMeng Zuo, Xiaoling Zhang, Jiye Liu, Wendong Chen, Changyuan Wen, Meng Liu, Shuailin Lv, Yunchao Zhang, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Xiyu Yu, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Songhyun Yu, Bumjun Park, Jechang Jeong, Shuai Liu, Ziyao Zong, Nan Nan, Chenghua Li, Zengli Yang, Long Bao, Shuangquan Wang, Dongwoon Bai, Jungwon Lee, Youngjung Kim, Kyeongha Rho, Changyeop Shin, Sungho Kim, Pengliang Tang, Yiyun Zhao, Yuqian Zhou, Yuchen Fan, Thomas Huang, Zhihao LI, Nisarg A. Shah, Wei Liu, Qiong Yan, Yuzhi Zhao, Marcin Możejko, Tomasz Latkowski, Lukasz Treszczotko, Michał Szafraniuk, Krzysztof Trojanowski, Yanhong Wu, Pablo Navarrete Michelini, Fengshuo Hu, Yunhua Lu, Sujin Kim, Wonjin Kim, Jaayeon Lee, Jang-Hwan Choi, Magauiya Zhussip, Azamat Khassenov, Jong Hyun Kim, Hwechul Cho, Priya Kansal, Sabari Nathan, Zhangyu Ye, Xiwen Lu, Yaqi Wu, Jiangxin Yang, Yanlong Cao, Siliang Tang, Yanpeng Cao, Matteo Maggioni, Ioannis Marras, Thomas Tanay, Gregory Slabaugh, Youliang Yan, Myungjoo Kang, Han-Soo Choi, Kyungmin Song, Shusong Xu, Xiaomu Lu, Tingniao Wang, Chunxia Lei, Bin Liu, Rajat Gupta, Vineet Kumar
This challenge is based on newly collected validation and testing image datasets, and hence is named SIDD+.
In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency.
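As a rough illustration of the geometric factors involved, the sketch below computes the CIoU loss for a single pair of axis-aligned boxes in the commonly cited form: an IoU term, a normalized center-distance term, and an aspect-ratio consistency term. The `(x1, y1, x2, y2)` box layout, the function name, and the `eps` stabilizer are assumptions for illustration; this is not the authors' released code, and Cluster-NMS is not covered.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between a predicted box and a ground-truth box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed layout).
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap area and IoU.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / (c2 + eps) + alpha * v
```

Unlike a plain IoU loss, the center-distance term still provides a useful signal when the two boxes do not overlap at all.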
Nonetheless, a content- and resource-adaptive model is preferable, and it is encouraging to apply simpler and more efficient networks to the easier regions with fewer details and to scenarios with restricted efficiency constraints.
Recent works have demonstrated that global covariance pooling (GCP) has the ability to improve the performance of deep convolutional neural networks (CNNs) on visual classification tasks.
First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on.
Ranked #4 on Virtual Try-on on VITON (IS metric)
However, there are substantial differences in the various types of deep learning methods dealing with image denoising.
By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity.
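One way to realize cross-channel interaction without dimensionality reduction is a small 1D convolution over the globally pooled channel descriptor. The sketch below illustrates that idea only; the zero padding, the sigmoid gate, the function name `eca_attention`, and the caller-supplied `kernel` (which would be learned in practice) are assumptions and not details confirmed by the excerpt.

```python
import numpy as np


def eca_attention(features, kernel):
    """Channel attention via a 1D convolution across neighboring channels.

    features: array of shape (C, H, W).
    kernel:   1D weights of odd length k (placeholder for learned weights).
    """
    features = np.asarray(features, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    c = features.shape[0]
    k = len(kernel)
    # Global average pooling: one descriptor value per channel, no reduction in C.
    pooled = features.mean(axis=(1, 2))
    # Zero-pad so every channel interacts with its k nearest neighbors.
    padded = np.pad(pooled, k // 2)
    mixed = np.array([np.dot(padded[i:i + k], kernel) for i in range(c)])
    # Sigmoid gate, then rescale each channel.
    gate = 1.0 / (1.0 + np.exp(-mixed))
    return features * gate[:, None, None]
```

The attention here costs only `k` parameters regardless of the number of channels, in contrast to the two fully connected layers of a squeeze-and-excitation block.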
Ranked #691 on Image Classification on ImageNet
In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i.e., PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect.
Most convolutional network (CNN)-based inpainting methods adopt standard convolution to indistinguishably treat valid pixels and holes, making them limited in handling irregular holes and more likely to generate inpainting results with color discrepancy and blurriness.
Ranked #2 on Image Inpainting on Paris StreetView
In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly because all channels of the 1D feature map, which are generally highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.
To connect MAP and deep models, in this paper we present two generative networks for modeling the deep priors of the clean image and the blur kernel, respectively, and propose an unconstrained neural optimization solution to blind deconvolution.
Specifically, MWCNN for image restoration is based on U-Net architecture, and inverse wavelet transform (IWT) is deployed to reconstruct the high resolution (HR) feature maps.
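To show why a wavelet transform can replace pooling without losing information, the following sketch implements a one-level 2D Haar transform and its inverse. The choice of the Haar wavelet, the orthonormal scaling, the sub-band naming, and the function names are assumptions for illustration; the excerpt does not specify which wavelet is used.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar DWT of an array with even height and width.

    Returns four half-resolution sub-bands (ll, lh, hl, hh).
    """
    a = x[0::2, 0::2]
    b = x[0::2, 1::2]
    c = x[1::2, 0::2]
    d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (-a - b + c + d) / 2
    hl = (-a + b - c + d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh


def haar_iwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll - lh - hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll + lh + hl + hh) / 2
    return x
```

Because the inverse transform recovers the input exactly, halving the spatial resolution with the DWT and restoring it with the IWT discards nothing, unlike pooling or strided convolution.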
For the former, we directly apply a CCN to the binarized representation of an image to compute the Bernoulli distribution of each code for entropy estimation.
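The link between a predicted Bernoulli distribution and the entropy estimate can be sketched as the total code length under that distribution. The sketch assumes the context network outputs, for each binary code, the probability that the code equals 1; the function name `estimated_bits` and the clipping constant are hypothetical.

```python
import numpy as np


def estimated_bits(codes, p_one, eps=1e-12):
    """Estimated code length in bits of a binary code under Bernoulli predictions.

    codes: array of 0/1 symbols.
    p_one: predicted probability that each symbol is 1 (assumed network output).
    """
    codes = np.asarray(codes, dtype=np.float64)
    p = np.clip(np.asarray(p_one, dtype=np.float64), eps, 1.0 - eps)
    # Sum of -log2 of the probability assigned to each observed symbol.
    return float(-(codes * np.log2(p) + (1.0 - codes) * np.log2(1.0 - p)).sum())
```

An uninformative prediction of 0.5 costs exactly one bit per symbol, and the cost drops as the predictions become more confident and correct.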
For guiding our IS to obtain better object performance, we explore issues of instance imbalance and class importance in datasets, which frequently occur and bring adverse effect on detection performance.
Therefore, in this paper we investigate the feasibility of removing the cosine window from CF trackers with spatial regularization.
To overcome the limitation of separate optical flow estimation, we propose a Spatio-Temporal Filter Adaptive Network (STFAN) for the alignment and deblurring in a unified framework.
Ranked #3 on Deblurring on DVD (using extra training data)
Arbitrary attribute editing generally can be tackled by incorporating encoder-decoder and generative adversarial networks.
Convolutional neural networks have been proven effective in a variety of image restoration tasks.
The proposed methods are highly modular, readily plugged into existing deep CNNs.
Ranked #1 on Image Classification on iNaturalist (Top 3 Error metric)
Nowadays stereo cameras are more commonly adopted in emerging devices such as dual-lens smartphones and unmanned aerial vehicles.
In this paper, we propose an Iterative Kernel Correction (IKC) method for blur kernel estimation in blind SR problem, where the blur kernels are unknown.
Ranked #2 on Blind Super-Resolution on Set5 - 3x upscaling
In this paper, we propose a principled formulation and framework by extending bicubic degradation based deep SISR with the help of plug-and-play framework to handle LR images with arbitrary blur kernels.
The merits of the proposed MCTL are four-fold: 1) the concept of manifold criterion (MC) is first proposed as a measure validating the distribution matching across domains, and domain adaptation is achieved if the MC is satisfied; 2) the proposed MC can well guide the generation of the intermediate domain sharing similar distribution with the target domain, by minimizing the local domain discrepancy; 3) a global generative discrepancy metric (GGDM) is presented, such that both the global and local discrepancy can be effectively and positively reduced; 4) a simplified version of MCTL called MCTL-S is presented under a perfect domain generation assumption for more generic learning scenario.
In this work, we propose an Extreme Channel Prior embedded Network (ECPeNet) to plug the extreme channel priors (i.e., priors on dark and bright channels) into a network architecture for effective dynamic scene deblurring.
To handle this issue, this paper provides a better and simpler baseline deraining network by considering network architecture, input and output, and loss functions.
Ranked #1 on Single Image Deraining on Rain1400
In this paper, we propose a model-based deep learning approach for merging HrMS and LrHS images to generate a high-resolution hyperspectral (HrHS) image.
As for missing pixels on both of half-faces, we present a generative reconstruction subnet together with a perceptual symmetry loss to enforce symmetry consistency of recovered structures.
Ranked #1 on Facial Inpainting on VggFace2
In this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework.
However, both GAP and existing HOP methods assume unimodal distributions, which cannot fully capture statistics of convolutional activations, limiting representation ability of deep CNNs, especially for samples with complex contents.
Unlike the single-view case, MSL should take both common and specific knowledge among different views into consideration.
Video summarization is a challenging under-constrained problem because the underlying summary of a single video strongly depends on users' subjective understandings.
To overcome this problem, in this paper, an intra prediction convolutional neural network (IPCNN) is proposed for intra prediction, which exploits the rich context of the current block and therefore is capable of improving the accuracy of predicting the current block.
To address this issue, we propose an unsupervised DL method for LdCT enhancement that incorporates unlabeled LdCT sinograms directly into the network training.
In existing off-line deep learning models for CF trackers, the model adaptation is usually either abandoned or has a closed-form solution, making it feasible to learn deep representation in an end-to-end manner.
However, the negative entries in the coefficient matrix are forced to be positive when constructing the affinity matrix via exponentiation, absolute symmetrization, or squaring operations.
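The effect described above can be seen in a small sketch of the absolute symmetrization construction, one of the three operations mentioned. The function name and the example coefficient matrix are made up for illustration.

```python
import numpy as np


def affinity_abs_sym(coeff):
    """Affinity matrix W = (|C| + |C^T|) / 2 from a coefficient matrix C."""
    coeff = np.asarray(coeff, dtype=np.float64)
    return (np.abs(coeff) + np.abs(coeff.T)) / 2


# Hypothetical coefficients: sample 0 is represented by sample 1 with a
# negative weight, which the construction turns into a positive affinity.
C = np.array([[0.0, -0.8],
              [0.2, 0.0]])
W = affinity_abs_sym(C)
```

In this example the entry `W[0, 1]` equals 0.5 even though `C[0, 1]` is negative, so the sign information carried by the coefficient is lost in the affinity.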
Different from the state-of-the-art face inpainting methods that have no control over the synthesized content and can only handle frontal face pose, our approach can faithfully recover the missing content under various head poses while preserving the identity.
While deep convolutional neural networks (CNNs) have achieved impressive success in image denoising with additive white Gaussian noise (AWGN), their performance remains limited on real-world noisy photographs.
Ranked #3 on Denoising on Darmstadt Noise Dataset
Without the need to annotate bounding boxes, the existing methods usually follow a two/multi-stage pipeline with an online compulsive stage to extract object proposals, which is an order of magnitude slower than fast fully supervised object detectors such as SSD and YOLO.