We utilize variational autoencoder (VAE) to estimate the lower bound of the log-likelihood of image and explore to project the input images back into the high probability regions of image manifold again.
Despite recent improvement of supervised monocular depth estimation, the lack of high quality pixel-wise ground truth annotations has become a major hurdle for further progress.
For the inter-domain learning, a consistency constraint is applied to the LAs modelled by two dual-modelling networks to exploit the complementary knowledge among different data domains.
In the context of lossy compression, Blau & Michaeli (2019) adopt a mathematical notion of perceptual quality and define the information rate-distortion-perception function, generalizing the classical rate-distortion tradeoff.
In this paper, we propose an inter-cascade generative adversarial network, namely JAS-GAN, to segment the unbalanced atrial targets from LGE CMR images automatically and accurately in an end-to-end way.
The key feature of our model is its ability to aggregate three different-level features (local context, scene, and dataset-level) to compositionally predict the visual relationship.
1 code implementation • 21 Apr 2021 • Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng Li, Thomas Tanay, Fenglong Song, Wentao Chao, Qiang Guo, Yan Liu, Jiang Li, Xiaochao Qu, Dewang Hou, Jiayu Yang, Lyn Jiang, Di You, Zhenyu Zhang, Chong Mou, Iaroslav Koshelev, Pavel Ostyakov, Andrey Somov, Jia Hao, Xueyi Zou, Shijie Zhao, Xiaopeng Sun, Yiting Liao, Yuanzhi Zhang, Qing Wang, Gen Zhan, Mengxi Guo, Junlin Li, Ming Lu, Zhan Ma, Pablo Navarrete Michelini, Hai Wang, Yiyun Chen, Jingyu Guo, Liliang Zhang, Wenming Yang, Sijung Kim, Syehoon Oh, Yucong Wang, Minjie Cai, Wei Hao, Kangdi Shi, Liangyan Li, Jun Chen, Wei Gao, Wang Liu, XiaoYu Zhang, Linjie Zhou, Sixin Lin, Ru Wang
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results.
Recently, there has been rapid and significant progress on image dehazing.
The 8-dimensional optimal hyper-parameter combination should be such that the network dynamics simulate the resting state alpha rhythm (8 - 13 Hz rhythms in brain signals).
We develop a new physical model for the rain effect and show that the well-known atmosphere scattering model (ASM) for the haze effect naturally emerges as its homogeneous continuous limit.
We propose an enhanced multi-scale network, dubbed GridDehazeNet+, for single image dehazing.
To defend against manipulation of image content, such as splicing, copy-move, and removal, we develop a Progressive Spatio-Channel Correlation Network (PSCC-Net) to detect and localize image manipulations.
Recent years have seen considerable research activities devoted to video enhancement that simultaneously increases temporal frame rate and spatial resolution.
To the best of our knowledge, this is the first work that improves data efficiency of image captioning by utilizing LM pretrained on unimodal data.
A null point is located above the flux rope.
Solar and Stellar Astrophysics Space Physics
Therefore, it is capable of consolidating the expressibility of different architectures, resulting in a more accurate indirect domain shift (IDS) from the hazy images to that of clear images.
In this paper, we present edge-featured graph attention networks, namely EGATs, to extend the use of graph neural networks to those tasks learning on graphs with both node and edge features.
Quantized Neural Networks (QNNs) have achieved an enormous step in improving computational efficiency, making it possible to deploy large models to mobile and miniaturized devices.
1 code implementation • 10 Nov 2020 • Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, WangMeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek, Magauiya Zhussip, Yeskendir Koishekenov, Hwechul Cho Ye, Xin Liu, Xueying Hu, Jun Jiang, Jinwei Gu, Kai Li, Pengliang Tan, Bingxin Hou
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results.
Audio-guided face reenactment aims to generate a photorealistic face that has matched facial expression with the input audio.
Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data.
Extensive experiments demonstrate that owing to the informativeness of the camera raw data, the effectiveness of the network architecture, and the separation of super-resolution and color correction processes, the proposed method achieves superior VSR results compared to the state-of-the-art and can be adapted to any specific camera-ISP.
Video frame interpolation aims at synthesizing intermediate frames from nearby source frames while maintaining spatial and temporal consistencies.
In this paper, we introduce a novel network that utilizes the attention mechanism and wavelet transform, dubbed AWNet, to tackle this learnable image ISP problem.
The automatic text-based diagnosis remains a challenging task for clinical use because it requires appropriate balance between accuracy and interpretability.
no code implementations • 6 May 2020 • Shanxin Yuan, Radu Timofte, Ales Leonardis, Gregory Slabaugh, Xiaotong Luo, Jiangtao Zhang, Yanyun Qu, Ming Hong, Yuan Xie, Cuihua Li, Dejia Xu, Yihao Chu, Qingyan Sun, Shuai Liu, Ziyao Zong, Nan Nan, Chenghua Li, Sangmin Kim, Hyungjoon Nam, Jisu Kim, Jechang Jeong, Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, Junwoo Lee, Bolun Zheng, Xiaohong Liu, Linhui Dai, Jun Chen, Xi Cheng, Zhen-Yong Fu, Jian Yang, Chul Lee, An Gia Vien, Hyunkook Park, Sabari Nathan, M. Parisa Beham, S Mohamed Mansoor Roomi, Florian Lemarchand, Maxime Pelcat, Erwan Nogues, Densen Puthussery, Hrishikesh P. S, Jiji C. V, Ashish Sinha, Xuan Zhao
Track 1 targeted the single image demoireing problem, which seeks to remove moire patterns from a single image.
In natural language processing, relation extraction seeks to rationally understand unstructured text.
Ranked #10 on Relation Extraction on TACRED
Furthermore, we also design a shift vector processing element (SVPE) array to replace all 16-bit multiplications with SHIFT operations in convolution operation on FPGAs.
We use these benchmarks to study the performance of several state-of-the-art long-tail models on the LTVRR setup.
Meanwhile, we propose a M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1-3 bits weights and activations.
no code implementations • 2 Feb 2020 • Guang Yang, Jun Chen, Zhifan Gao, Shuo Li, Hao Ni, Elsa Angelini, Tom Wong, Raad Mohiaddin, Eva Nyktari, Ricardo Wage, Lei Xu, Yanping Zhang, Xiuquan Du, Heye Zhang, David Firmin, Jennifer Keegan
Using our MVTT recursive attention model, both the LA anatomy and scar can be segmented accurately (mean Dice score of 93% for the LA anatomy and 87% for the scar segmentations) and efficiently (~0. 27 seconds to simultaneously segment the LA anatomy and scars directly from the 3D LGE CMR dataset with 60-68 2D slices).
Video super-resolution aims at generating a high-resolution video from its low-resolution counterpart.
Metal nanoclusters (NCs), typically consisting of a few to tens of metal atoms, bridge the gap between organometallic compounds and crystalline metal nanoparticles.
The proposed hazing method does not rely on the atmosphere scattering model, and we provide an explanation as to why it is not necessarily beneficial to take advantage of the dimension reduction offered by the atmosphere scattering model for image dehazing, even if only the dehazing results on synthetic images are concerned.
Ranked #6 on Image Relighting on VIDIT’20 validation set
Based on the generated discriminative consistent domain, we can use the unlabeled data to learn the task model along with the labeled data via a consistent image generation.
In this paper, we report on a parallel freeviewpoint video synthesis algorithm that can efficiently reconstruct a high-quality 3D scene representation of sports scenes.
Haze and smog are among the most common environmental factors impacting image quality and, therefore, image analysis.
Ranked #2 on Image Dehazing on SOTS Outdoor
To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed.
Late Gadolinium Enhanced Cardiac MRI (LGE-CMRI) for detecting atrial scars in atrial fibrillation (AF) patients has recently emerged as a promising technique to stratify patients, guide ablation therapy and predict treatment success.
This paper proposes a new parallel approach to solve connected components on a 2D binary image implemented with CUDA.
Motivated by the important archaeological application of exploring cultural heritage objects, in this paper we study the challenging problem of automatically segmenting curve structures that are very weakly stamped or carved on an object surface in the form of a highly noisy depth map.
In this paper, we report an optimized union-find (UF) algorithm that can label the connected components on a 2D image efficiently by employing the GPU architecture.
A simulation method based on the liquid crystal response and the human visual system is suitable to characterize motion blur for LCDs but not other display types.