Video DeepFakes are fake media created with Deep Learning (DL) that manipulate a person's expression or identity.
We propose Mesh Neural Cellular Automata (MeshNCA), a method for directly synthesizing dynamic textures on 3D meshes without requiring any UV maps.
This enables us to generate images with more varied brightness, and images that better match a desired style or color.
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains.
In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels to, in turn, facilitate the transfer of comics from one publication channel to another by assisting authors in the task of reconfiguring their narratives.
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
In this paper, we introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domain.
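The abstract does not spell out the training objective, but spatial and temporal positive pairs are commonly contrasted with an InfoNCE-style loss. The sketch below is a minimal, illustrative version of such a loss; the function name `info_nce`, the cosine similarity, and the temperature value are assumptions, not the paper's actual formulation.

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.1):
    """Minimal InfoNCE loss for one anchor.

    query, positive: (d,) feature vectors, e.g. features of the same
    region at two time steps (temporal pair) or of two spatial crops
    (spatial pair); negatives: (n, d) features of other regions.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    logits = np.array([cos(query, positive)] +
                      [cos(query, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

# toy check: a well-aligned positive yields a lower loss than a misaligned one
q = np.array([1.0, 0.0])
pos = np.array([1.0, 0.1])
negs = np.array([[0.0, 1.0], [-1.0, 0.0]])
loss_good = info_nce(q, pos, negs)
loss_bad = info_nce(q, negs[0], np.stack([pos, negs[1]]))
```

Minimizing this pulls the anchor toward its spatio-temporal positive and away from the negatives.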
We propose NEMTO, the first end-to-end neural rendering pipeline to model 3D transparent objects with complex geometry and unknown indices of refraction.
Deep saliency prediction algorithms complement object recognition features; they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity.

Not adapting this initial latent tensor to the style makes fine-tuning slow, expensive, and impractical, especially when only a few target style images are available.
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar, without ground-truth input-translation pairs.
The success of Neural Radiance Fields (NeRF) in novel view synthesis has inspired researchers to propose neural implicit scene reconstruction methods.
To fill this gap, we derive a novel, cumulant-based, approach for Poisson-Gaussian noise modeling from paired image samples.
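The paper's cumulant derivation is not reproduced here, but the core observation it builds on can be illustrated: for a Poisson-Gaussian model y = a·Poisson(x/a) + N(0, b), the first two cumulants satisfy E[y] = x and Var[y] = a·x + b, so a line fit through binned (mean, variance) pairs computed from two noisy copies of the same scene approximately recovers (a, b). The sketch below is hypothetical in its details (function name, binning strategy, parameters) and is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_poisson_gaussian(y1, y2):
    """Fit gain a and Gaussian variance b of y = a*Poisson(x/a) + N(0, b)
    from two noisy copies y1, y2 of the same clean signal x.

    First two cumulants: E[y] = x and Var[y] = a*x + b, so a line fit
    through binned (mean, variance) pairs approximately recovers (a, b).
    """
    mean = (y1 + y2) / 2            # per-pixel estimate of x
    var = (y1 - y2) ** 2 / 2        # unbiased per-pixel variance estimate
    edges = np.quantile(mean, np.linspace(0, 1, 21))
    idx = np.clip(np.digitize(mean, edges) - 1, 0, 19)
    m = np.array([mean[idx == k].mean() for k in range(20)])
    v = np.array([var[idx == k].mean() for k in range(20)])
    a, b = np.polyfit(m, v, 1)      # slope = gain, intercept = Gaussian var
    return a, b

# synthetic sanity check with known a = 2.0, b = 100.0
x = rng.uniform(20, 2000, size=1_000_000)
def noisy():
    return 2.0 * rng.poisson(x / 2.0) + rng.normal(0.0, 10.0, x.shape)
a_hat, b_hat = fit_poisson_gaussian(noisy(), noisy())
```

Binning by the noisy mean slightly attenuates the fit, which is one reason a more careful cumulant-based derivation, as the paper proposes, is needed in practice.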
While adversarial training and its variants have been shown to be the most effective algorithms for defending against adversarial attacks, their extremely slow training process makes them hard to scale to large datasets like ImageNet.
At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks.
In essence, this strategy ignores the fact that two crops may truly contain different image information, e.g., background and small objects, and thus tends to restrict the diversity of the learned representations.
Finding accurate correspondences among different views is the Achilles' heel of unsupervised Multi-View Stereo (MVS).
In this framework, we exploit the outputs of a deep denoising network alongside an image convolved with a reliable filter.
This lets us show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
Generative Adversarial Network (GAN) based localized image editing can suffer from ambiguity between semantic attributes.
Estimating the depth of comics images is challenging, as such images a) are monocular; b) lack ground-truth depth annotations; c) vary across artistic styles; and d) are sparse and noisy.
Ranked #1 on Depth Estimation on eBDtheque
Although adversarial training and its variants currently constitute the most effective way to achieve robustness against adversarial attacks, their poor generalization limits their performance on the test samples.
Furthermore, as proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images.
Saliency prediction has made great strides over the past two decades, with current techniques modeling low-level information, such as color, intensity and size contrasts, and high-level ones, such as attention and gaze direction for entire objects.
While novel denoising networks have been designed for real images coming from different distributions, or for specific applications, comparatively little improvement has been achieved on Gaussian denoising.
Our method, though partly reliant on the quality of the generative network inversion, is competitive with state-of-the-art supervised and task-specific restoration methods.
2 code implementations • 27 Sep 2020 • Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, Mahmoud Afifi, Michael S. Brown, Kele Xu, Hengxing Cai, Yuzhong Liu, Li-Wen Wang, Zhi-Song Liu, Chu-Tak Li, Sourya Dipta Das, Nisarg A. Shah, Akashdeep Jassal, Tongtong Zhao, Shanshan Zhao, Sabari Nathan, M. Parisa Beham, R. Suganya, Qing Wang, Zhongyun Hu, Xin Huang, Yaning Li, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Densen Puthussery, Hrishikesh P. S, Melvin Kuriakose, Jiji C. V, Yu Zhu, Liping Dong, Zhuolong Jiang, Chenghua Li, Cong Leng, Jian Cheng
The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation (i.e., light source position).
We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts.
We analyze the influence of adversarial training on the loss landscape of machine learning models.
Deep image relighting has been gaining interest lately, as it allows photo enhancement through illumination-specific retouching without human effort.
Focusing on StyleGAN, we introduce a simple and effective method for making local, semantically-aware edits to a target output image.
Extreme image or video completion, where, for instance, we retain only 1% of the pixels at random locations, allows for very cheap sampling in terms of the required pre-processing.
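The cheap sampling step itself is easy to illustrate. The sketch below (the helper name `extreme_sample` is hypothetical) keeps a random 1% of pixels, producing the sparse image and mask that a completion method would start from; the reconstruction step is the paper's contribution and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def extreme_sample(image, keep=0.01):
    """Retain a random `keep` fraction of pixels and zero out the rest,
    returning the sparse image and the binary sampling mask."""
    mask = rng.random(image.shape) < keep
    sparse = np.zeros_like(image)
    sparse[mask] = image[mask]
    return sparse, mask

# grayscale toy image: roughly 1% of the 128x128 pixels survive
img = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
sparse, mask = extreme_sample(img, keep=0.01)
```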
In this paper, we corroborate, through three subjective experiments on a novel image dataset, that objects in natural images are inherently perceived to have varying levels of importance.
To study JDSR on microscopy data, we introduce Widefield2SIM (W2S), a novel JDSR dataset acquired using conventional fluorescence widefield and SIM imaging.
We propose to regularize this representation in the last feature layer before classification layers.
Plug-and-play denoisers can be used to perform generic image restoration tasks independent of the degradation type.
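The abstract does not detail the algorithm, but a common plug-and-play formulation alternates a closed-form data-fidelity step with a call to an off-the-shelf denoiser that stands in for the image prior. The sketch below is one such half-quadratic-splitting loop for masked observations, with a simple box filter as the plugged-in denoiser; all names and parameters are illustrative, not the paper's method.

```python
import numpy as np

def box_denoise(z, k=3):
    """Stand-in 'plugged' denoiser: a simple box filter. Any off-the-shelf
    denoiser with the same signature could be swapped in."""
    pad = k // 2
    zp = np.pad(z, pad, mode="edge")
    out = np.zeros_like(z)
    for dy in range(k):
        for dx in range(k):
            out += zp[dy:dy + z.shape[0], dx:dx + z.shape[1]]
    return out / (k * k)

def pnp_restore(y, mask, denoiser, rho=0.5, iters=50):
    """Plug-and-play half-quadratic splitting for masked observations
    y = mask * x + noise: alternate a denoiser call (prior step) with a
    closed-form data-fidelity step."""
    x = y.copy()
    for _ in range(iters):
        z = denoiser(x)                              # prior step
        x = (mask * y + rho * z) / (mask + rho)      # data-fidelity step
    return x

# toy check: inpaint a smooth ramp with half of the pixels observed
rng = np.random.default_rng(1)
x_true = np.outer(np.linspace(0.0, 1.0, 32), np.linspace(0.0, 1.0, 32))
mask = (rng.random((32, 32)) < 0.5).astype(float)
y = mask * x_true
x_hat = pnp_restore(y, mask, box_denoise)
```

The denoiser argument is exactly what makes the scheme degradation-agnostic: changing the degradation only changes the data-fidelity step.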
Blind and universal image denoising consists of using a unique model that denoises images with any level of noise.
Ranked #1 on Grayscale Image Denoising on BSD68 sigma40
In this paper, we incorporate knowledge of the shadow's physical properties, in the form of shadow detection masks, into a correlation-based tracking algorithm.
We propose a new notion of "non-linearity" of a network layer with respect to an input batch that is based on its proximity to a linear system, which is reflected in the non-negative rank of the activation matrix.
Image optimization problems encompass many applications such as spectral fusion, deblurring, deconvolution, dehazing, matting, reflection removal and image interpolation, among others.
We propose Deep Feature Factorization (DFF), a method capable of localizing similar semantic concepts within an image or a set of images.
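DFF is commonly described as applying non-negative matrix factorization (NMF) to the non-negative activations of a deep network, so that each factor yields a spatial heatmap for one semantic concept. The sketch below implements that idea with a tiny multiplicative-update NMF on synthetic activations; it illustrates the general technique and is not the authors' code.

```python
import numpy as np

def nmf(A, k, iters=200, eps=1e-9):
    """Tiny multiplicative-update NMF: A (n, c) ~ W (n, k) @ H (k, c)."""
    rng = np.random.default_rng(0)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def dff_heatmaps(acts, k=3):
    """DFF sketch: factorize non-negative activations (h, w, c) into k
    spatial heatmaps, one per discovered concept."""
    h, w, c = acts.shape
    W, _ = nmf(acts.reshape(h * w, c), k)
    return W.reshape(h, w, k)       # channel j = saliency of concept j

# synthetic stand-in for post-ReLU CNN activations
acts = np.abs(np.random.default_rng(1).normal(size=(8, 8, 16)))
maps = dff_heatmaps(acts, k=3)
```

Multiplicative updates preserve non-negativity, so every heatmap is directly interpretable as a saliency map.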
Ranked #5 on Unsupervised Human Pose Estimation on Tai-Chi-HD
Removing reflection artefacts from a single image is a problem of both theoretical and practical interest, which still presents challenges because of the massively ill-posed nature of the problem.
By training on high-quality samples, our deep residual demosaicing and super-resolution network is able to recover high-quality super-resolved images from low-resolution Bayer mosaics in a single step, without producing the artifacts that commonly arise when the two operations are performed separately.
Since information is a natural way of measuring image complexity, our proposed algorithm leads to image segments that are smaller and denser in areas of high complexity and larger in homogeneous regions, thus simplifying the image while preserving its details.
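As a minimal illustration of using information to measure local complexity, the sketch below computes per-patch Shannon entropy of a grayscale image: homogeneous patches score zero while textured patches score high, which is the kind of signal that could drive smaller, denser segments in complex areas. The function name and parameters are hypothetical, not the paper's algorithm.

```python
import numpy as np

def patch_entropy(img, patch=8, bins=16):
    """Shannon entropy of each non-overlapping patch of a grayscale image,
    as a simple information-based measure of local complexity."""
    h, w = img.shape
    ent = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            block = img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent

flat = np.full((16, 16), 128, dtype=np.uint8)                  # homogeneous
noisy = np.random.default_rng(0).integers(0, 256, (16, 16), dtype=np.uint8)
e_flat = patch_entropy(flat)
e_noisy = patch_entropy(noisy)
```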
This paper introduces a novel approach to data analysis designed for the needs of specialists in psychology of religion.
Based on a state-of-the-art segmentation framework and a novel manually segmented image database of both indoor and outdoor scenes that contains 4-channel (RGB+NIR) images, we study how to best incorporate the specific characteristics of the NIR response.