Bayesian optimization provides a powerful framework for global optimization of black-box, expensive-to-evaluate functions.
Full-reference image quality metrics (FR-IQMs) aim to measure the visual differences between a pair of reference and distorted images, with the goal of accurately predicting human judgments.
Neural fields are evolving towards a general-purpose continuous representation for visual computing.
In this work, we propose a unified lightweight CNN network that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers while bearing less computational costs.
Most in-the-wild images are stored in Low Dynamic Range (LDR) form, serving as a partial observation of the High Dynamic Range (HDR) visual world.
Video frame interpolation (VFI) enables many important applications that might involve the temporal domain, such as slow motion playback, or the spatial domain, such as stop motion sequences.
With a hybrid architecture that separates static and dynamic contents, fluid interactions with static obstacles are reconstructed for the first time without additional geometry input or human labeling.
We tackle the problem of generating novel-view images from collections of 2D images showing refractive and reflective objects.
High Dynamic Range (HDR) content is becoming ubiquitous due to the rapid development of capture technologies.
Outdoor scene relighting is a challenging problem that requires good understanding of the scene geometry, illumination and albedo.
Even holding a mobile phone camera in the front of the face while sitting for a long duration is not convenient.
This paper introduces the first differentiable simulator of event streams, i. e., streams of asynchronous brightness change signals recorded by event cameras.
1 code implementation • 13 Mar 2021 • Mallikarjun B R, Ayush Tewari, Abdallah Dib, Tim Weyrich, Bernd Bickel, Hans-Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Louis Chevallier, Mohamed Elgharib, Christian Theobalt
We present an approach for high-quality intuitive editing of the camera viewpoint and scene illumination in a portrait image.
We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input.
Regrettably, capturing DISTORTED sensor readings is time-consuming; as well, there is a lack of CLEAN HDR videos.
We address these limitations for the first time in the literature and present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations, for several types of loose garments.
Due to the different data modality of event cameras compared to classical cameras, existing methods cannot be directly applied to and re-trained for event streams.
Our approach has the following favorable properties: (i) It is the first full head morphable model that includes hair.
Our network design and loss functions ensure a disentangled parameterization of not only identity and albedo, but also, for the first time, an expression basis.
We suggest to represent an X-Field -a set of 2D images taken across different view, time or illumination conditions, i. e., video, light field, reflectance fields or combinations thereof-by learning a neural network (NN) to map their view, time or light coordinates to 2D images.
We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image.
The reflectance field of a face describes the reflectance properties responsible for complex lighting effects including diffuse, specular, inter-reflection and self shadowing.
We introduce a new benchmark dataset for face video forgery detection, of unprecedented quality.
StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination.
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i. e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views.
We present a style-preserving visual dubbing approach from single video inputs, which maintains the signature style of target actors when modifying facial expressions, including mouth motions, to match foreign languages.
4 code implementations • 1 Jul 2019 • Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Mohamed Elgharib, Pascal Fua, Hans-Peter Seidel, Helge Rhodin, Gerard Pons-Moll, Christian Theobalt
The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy.
Ranked #6 on 3D Multi-Person Pose Estimation on MuPoTS-3D
Our lightweight setup allows operations in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments.
In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces.
We tackle these challenges based on a novel lightweight setup that converts a standard baseball cap to a device for high-quality pose estimation based on a single cap-mounted fisheye camera.
We present the first end to end approach for real time material estimation for general object shapes with uniform material that only requires a single color image as input.
Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem.
A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton.
Ranked #16 on Pose Estimation on Leeds Sports Poses
We find that the existing image quality metrics provide good measures of light-field quality, but require dense reference light- fields for optimal performance.
Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center.
We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup.
We therefore propose a new method for real-time, marker-less and egocentric motion capture which estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras that are attached to a helmet or virtual reality headset.
Our method uses a new image formation model with analytic visibility and analytically differentiable alignment energy.
In computer vision, convolutional neural networks (CNNs) have recently achieved new levels of performance for several inverse problems where RGB pixel appearance is mapped to attributes such as positions, normals or reflectance.
In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time.
Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images.
In this paper, we introduce a new approach to partial, intrinsic isometric matching.
We investigate the problem of identifying the position of a viewer inside a room of planar mirrors with unknown geometry in conjunction with the room's shape parameters.
We present a full-reference video quality metric geared specifically towards the requirements of Computer Graphics applications as a faster computational alternative to subjective evaluation.
Current quality assessment metrics are not suitable for this task, as they assume that both reference and test images have the same dynamic range.