We propose to address these issues in a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
In this paper we contribute a simple yet effective approach for estimating 3D poses of multiple people from multi-view images.
We then train a novel network that concatenates the camera calibration to the image features and uses these together to regress 3D body shape and pose.
Ranked #1 on 3D Human Pose Estimation on AGORA
We present Hand ArticuLated Occupancy (HALO), a novel representation of articulated hands that bridges the advantages of 3D keypoints and neural implicit surfaces and can be used in end-to-end trainable architectures.
In this paper we demonstrate that self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands and their parts, is a major cause of the final 3D pose error.
Hand pose estimation is difficult due to different environmental conditions, object- and self-occlusion as well as diversity in hand shape and appearance.
Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
Despite significant progress, we show that state-of-the-art 3D human pose and shape estimation methods remain sensitive to partial occlusion and can produce dramatically wrong predictions even when much of the body is observable.
Ranked #2 on 3D Human Pose Estimation on AGORA
In this paper, we propose VariTex, to the best of our knowledge the first method that learns a variational latent feature space of neural face textures, which allows sampling of novel identities.
However, this is problematic since the backward warp field is pose dependent and thus requires large amounts of data to learn.
In this paper we address the challenge of exploration in deep reinforcement learning for robotic manipulation tasks.
We followed all six tenets to create a new robotic platform, HuggieBot 2.0, that has a soft, warm, inflated body (HuggieChest) and uses visual and haptic sensing to deliver closed-loop hugging.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object in order to improve the estimation accuracy during alignment.
To this end, we present a method to estimate SMPL parameters from 6-12 EM sensors.
Furthermore, we show that in the presence of limited amounts of real-world training data, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation.
At the heart of our approach lies the idea to cast motion infilling as an inpainting problem and to train a convolutional de-noising autoencoder on image-like representations of motion sequences.
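The "motion as image" idea above can be sketched in a few lines: treating a clip of joint trajectories as a 2D array makes a missing span of frames a masked column range, exactly as in image inpainting. This is a minimal illustration under assumed shapes (24 joints, 60 frames); the actual network architecture and representation details are not specified here.

```python
import numpy as np

# A sequence of J joints over T frames becomes a (J*3, T) "image",
# so a gap in the motion is simply a masked block of columns.
T, J = 60, 24                       # hypothetical clip length and joint count
motion = np.random.randn(J * 3, T)  # stand-in for a real motion clip

# Mask out the middle third of the sequence; a de-noising autoencoder
# would be trained to reconstruct these columns from the visible context.
mask = np.ones_like(motion)
mask[:, T // 3: 2 * T // 3] = 0.0
corrupted = motion * mask

# Training pairs: network input is `corrupted`, target is `motion`.
```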
Many object pose estimation algorithms rely on the analysis-by-synthesis framework which requires explicit representations of individual object instances.
We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles.
We demonstrate qualitatively and quantitatively that our proposed approach is able to model the appearance of individual strokes, as well as the compositional structure of larger diagram drawings.
In this paper, we propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion.
Estimating 3D hand pose from 2D images is a difficult inverse problem due to the inherent scale and depth ambiguities.
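The scale-depth ambiguity can be seen directly from the pinhole camera model: a hand twice as large, placed twice as far away, projects to identical 2D keypoints. A small numerical check (with toy keypoint coordinates, not real data):

```python
import numpy as np

def project(points_3d, f=1.0):
    """Perspective projection of (N, 3) points with focal length f."""
    return f * points_3d[:, :2] / points_3d[:, 2:3]

# Toy 3D hand keypoints in metres (hypothetical values).
hand = np.array([[0.00, 0.00, 0.40],
                 [0.03, 0.01, 0.42],
                 [0.05, 0.04, 0.45]])

scaled_hand = 2.0 * hand  # double the size AND double the depth

p1, p2 = project(hand), project(scaled_hand)
# p1 == p2: the two hands are indistinguishable from their 2D projections.
```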
We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long horizon control tasks and generalizes to unseen test scenarios.
Accurately labeled real-world training data can be scarce, and hence recent works adapt, modify or generate images to boost target datasets.
This is implemented via a hierarchy of small-sized neural networks connected analogously to the kinematic chains in the human body as well as a joint-wise decomposition in the loss function.
Hierarchical Reinforcement Learning (HRL) has held the promise to enhance the capabilities of RL agents via operation on different levels of temporal abstraction.
This enables an easy to implement learning algorithm that is robust to errors of the model used in the model predictive controller.
In this paper, we propose a method for training control policies for human-robot interactions such as handshakes or hand claps via Deep Reinforcement Learning.
Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks.
Ranked #1 on Gaze Estimation on MPII Gaze (using extra training data)
In this work, we present a novel method to alleviate this problem by leveraging generative adversarial training to synthesize an eye image conditioned on a target gaze direction.
Convolutional architectures have recently been shown to be competitive on many sequence modelling tasks when compared to the de facto standard of recurrent neural networks (RNNs), while providing computational and modeling advantages due to inherent parallelism.
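The key ingredient that lets convolutions replace RNNs for sequences is causal convolution: the output at time t depends only on inputs up to t, so the whole sequence can still be processed in parallel. A minimal NumPy sketch of a causal (optionally dilated) 1D convolution, not any specific paper's implementation:

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Causal 1D convolution: output at t depends only on x[0..t].

    x: (T,) input sequence, w: (K,) kernel taps (w[0] is the current step).
    """
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so no future leaks in
    return np.array([
        sum(w[k] * xp[t + pad - k * dilation] for k in range(K))
        for t in range(len(x))
    ])

x = np.arange(5, dtype=float)               # [0, 1, 2, 3, 4]
y = causal_conv1d(x, np.array([1.0, 1.0]))  # y[t] = x[t] + x[t-1]
# y == [0, 1, 3, 5, 7]
```

Stacking such layers with exponentially growing dilation gives a large receptive field at low depth, which is the usual source of the modeling advantage mentioned above.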
This paper studies the task of full generative modelling of realistic images of humans, guided only by a coarse sketch of the pose, while providing control over the specific instance or type of outfit worn by the user.
The new optimization problem can be viewed as a Conditional Random Field (CRF) in which the random variables are associated with the binary edge labels of the initial graph and the hard constraints are introduced in the CRF as high-order potentials.
To learn from sufficient data, we synthesize IMU data from motion capture datasets.
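Synthesizing IMU readings from motion capture essentially means differentiating position trajectories twice. A minimal sketch (gravity and sensor orientation omitted; the 60 Hz rate and test trajectory are assumptions for illustration):

```python
import numpy as np

def synthesize_acceleration(positions, dt):
    """Approximate accelerometer signal from a (T, 3) position trajectory
    via second-order central finite differences; returns (T-2, 3)."""
    return (positions[2:] - 2 * positions[1:-1] + positions[:-2]) / dt ** 2

dt = 1.0 / 60.0                      # hypothetical 60 Hz mocap rate
t = np.arange(0.0, 1.0, dt)
# Trajectory x(t) = t^2 has constant true acceleration of 2 m/s^2 along x.
traj = np.stack([t ** 2, np.zeros_like(t), np.zeros_like(t)], axis=1)
acc = synthesize_acceleration(traj, dt)
```

For a quadratic trajectory the central difference is exact, so `acc[:, 0]` recovers the constant 2 m/s^2; real pipelines additionally add the gravity vector and rotate into the sensor frame.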
We propose to learn a better utility function that predicts the usefulness of future viewpoints.
Conventional feature-based and model-based gaze estimation methods have proven to perform well in settings with controlled illumination and specialized cameras.
Furthermore, we show that our proposed method can be used without changes on depth images and performs comparably to specialized methods.
Digital ink promises to combine the flexibility and aesthetics of handwriting and the ability to process, search and edit digital text.
In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN) for image synthesis that leverages information from few labels (as little as 0.22%, max.
We introduce a new method that efficiently computes a set of viewpoints and trajectories for high-quality 3D reconstructions in outdoor environments.
Furthermore, we propose new evaluation protocols to assess the quality of synthetic motion sequences even for which no ground truth data exists.
Temporal information can provide additional cues about the location of body joints and help to alleviate these issues.
Ranked #2 on Pose Estimation on J-HMDB