This paper jointly tackles the highly correlated tasks of estimating 3D human body poses and predicting future 3D motions from RGB image sequences.
Action2motion stochastically generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos.
The event camera is an emerging imaging sensor that captures the dynamics of moving objects as events, which motivates our work on estimating 3D human pose and shape from event signals.
In this paper, we propose a novel learning-based framework that combines the robustness of the parametric model with the flexibility of free-form 3D deformation.
Given a single chair image, could we extract its 3D shape and animate its plausible articulations and motions?
However, the Chamfer distance is quite sensitive to noise and outliers, and can therefore be unreliable for assigning correspondences.
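A minimal sketch of why this happens: because the Chamfer distance averages nearest-neighbor distances between two point sets, a single far-away outlier can dominate the score. The function and data below are illustrative, not from the paper.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N x d) and Q (M x d).

    For each point, take the squared distance to its nearest neighbor in the
    other set; the result is the sum of the two directional averages.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((P[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
P = rng.standard_normal((100, 3))
Q = P + 0.01 * rng.standard_normal((100, 3))  # nearly identical set
print(chamfer_distance(P, Q))                 # small

# A single distant outlier dominates the average, illustrating why
# Chamfer-based correspondence assignment can be unreliable.
Q_outlier = np.vstack([Q, np.array([[100.0, 100.0, 100.0]])])
print(chamfer_distance(P, Q_outlier))         # orders of magnitude larger
```

Robust variants (e.g. trimming or truncating the per-point distances) mitigate this, at the cost of extra hyperparameters.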
Finally, we verify the proposed framework on the public KITTI dataset with different 3D object detectors.
Action recognition is a relatively established task: given an input sequence of human motion, the goal is to predict its action category.
In this paper, we propose a novel approach to convert given speech audio to a photo-realistic speaking video of a specific person, where the output video has synchronized, realistic, and richly expressive body dynamics.
Inspired by recent advances in human shape estimation from single color images, in this paper we attempt to estimate human body shape by leveraging geometric cues from single polarization images.
First, based on a generative human template, an initial pairwise alignment is performed for every two frames with sufficient overlap. This is followed by a global non-rigid registration procedure, in which partial results from RGBD frames are assembled into a unified 3D shape under the guidance of correspondences from the pairwise alignment. Finally, the texture map of the reconstructed human model is optimized to deliver a clear and spatially consistent texture.
Polarization images are known to capture polarized reflected light that preserves rich geometric cues of an object, which has motivated their recent application in reconstructing detailed surface normals of objects of interest.
In this paper, we present a novel approach for depth map enhancement from an RGB-D video sequence.
We discovered that these internal contours, which result from convex parts on an object's surface, can lead to a tighter fit than the original visual hull.