Existing state-of-the-art crowd counting algorithms rely excessively on location-level annotations, which are burdensome to acquire.
This paper jointly tackles the highly correlated tasks of estimating 3D human body poses and predicting future 3D motions from RGB image sequences.
Action2motion stochastically generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos.
The event camera is an emerging imaging sensor that captures the dynamics of moving objects as events, which motivates our work on estimating 3D human pose and shape from event signals.
Given a single chair image, could we extract its 3D shape and animate its plausible articulations and motions?
However, the Chamfer distance is quite sensitive to noise and outliers, and thus can be unreliable for assigning correspondences.
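For reference, a commonly used symmetric form of the Chamfer distance between two point sets $S_1$ and $S_2$ is sketched below (normalization and squaring conventions vary across papers); the per-point nearest-neighbor minimum is what lets a single outlier dominate the sum, which is the sensitivity noted above.

$$
d_{\mathrm{CD}}(S_1, S_2) = \frac{1}{|S_1|} \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2^2 \;+\; \frac{1}{|S_2|} \sum_{y \in S_2} \min_{x \in S_1} \lVert x - y \rVert_2^2
$$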
To address the persistent issue of accuracy degradation, we propose a novel and powerful network, the Scale Tree Network (STNet), for accurate crowd counting.
With a learning-based approach, a trained network can encode solution patterns and use them to guide the solution of new problem instances, instead of executing an expensive online search.
Action recognition is a relatively established task, where given an input sequence of human motion, the goal is to predict its action category.
Inspired by the recent advances in human shape estimation from single color images, in this paper, we attempt to estimate human body shapes by leveraging the geometric cues from single polarization images.
First, based on a generative human template, an initial pairwise alignment is performed for every two frames with sufficient overlap. This is followed by a global non-rigid registration procedure, in which partial results from the RGBD frames are assembled into a unified 3D shape under the guidance of correspondences from the pairwise alignment. Finally, the texture map of the reconstructed human model is optimized to deliver a clear and spatially consistent texture.
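Purely as an illustration of how the three stages fit together, the skeleton below organizes them in Python; every helper function is a hypothetical stub standing in for a component that the paper does not expose, not the authors' actual code.

```python
# Illustrative skeleton of the three-stage pipeline described above.
# All helpers are hypothetical stubs; the real method's internals are not shown.
from itertools import combinations

def estimate_overlap(frame_a, frame_b):
    """Stub: fraction of surface shared by two RGBD frames."""
    return 1.0  # placeholder value

def pairwise_align(frame_a, frame_b, template):
    """Stub: initial pairwise alignment guided by a generative human template."""
    return {"correspondences": []}  # placeholder result

def global_nonrigid_registration(frames, pairwise, template):
    """Stub: fuse partial scans into one 3D shape using pairwise correspondences."""
    return {"vertices": [], "faces": []}  # placeholder mesh

def optimize_texture(shape, frames):
    """Stub: produce a clear, spatially consistent texture map."""
    return None  # placeholder texture

def reconstruct_human(rgbd_frames, template, overlap_threshold=0.5):
    # Stage 1: pairwise alignment for every frame pair with sufficient overlap.
    pairwise = {
        (i, j): pairwise_align(rgbd_frames[i], rgbd_frames[j], template)
        for i, j in combinations(range(len(rgbd_frames)), 2)
        if estimate_overlap(rgbd_frames[i], rgbd_frames[j]) > overlap_threshold
    }
    # Stage 2: global non-rigid registration into a unified 3D shape.
    shape = global_nonrigid_registration(rgbd_frames, pairwise, template)
    # Stage 3: texture map optimization on the reconstructed model.
    texture = optimize_texture(shape, rgbd_frames)
    return shape, texture
```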
Crowd counting is an important vision task, which faces the challenges of continuous scale variation within a given scene and large density shifts both within and across images.
Polarization images are known to capture polarized reflected light that preserves rich geometric cues of an object, which has motivated their recent application in reconstructing detailed surface normals of objects of interest.
This paper presents the first approach for simultaneously recovering the 3D shape of both the wavy water surface and the moving underwater scene.
Numerous techniques have been proposed for reconstructing 3D models of opaque objects over the past decades.
The key feature of BSD-GAN is that it is trained in multiple branches, progressively covering both the breadth and depth of the network, as resolutions of the training images increase to reveal finer-scale features.
Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN.
Estimating the shape of transparent and refractive objects is one of the few open problems in 3D reconstruction.
Extracting environment mattes using existing approaches often requires either thousands of captured images or a long processing time, or both.
In underwater imagery, the image formation process includes refractions that occur when light passes from water into the camera housing, typically through a flat glass port.
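For context, the refraction mentioned here is governed by Snell's law at each interface along the light path (water to glass port, glass port to the air inside the housing): for refractive indices $n_1$ and $n_2$ and ray angles $\theta_1$, $\theta_2$ measured from the port normal,

$$
n_1 \sin\theta_1 = n_2 \sin\theta_2 .
$$

The commonly quoted values $n_{\text{water}} \approx 1.33$, $n_{\text{glass}} \approx 1.5$, and $n_{\text{air}} \approx 1.0$ are generic reference values, not figures taken from this paper.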