Conventional stereo suffers from a fundamental trade-off between imaging volume and signal-to-noise ratio (SNR), because aperture size pushes these two quantities in opposite directions.
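A rough way to see the aperture trade-off, assuming the standard thin-lens depth-of-field approximation (subject well inside the hyperfocal distance) and shot-noise-limited sensing. This is textbook optics, not the model used in the work above, and the lens, circle-of-confusion, and photon-density values are hypothetical.

```python
import math


def depth_of_field(focal_mm, aperture_mm, coc_mm, subject_mm):
    """Approximate total depth of field under the thin-lens model.

    Valid when the subject is well inside the hyperfocal distance.
    """
    f_number = focal_mm / aperture_mm
    return 2.0 * f_number * coc_mm * subject_mm ** 2 / focal_mm ** 2


def shot_noise_snr(aperture_mm, photons_per_mm2):
    """Shot-noise-limited SNR: collected photons scale with aperture area,
    and Poisson noise scales with the square root of the photon count."""
    photons = photons_per_mm2 * math.pi * (aperture_mm / 2.0) ** 2
    return math.sqrt(photons)


# Hypothetical 50 mm lens, 0.03 mm circle of confusion, subject at 1 m.
for aperture in (5.0, 10.0):
    dof = depth_of_field(50.0, aperture, 0.03, 1000.0)
    snr = shot_noise_snr(aperture, photons_per_mm2=100.0)
    print(f"D={aperture:4.1f} mm  DoF={dof:6.1f} mm  SNR={snr:5.1f}")
```

Under these approximations, doubling the aperture diameter halves the depth of field (a thinner imaging volume) while doubling the SNR, so their product stays roughly constant.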
End-to-end training is made possible by differentiable registration and 3D triangulation modules.
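As an illustration of why triangulation can sit inside an end-to-end pipeline, here is a minimal sketch of linear (DLT) triangulation. It is written in NumPy for readability, so it is not itself differentiable; the assumption is that the same operations (matrix products and an SVD) are available with gradient support in autodiff frameworks such as PyTorch (`torch.linalg.svd`) or JAX (`jax.numpy.linalg.svd`). The function names and the camera rig are hypothetical, not the authors' module.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of (3, 4) camera matrices P_i.
    points_2d:   list of (x, y) observations of the same point in each view.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution of A X = 0 with ||X|| = 1 is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with camera matrix P and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical rig: shared intrinsics, second camera shifted 0.2 units along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])
X_true = np.array([0.1, -0.05, 2.0])
X_hat = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free observations `X_hat` recovers `X_true`; with noisy 2D detections the SVD returns the algebraic least-squares estimate instead.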
Therefore, we propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'.
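The matching-and-fusion step described above can be sketched roughly as follows. This is a simplified NumPy illustration, not the authors' epipolar transformer: it assumes a known fundamental matrix `F` that maps a reference pixel to its epipolar line in the neighboring view, nearest-neighbour sampling instead of bilinear interpolation, scaled dot-product similarity with softmax weights, and fusion by plain addition. All function names are hypothetical.

```python
import numpy as np


def epipolar_line(F, p):
    """Epipolar line l' = F p in the neighbor view, as (a, b, c) with a*x + b*y + c = 0."""
    return F @ np.array([p[0], p[1], 1.0])


def sample_along_line(line, width, num_samples):
    """Evenly spaced (x, y) samples on the line across the image width."""
    a, b, c = line
    if abs(b) < 1e-12:
        raise ValueError("vertical epipolar lines are not handled in this sketch")
    xs = np.linspace(0, width - 1, num_samples)
    ys = -(a * xs + c) / b
    return np.stack([xs, ys], axis=1)


def epipolar_fuse(feat_ref, feat_src, F, p, num_samples=64):
    """feat_ref, feat_src: (H, W, C) feature maps; p: integer pixel (x, y) in the reference view."""
    H, W, C = feat_src.shape
    query = feat_ref[p[1], p[0]]
    pts = sample_along_line(epipolar_line(F, p), W, num_samples)
    xi = np.round(pts[:, 0]).astype(int)
    yi = np.round(pts[:, 1]).astype(int)
    valid = (yi >= 0) & (yi < H)  # keep samples that land inside the neighbor image
    if not valid.any():
        return query.copy()
    cand = feat_src[yi[valid], xi[valid]]  # (K, C) candidate features for p'
    scores = cand @ query / np.sqrt(C)  # scaled dot-product matching
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()  # softmax over candidates on the epipolar line
    f_prime = weights @ cand  # soft estimate of the feature at p'
    return query + f_prime  # fuse features at p and p'
```

For a rectified pair, `F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` makes the epipolar line of `(x, y)` the row `y` of the neighbor view, so the search reduces to a horizontal scan.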
Ranked #2 on 3D Human Pose Estimation on Human3.6M (using extra training data)
In this paper, we propose a self-supervised domain adaptation approach to enable the animation of high-fidelity face models from a commodity camera.
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video.
Ranked #1 on Facial Landmark Detection on 300-VW (C)
By prioritizing stereo matching on a subset of patches that are highly reconstructable and also cover the 3D surface, we are able to accelerate MVS with minimal reduction in accuracy and completeness.
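One way to picture "prioritize patches that are reconstructable and also cover the surface" is a greedy selection that weighs each patch's reconstructability score by how many not-yet-covered surface cells it adds. This is a hypothetical sketch of the general idea, not the method from the paper: the scores, the discretization into surface cells, and the `select_patches` name are all assumptions.

```python
def select_patches(patches, coverage_target=0.95):
    """Greedy patch prioritization for MVS (hypothetical sketch).

    patches: dict mapping patch id -> (reconstructability score in [0, 1],
             set of surface cells the patch would cover).
    Returns the chosen patch ids (in selection order) and the coverage reached.
    """
    all_cells = set().union(*(cells for _, cells in patches.values()))
    if not all_cells:
        return [], 1.0
    covered, chosen, remaining = set(), [], dict(patches)

    def gain(pid):
        score, cells = remaining[pid]
        # Reward reconstructable patches, but only for cells not yet covered.
        return score * len(cells - covered)

    while remaining and len(covered) < coverage_target * len(all_cells):
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining patch adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)[1]
    return chosen, len(covered) / len(all_cells)


patches = {
    "a": (0.9, {1, 2, 3}),
    "b": (0.2, {1, 2, 3, 4}),  # low score, mostly redundant with "a"
    "c": (0.8, {4, 5}),
    "d": (0.5, {5}),
}
chosen, coverage = select_patches(patches, coverage_target=1.0)
```

Stereo matching would then run only on the chosen patches, which is where the speed-up in such a scheme would come from.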
no code implementations • 17 Jun 2016 • Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang
The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search.
The tracker is formulated as a quadratic optimization problem with L0 norm constraints, which we propose to solve with the solution path algorithm.
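To make "quadratic optimization with L0 norm constraints" concrete, the sketch below solves a generic instance, minimizing 0.5 * ||A x - b||^2 subject to ||x||_0 <= k. Note that it uses iterative hard thresholding, a different and simpler solver than the solution path algorithm proposed above, and the least-squares objective is a stand-in assumption rather than the tracker's actual formulation.

```python
import numpy as np


def hard_threshold(x, k):
    """Project x onto {x : ||x||_0 <= k} by keeping its k largest-magnitude entries."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out


def iterative_hard_thresholding(A, b, k, iters=500):
    """Approximately minimize 0.5 * ||A x - b||^2 subject to ||x||_0 <= k."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # Gradient step on the quadratic, then projection onto the L0 constraint.
        x = hard_threshold(x + step * A.T @ (b - A @ x), k)
    return x
```

Because the L0 constraint makes the problem non-convex, methods like this only guarantee a local solution in general, which is one motivation for more careful schemes such as tracing a solution path.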
Therefore, our tracker propagates identity information to frames without recognized faces by uncovering the appearance and spatial manifold formed by person detections.
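The idea of spreading face-recognition identities to unrecognized detections along a manifold can be illustrated with graph-based label propagation, here the harmonic-function solution, which solves L_uu F_u = W_ul Y_l on a graph Laplacian. This is a swapped-in, well-known technique used for illustration, not the tracker's own optimization; the affinity matrix `W` (meant to encode appearance and spatial similarity between detections) and `propagate_identities` are hypothetical.

```python
import numpy as np


def propagate_identities(W, labels, num_ids):
    """Harmonic label propagation over person detections.

    W:      (n, n) symmetric non-negative affinity between detections.
    labels: length-n identity index per detection, or -1 if no face was recognized.
    Assumes every group of unlabeled detections connects to some labeled one;
    otherwise the linear system below is singular.
    """
    labels = np.asarray(labels)
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    lab = labels >= 0
    unl = ~lab
    Y = np.zeros((lab.sum(), num_ids))
    Y[np.arange(lab.sum()), labels[lab]] = 1.0  # one-hot known identities
    # Harmonic solution: L_uu F_u = W_ul Y_l.
    F_u = np.linalg.solve(L[np.ix_(unl, unl)], W[np.ix_(unl, lab)] @ Y)
    out = labels.copy()
    out[unl] = F_u.argmax(axis=1)
    return out


# Five detections in a chain; only the ends have recognized faces.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0)]:
    W[i, j] = W[j, i] = w
ids = propagate_identities(W, [0, -1, -1, -1, 1], num_ids=2)
```

The weak link between detections 2 and 3 acts as the boundary between the two identities, so each unlabeled detection inherits the identity of the labeled end it is most strongly connected to.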
We propose two well-motivated ranking-based methods to enhance the performance of current state-of-the-art human activity recognition systems.
We approach this problem by first showing that local handcrafted features and Convolutional Neural Networks (CNNs) share the same convolution-pooling network structure.
First, we propose a two-stream Stacked Convolutional Independent Subspace Analysis (ConvISA) architecture to show that unsupervised learning methods can significantly boost the performance of traditional local features extracted from data-independent models.
Self-paced learning (SPL) is a recently proposed learning regime, inspired by the learning process of humans and animals, that gradually incorporates samples into training, progressing from easy to more complex ones.
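A minimal sketch of the "easy first" schedule, assuming the hard-weighting form of SPL: a sample is included (v_i = 1) only when its current loss is below an age parameter `lam`, and `lam` grows by a factor `mu` each round so harder samples enter later. The linear-regression model, the warm start on all samples, and the parameter values are illustrative assumptions, not taken from the work above.

```python
import numpy as np


def self_paced_linear_regression(X, y, lam=10.0, mu=1.1, rounds=10):
    """Hard-weighted SPL: alternately select 'easy' samples and refit on them."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]  # warm start on all samples
    v = np.ones(len(y), dtype=bool)
    for _ in range(rounds):
        losses = (Xb @ w - y) ** 2
        v = losses < lam  # closed-form minimizer of sum(v_i * l_i) - lam * sum(v_i)
        if v.sum() >= Xb.shape[1]:  # need enough samples for a well-posed fit
            w = np.linalg.lstsq(Xb[v], y[v], rcond=None)[0]
        lam *= mu  # admit harder samples in later rounds
    return w, v


# Clean line y = 2x + 1 with two gross outliers.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
y[[3, 15]] += 20.0
w, selected = self_paced_linear_regression(X, y)
```

Here the two outliers keep losses far above `lam` throughout the schedule, so they are never admitted and the fit recovers the clean line.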
A device just like Harry Potter's Marauder's Map, which pinpoints the location of each person of interest at all times, provides invaluable information for analysis of surveillance videos.