To facilitate our investigation of robustness and address the limitations of previous work, we collect InSpaceType, a high-quality, high-resolution RGB-D dataset for general indoor environments.
We instead propose a fine-grained task that treats each RGB-D pair as a task in our meta-optimization.
Ranked #30 on Monocular Depth Estimation on NYU-Depth V2
This work digs into a root question in human perception: can face geometry be gleaned from one's voice?
Ranked #1 on 3D Face Modelling on Voxceleb-3D
The majority of prior monocular depth estimation methods without groundtruth depth guidance focus on driving scenarios.
Ranked #1 on Monocular Depth Estimation on VA (Virtual Apartment)
Our synergy process leverages a representation cycle for 3DMM parameters and 3D landmarks.
Ranked #1 on Face Alignment on AFLW
This work analyzes whether 3D face models can be learned from speakers' speech inputs alone.
This work focuses on complete 3D facial geometry prediction, including 3D facial alignment via 3D face modeling and face orientation estimation using the proposed multi-task, multi-modal, and multi-representation landmark refinement network (M$^3$-LRN).
Mask regression is based on 2D, 2.5D, and 3D ROI using the pseudo-lidar and image-based representations.
Ranked #16 on Instance Segmentation on Cityscapes val (using extra training data)
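For readers unfamiliar with the term, a pseudo-lidar representation is commonly obtained by back-projecting a depth map into a 3D point cloud with a pinhole camera model. The following is a minimal sketch of that general idea, not the paper's pipeline; the function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth  : list of rows, depth[v][u] is the depth at pixel column u, row v
    fx, fy : focal lengths in pixels (hypothetical values in the usage below)
    cx, cy : principal point in pixels
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth measurement at this pixel
            x = (u - cx) * z / fx  # horizontal offset scaled by depth
            y = (v - cy) * z / fy  # vertical offset scaled by depth
            points.append((x, y, z))
    return points


# Usage with a toy 2x2 depth map and made-up intrinsics.
toy_depth = [[2.0, 0.0],
             [4.0, 2.0]]
cloud = depth_to_points(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
print(cloud)
```

Each resulting point keeps the pixel's depth as its z coordinate, so regions of interest can be cropped either in the image plane or in 3D.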
Recent sparse depth completion methods for lidar focus only on the lower part of the scene and produce irregular estimates in the upper part, because existing datasets, such as KITTI, do not provide groundtruth for upper areas.
Compared with popular sampling methods such as Farthest Point Sampling (FPS) and Ball Query, CAGQ achieves up to 50X speed-up.
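As context for the comparison, Farthest Point Sampling (the baseline, not CAGQ itself) greedily picks the point farthest from everything selected so far. The sketch below is a plain-Python illustration under that standard definition; the function name and the `start` parameter are assumptions made for the example.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily to be mutually far apart.

    points : sequence of equal-length coordinate tuples
    k      : number of points to select (1 <= k <= len(points))
    start  : index of the first selected point (arbitrary choice)
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # The next sample is the point farthest from the selected set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Refresh nearest-selected distances against the new sample only.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


pts = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0, 5, 0)]
print(farthest_point_sampling(pts, 3))  # → [0, 2, 3]
```

Each iteration scans all points, so selecting k of n points costs O(nk) distance evaluations, which is the kind of overhead a faster sampling scheme aims to avoid.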
Such a transformation enables CFCNet to predict features and reconstruct data of missing depth measurements according to their corresponding, transformed RGB features.
We first propose a novel nonconvex rank surrogate for the general rank minimization problem and apply it to the corrupted image completion problem.
Also, we propose to create building masks from semantic segmentation using an encoder-decoder network.
This paper proposes an effective method for face recognition under contiguous occlusion.