Accurate and generalizable absolute 3D human pose estimation from monocular 2D pose input is an ill-posed problem.
Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)
In this paper, we propose a novel proposal-based learnable framework, which models MOT as a proposal generation, proposal scoring and trajectory inference paradigm on an affinity graph.
Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent interactions while retaining the spatial structure of agents and the scene context.
This paper proposes a novel memory-based online video representation that is efficient, accurate and predictive.
In this work, we propose a new framework to learn compact and fast ob- ject detection networks with improved accuracy using knowledge distillation  and hint learning .
In this work, we demonstrate that it is possible to learn features for network-flow-based data association via backpropagation, by expressing the optimum of a smoothed network flow problem as a differentiable function of the pairwise association costs.
DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of the future prediction (i. e., given the same context, future may vary), 2) foreseeing the potential future outcomes and make a strategic prediction based on that, and 3) reasoning not only from the past motion history, but also from the scene context as well as the interactions among the agents.
Ranked #1 on Trajectory Prediction on PAID
In this paper, we investigate two new strategies to detect objects accurately and efficiently using deep convolutional neural network: 1) scale-dependent pooling and 2) layer-wise cascaded rejection classifiers.
In CNN-based object detection methods, region proposal becomes a bottleneck when objects exhibit significant scale variation, occlusion or truncation.
Ranked #4 on Vehicle Pose Estimation on KITTI Cars Hard
Despite the great progress achieved in recognizing objects as 2D bounding boxes in images, it is still very challenging to detect occluded objects and estimate the 3D properties of multiple objects from a single image.
In this paper, we focus on the two key aspects of multiple target tracking problem: 1) designing an accurate affinity measure to associate detections and 2) implementing an efficient and accurate (near) online multiple target tracking algorithm.
Ranked #17 on Multiple Object Tracking on KITTI Tracking test
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification.
Ranked #7 on Room Layout Estimation on SUN RGB-D