Object detection and forecasting are fundamental components of embodied perception.
In this work, we present a new method for 3D face reconstruction from multi-view RGB images.
Efficient processing of high-resolution video streams is safety-critical for many robotics applications such as autonomous driving.
To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures.
While past work has studied the algorithmic trade-off between latency and accuracy, there has not been a clear metric to compare different methods along the Pareto-optimal latency-accuracy curve.
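To make the notion of a Pareto-optimal latency-accuracy curve concrete, the following minimal sketch extracts the non-dominated points from a set of (latency, accuracy) measurements. The function name `pareto_front` and the sample numbers are hypothetical and chosen only for illustration; this is not the metric proposed in the work.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (latency, accuracy) points.

    Lower latency and higher accuracy are both considered better.
    """
    # Sort by latency ascending; on ties, put the more accurate point first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    front: List[Tuple[float, float]] = []
    best_accuracy = float("-inf")
    for latency, accuracy in ordered:
        # A point is kept only if it beats every faster (or equally fast) point.
        if accuracy > best_accuracy:
            front.append((latency, accuracy))
            best_accuracy = accuracy
    return front


# Hypothetical measurements: (latency in ms, accuracy).
methods = [(30.0, 0.20), (50.0, 0.35), (60.0, 0.30), (100.0, 0.40), (100.0, 0.38)]
front = pareto_front(methods)
```

Here the points at 60 ms and the weaker one at 100 ms are dropped because a method that is at least as fast is also more accurate.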
Ranked #2 on Real-Time Object Detection on Argoverse-HD (Detection-Only, Val) (using extra training data)
We also revisit existing approaches for fast convergence and show that budget-aware learning schedules readily outperform such approaches under the practical but under-explored budgeted training setting.
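As an illustration of what a budget-aware schedule can look like, the sketch below ties the learning rate to the fraction of the training budget already consumed, decaying linearly to zero at the end of the budget. The linear form, the function name `budget_aware_lr`, and the default `base_lr` are assumptions made for this example; the sentence above does not specify which schedule is used.

```python
def budget_aware_lr(step: int, total_steps: int, base_lr: float = 0.1) -> float:
    """Learning rate as a function of budget progress (assumed linear decay)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of the budget used so far, clamped to [0, 1].
    progress = min(max(step, 0), total_steps) / total_steps
    # The rate reaches zero exactly when the budget is exhausted.
    return base_lr * (1.0 - progress)


# The same code adapts to any budget: only total_steps changes.
short_run = [budget_aware_lr(s, total_steps=4) for s in range(5)]
```

Because the schedule depends only on relative progress, shrinking or growing the budget rescales the whole curve rather than truncating it.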
Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision.
While most prior work treats this as a regression problem, we instead formulate it as a discrete $K$-way classification task, where a classifier is trained to return one of $K$ discrete alignments.
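The shift from regression to $K$-way classification can be sketched as a discretization of the target. The example below assumes, purely for illustration, that an alignment is a scalar offset in a known range `[lo, hi]` split into `k` equal-width bins; the helper names `offset_to_class` and `class_to_offset` are hypothetical, and the original work may define its discrete alignments differently.

```python
def offset_to_class(offset: float, lo: float, hi: float, k: int) -> int:
    """Map a continuous offset to one of k class indices (equal-width bins)."""
    if k < 1 or hi <= lo:
        raise ValueError("need k >= 1 and hi > lo")
    width = (hi - lo) / k
    idx = int((offset - lo) // width)
    # Clamp so that out-of-range offsets and offset == hi stay in [0, k - 1].
    return min(max(idx, 0), k - 1)


def class_to_offset(idx: int, lo: float, hi: float, k: int) -> float:
    """Map a predicted class index back to the centre of its bin."""
    width = (hi - lo) / k
    return lo + (idx + 0.5) * width


# Training target for a regression value of 0.1 with K = 4 bins on [-1, 1].
label = offset_to_class(0.1, lo=-1.0, hi=1.0, k=4)
recovered = class_to_offset(label, lo=-1.0, hi=1.0, k=4)
```

A classifier trained on such labels outputs a distribution over the `k` bins, and the predicted alignment is read off from the chosen bin, at the cost of a quantization error of at most half a bin width.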
In this work, we propose a method to overcome this limitation by exploiting the properties of the joint problem of training-time inference and learning.
Specifically, we show that general energy minimization, even in the 2-label pairwise case, and planar energy minimization with three or more labels are exp-APX-complete.
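For reference, the pairwise energy minimization problem named above is commonly written in the following form. The symbols (graph $(V, \mathcal{E})$, label set $\mathcal{L}$, unary terms $\theta_i$, pairwise terms $\theta_{ij}$) are standard notation assumed here, not taken from the original text.

```latex
\min_{x \in \mathcal{L}^{V}} \; E(x)
  = \sum_{i \in V} \theta_i(x_i)
  + \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j)
```

In this notation the 2-label case corresponds to $|\mathcal{L}| = 2$, and the planar case restricts $(V, \mathcal{E})$ to a planar graph.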