We show this pre-training strategy leads to a flexible, simple, and efficient framework with improved transfer results to downstream tasks.
Ranked #1 on Semantic Segmentation on Hypersim
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Our work provides a principled approach for training binary neural networks which justifies and extends existing approaches.
We propose a method for estimating an athlete's global 3D position and articulated pose using multiple cameras.