In this paper, we present a novel learning framework, ActiveNeRF, aiming to model a 3D scene with a constrained input budget.
Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers.
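One way to picture this idea is a per-exit loss weighting in a multi-exit network, where a sample's weight at each classifier depends on how well its difficulty matches the exit's depth. The sketch below is a hypothetical illustration (the weighting kernel and the `exit_loss_weights` name are assumptions, not the paper's method):

```python
import numpy as np

def exit_loss_weights(difficulty, num_exits):
    """Hypothetical weighting: easy samples (low difficulty) contribute
    more to early classifiers, hard samples more to late ones."""
    # Exit positions in [0, 1]: 0 = earliest classifier, 1 = final one.
    positions = np.linspace(0.0, 1.0, num_exits)
    # A sample's weight at an exit decays with the mismatch between
    # its difficulty score and the exit's relative depth.
    w = np.exp(-((difficulty[:, None] - positions[None, :]) ** 2) / 0.1)
    return w / w.sum(axis=1, keepdims=True)  # normalize per sample

difficulty = np.array([0.1, 0.9])  # one easy sample, one hard sample
w = exit_loss_weights(difficulty, num_exits=4)
```

Here the easy sample (difficulty 0.1) receives most of its loss weight at the first exit, while the hard sample (0.9) concentrates its weight on the final exit.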
The paper presents a scalable approach for simultaneously learning spatially distributed visual representations over individual tokens and a holistic instance representation.
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given.
Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing the spatial redundancy.
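A minimal sketch of exploiting spatial redundancy, assuming the common strategy of processing most frames at reduced resolution (the specific pooling factor and helper name are illustrative, not taken from any particular paper):

```python
import numpy as np

def downsample(frame, factor=4):
    """Average-pool a frame to reduce spatial resolution; the cost of a
    convolutional backbone scales roughly with the number of pixels,
    so a factor-4 pooling cuts per-frame compute by about 16x."""
    h, w = frame.shape
    return (frame[:h - h % factor, :w - w % factor]
            .reshape(h // factor, factor, w // factor, factor)
            .mean(axis=(1, 3)))

frame = np.ones((32, 32))   # stand-in for one grayscale video frame
small = downsample(frame)   # 8x8 output, 16x fewer pixels to process
```

In practice such methods typically run a cheap model on the low-resolution frames and reserve full-resolution processing for the frames or regions the cheap pass is uncertain about.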
Relying on temporal continuity, our work assumes that the 3D scene structure remains static across nearby video frames.
Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.
Fourth, to shed light on the potential of self-supervised learning for video correspondence flow, we probe the upper bound by training on additional data, i.e., more diverse videos, further demonstrating significant improvements on video segmentation.
Many applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints.
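For intuition, disparity estimation can be reduced to its simplest classical form: for each left-image pixel, search over horizontal shifts in the right image for the patch with the lowest sum-of-absolute-differences cost. The sketch below is a naive baseline, not a real-time method (the function name and parameters are illustrative):

```python
import numpy as np

def block_match_disparity(left, right, max_disp=8, win=3):
    """Naive SAD block matching between a rectified stereo pair.
    For each left pixel, test horizontal shifts d in [0, max_disp]
    and keep the one minimizing the window's absolute difference."""
    h, w = left.shape
    pad = win // 2
    L = np.pad(left.astype(np.float32), pad, mode='edge')
    R = np.pad(right.astype(np.float32), pad, mode='edge')
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            patch = L[y:y + win, x:x + win]
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                cand = R[y:y + win, x - d:x - d + win]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic pair: the left image is the right image shifted by 2 pixels,
# so the true disparity away from the border is 2.
rng = np.random.default_rng(0)
right = rng.random((12, 16))
left = np.empty_like(right)
left[:, 2:] = right[:, :-2]
left[:, :2] = right[:, :2]  # fill the left border arbitrarily
disp = block_match_disparity(left, right, max_disp=4, win=3)
```

The exhaustive per-pixel search is what makes this O(h·w·max_disp·win²); real-time systems replace it with learned cost volumes, coarse-to-fine search, or hardware-friendly aggregation.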
In this framework, real images are first converted to a synthetic domain representation that reduces complexity arising from lighting and texture.
We propose imitation refinement, a novel approach that refines imperfect input patterns under the guidance of a pre-trained classifier incorporating prior knowledge from simulated theoretical data, so that the refined patterns imitate the ideal data.