We address the problem of learning self-supervised representations from unlabeled image collections.
When the number of potential labels is large, human annotators find it difficult to mention all applicable labels for each training image.
Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification.
We propose ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available.
In order to facilitate progress in this area we present two new natural world visual classification datasets, iNat2021 and NeWT.
Understanding the 3D world without supervision is currently a major challenge in computer vision as the annotations required to supervise deep networks for tasks in this domain are expensive to obtain on a large scale.
We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs.
We introduce an approach for updating older tree inventories with geographic coordinates using street-level panorama images and a global optimization framework for tree instance matching.
Appearance information alone is often not sufficient to accurately differentiate between fine-grained visual categories.
Per-pixel ground-truth depth data is challenging to acquire at scale.
Ranked #2 on Monocular Depth Estimation on Make3D
Our framework is both generic, allowing the design of teaching schedules for different memory models, and also interactive, allowing the teacher to adapt the schedule to the underlying forgetting mechanisms of the learner.
We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data.
We highlight that adaptivity does not speed up the teaching process when considering existing models of version space learners, such as "worst-case" (the learner picks the next hypothesis randomly from the version space) and "preference-based" (the learner picks hypothesis according to some global preference).
Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories.
Ranked #3 on Image Classification on iNaturalist
Learning based methods have shown very promising results for the task of depth estimation in single images.
Ranked #3 on Monocular Depth Estimation on Mid-Air Dataset
Under some specific circumstances, Expected Error Reduction has been one of the strongest-performing informativeness criteria for active learning.
However, image-importance is individual-specific, i. e. a teaching image is important to a student if it changes their overall ability to discriminate between classes.