However, collecting large datasets in a time- and cost-efficient manner often results in label noise.
An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance and class-balanced sampling, respectively.
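The two sampling schemes can be contrasted in a short sketch. This is a hypothetical helper, not the paper's code: class-balanced sampling weights each example inversely to its class frequency, so every class is drawn with equal probability, while instance-balanced sampling draws uniformly over examples.

```python
import random
from collections import Counter

def class_balanced_sample(labels, num_samples, seed=0):
    """Draw indices so that every class is equally likely.

    `labels` is a list of integer class labels. Illustrative sketch:
    each example is weighted by 1 / (size of its class).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)

# Instance-balanced sampling (uniform over examples), used for
# representation learning, would simply be:
#   rng.choices(range(len(labels)), k=num_samples)
```

On a 90/10 imbalanced label set, the class-balanced sampler returns roughly half of its draws from the minority class, which is what the classifier-retraining stage relies on.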
We assume that the model is updated incrementally for new classes as new data becomes available sequentially. This requires adapting the previously stored feature vectors to the updated feature space without having access to the corresponding original training images.
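One simple way to adapt stored vectors, sketched below under the assumption that a small set of exemplar images can be re-encoded by both the old and the updated backbone, is to fit a least-squares linear map between the two feature spaces and apply it to all stored vectors. This is an illustrative baseline, not the paper's method.

```python
import numpy as np

def fit_feature_adapter(F_old, F_new):
    """Least-squares linear map from the old to the new feature space.

    F_old, F_new: (n, d) features of the *same* exemplar images, extracted
    by the old and the updated backbone respectively. Stored vectors for
    classes whose images are no longer available can then be mapped as
    `F_stored @ M` without re-reading the original training images.
    """
    M, *_ = np.linalg.lstsq(F_old, F_new, rcond=None)
    return M
```

When the drift between feature spaces is approximately linear and the exemplar set spans the old space, this recovers the mapping exactly; in practice a nonlinear adapter may be needed.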
In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given.
In this work, we employ a transductive label propagation method that is based on the manifold assumption to make predictions on the entire dataset and use these predictions to generate pseudo-labels for the unlabeled data and train a deep neural network.
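A minimal sketch of transductive label propagation, assuming a precomputed symmetric affinity matrix `W` (the paper builds it from a k-NN graph over CNN features) and following the standard Zhou-et-al-style iteration; this simplified version omits the paper's refinements:

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.99, iters=50):
    """Diffuse known labels over a similarity graph.

    W: (n, n) symmetric nonnegative affinity matrix, zero diagonal.
    Y: (n, c) one-hot rows for labeled examples, zero rows for unlabeled.
    Returns a pseudo-label for every node, labeled or not.
    """
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt      # symmetrically normalized graph
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y   # diffuse, keep seed labels
    return F.argmax(axis=1)
```

The resulting pseudo-labels (optionally weighted by the confidence of `F`) then serve as targets for training the deep network on the unlabeled portion of the data.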
State-of-the-art image retrieval performance is achieved with CNN features and manifold ranking using a k-NN similarity graph that is pre-computed off-line.
Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds.
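A toy sketch of this mining rule, assuming graph reachability on a k-NN graph as a stand-in for "same manifold" (the actual method uses manifold similarity from diffusion); all names here are hypothetical:

```python
import numpy as np
from collections import deque

def knn_graph(X, k):
    """Directed k-NN graph: row i holds the indices of i's k neighbors."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    return D.argsort(axis=1)[:, 1:k + 1], D   # drop self at rank 0

def hop_distances(nn, src):
    """BFS hop counts from `src` over the k-NN graph (inf = unreachable)."""
    hops = np.full(len(nn), np.inf)
    hops[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in nn[u]:
            if hops[v] == np.inf:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def mine(X, anchor, k):
    nn, D = knn_graph(X, k)
    hops = hop_distances(nn, anchor)
    on_manifold = np.isfinite(hops)
    on_manifold[anchor] = False
    # hard positive: on the anchor's manifold yet Euclidean-far;
    # hard negative: Euclidean-near yet on a different manifold.
    pos = np.where(on_manifold, D[anchor], -np.inf).argmax()
    neg = np.where(~np.isfinite(hops), D[anchor], np.inf).argmin()
    return pos, neg
```

On two well-separated clusters, the mined positive is the far end of the anchor's own cluster and the mined negative is the closest point of the other cluster, which is exactly the hard-example behavior the sentence describes.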
In particular, annotation errors, dataset size, and the level of challenge are addressed: new annotations are created for both datasets with particular attention to the reliability of the ground truth.
Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift due to actually retrieving the clutter in the case of query expansion.
The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor like in previous approaches.
Experiments with standard image search benchmarks, including the Yahoo100M dataset comprising 100 million images, show that our method gives accuracy comparable to (and sometimes better than) exhaustive search while requiring only 10% of the vector operations and memory.
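The general idea of trading a small accuracy risk for a large reduction in vector operations can be illustrated with a toy inverted-file (IVF) index; this sketch is only illustrative and much simpler than the indexing scheme studied here, and all names are hypothetical:

```python
import numpy as np

def build_ivf(X, ncells=16, seed=0):
    """Toy IVF index: pick `ncells` database points as cell centroids
    and assign every vector to its nearest centroid's inverted list."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), ncells, replace=False)]
    assign = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    lists = {c: np.where(assign == c)[0] for c in range(ncells)}
    return centroids, lists

def search(q, X, centroids, lists, nprobe=2):
    """Scan only the `nprobe` cells nearest to the query, not all of X."""
    cells = np.linalg.norm(centroids - q, axis=1).argsort()[:nprobe]
    candidates = np.concatenate([lists[c] for c in cells])
    return candidates[np.linalg.norm(X[candidates] - q, axis=1).argmin()]
```

With `nprobe=2` of 16 cells, only about an eighth of the database is compared against each query, which is the kind of operation/memory saving the quoted 10% figure refers to.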
We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory.
Our results show that the regular dense detector is outperformed by other methods in most situations, leading us to improve the state of the art in comparable setups on standard retrieval and fine-grained benchmarks.
We introduce ConceptVision, a method that aims for high accuracy in categorizing a large number of scenes, while keeping the model relatively simple and efficient for scalability.