In this work, we present an approach built on three synergistic components that we identify as key ingredients: joint classifier training with inliers and outliers, semi-supervised learning through pseudo-labeling, and model ensembling.
The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization").
In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image.
We introduce S$^2$VS, a video similarity learning approach with self-supervision.
Few-shot action recognition, i.e., recognizing new action classes given only a few examples, benefits from incorporating temporal information.
The goal is to obtain a network for database examples that is trained to operate on large resolution images and benefits from fine-grained image details, and a second network for query examples that operates on small resolution images but preserves a representation space aligned with that of the database network.
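One way to picture the alignment between the two networks is as a loss that pulls the query-network embedding of a low-resolution image toward the database-network embedding of the same image at high resolution. The sketch below is an illustrative assumption, not the authors' formulation: the function names `l2_normalize` and `alignment_loss` are hypothetical, and the embeddings are assumed to be already computed as NumPy arrays with one row per image.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def alignment_loss(query_emb, database_emb):
    """Mean squared distance between normalized embeddings of the same images.

    query_emb:    (n, d) outputs of the small-resolution query network (assumed given)
    database_emb: (n, d) outputs of the large-resolution database network (assumed given)
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    d = l2_normalize(np.asarray(database_emb, dtype=float))
    return float(np.mean(np.sum((q - d) ** 2, axis=1)))
```

Under this assumed loss, a value of zero means the query network reproduces the direction of the database embeddings exactly, so queries and database items can be compared in one shared space.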
To bridge the domain gap, we present a novel augmentation technique that is tailored to the task of learning sketch recognition from a training set of natural images.
no code implementations • 8 Feb 2022 • Zoë Papakipos, Giorgos Tolias, Tomas Jenicek, Ed Pizzi, Shuhei Yokoo, Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang, Sanjay Addicam, Sergio Manuel Papadakis, Cristian Canton Ferrer, Ondrej Chum, Matthijs Douze
The 2021 Image Similarity Challenge introduced a dataset to serve as a new benchmark to evaluate recent image copy detection methods.
Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing.
This work focuses on learning deep visual representation models for retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach.
Ranked #1 on Vehicle Re-Identification on VehicleID Small
1 code implementation • 17 Jun 2021 • Matthijs Douze, Giorgos Tolias, Ed Pizzi, Zoë Papakipos, Lowik Chanussot, Filip Radenovic, Tomas Jenicek, Maxim Maximov, Laura Leal-Taixé, Ismail Elezi, Ondřej Chum, Cristian Canton Ferrer
This benchmark is used for the Image Similarity Challenge at NeurIPS'21 (ISC2021).
Ranked #1 on Image Similarity Detection on DISC21 dev
At inference, the local descriptors are provided by the activations of internal components of the network.
Ranked #5 on Image Retrieval on ROxford (Medium)
In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given.
We show successful attacks on partially unknown systems, by designing various loss functions for the adversarial image construction.
In this work, we employ a transductive label propagation method, based on the manifold assumption, to make predictions on the entire dataset; these predictions are then used to generate pseudo-labels for the unlabeled data and to train a deep neural network.
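A minimal sketch of transductive label propagation on a nearest-neighbor graph is given below. It uses the classic closed-form diffusion F = (I - alpha S)^{-1} Y on a symmetrically normalized k-NN graph, which is an assumption about the general technique rather than the authors' exact procedure; the function names, the cosine-similarity affinity, and the parameters `k` and `alpha` are all illustrative choices.

```python
import numpy as np

def knn_affinity(X, k=3):
    """Symmetric k-NN affinity matrix from non-negative cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(S, 0.0)
    # Keep only the k largest similarities in each row.
    idx = np.argsort(-S, axis=1)[:, :k]
    rows = np.arange(S.shape[0])[:, None]
    A = np.zeros_like(S)
    A[rows, idx] = S[rows, idx]
    # Symmetrize so that an edge exists if either endpoint selected it.
    return np.maximum(A, A.T)

def propagate_labels(X, labels, num_classes, k=3, alpha=0.9):
    """Propagate known labels over the graph.

    labels: integer array with -1 marking unlabeled examples.
    Returns (pseudo_labels, scores) for every example in X.
    """
    W = knn_affinity(X, k)
    d = W.sum(axis=1)
    d[d == 0] = 1.0  # guard isolated nodes against division by zero
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot rows for labeled examples, zero rows for unlabeled ones.
    Y = np.zeros((X.shape[0], num_classes))
    labeled = labels >= 0
    Y[labeled, labels[labeled]] = 1.0
    # Closed-form solution of the diffusion: F = (I - alpha * S)^{-1} Y.
    F = np.linalg.solve(np.eye(X.shape[0]) - alpha * S, Y)
    return F.argmax(axis=1), F
```

The returned `pseudo_labels` could then serve as training targets for a network on the unlabeled examples, and the per-class `scores` could be used to weight how much each pseudo-label is trusted. Such a weighting is a design choice this sketch leaves open.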
State-of-the-art image retrieval performance is achieved with CNN features and manifold ranking using a k-NN similarity graph that is pre-computed off-line.
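Manifold ranking on a pre-computed graph can be sketched as a simple fixed-point iteration. The code below assumes the affinity matrix `W` has already been built off-line (for example from CNN descriptors) and that the query is expressed as a seed vector `y` over database items. The function names, `alpha`, and the iteration count are illustrative assumptions, and real systems typically use sparse matrices and faster solvers.

```python
import numpy as np

def normalize_graph(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an affinity matrix."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0  # guard isolated nodes against division by zero
    s = 1.0 / np.sqrt(d)
    return W * s[:, None] * s[None, :]

def manifold_rank(S, y, alpha=0.9, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y.

    S: normalized affinity matrix of the database graph (pre-computed off-line)
    y: seed vector, non-zero at the items the query is attached to
    Returns (order, f): database indices sorted best first, and the scores.
    """
    y = np.asarray(y, dtype=float)
    f = y.copy()
    for _ in range(iters):
        f = alpha * (S @ f) + (1.0 - alpha) * y
    return np.argsort(-f), f
```

Items that are connected to the seed only through a chain of neighbors still receive a non-zero score, which is what lets the ranking follow the data manifold instead of relying on direct similarity to the query alone.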
Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds.
In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with extra attention to the reliability of the ground truth.
We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.
Ranked #9 on Image Retrieval on RParis (Medium)
Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift due to actually retrieving the clutter in the case of query expansion.
To demonstrate the advantages of the AFM method, we derive a short vector image representation that, due to asymmetric feature maps, supports efficient scale and translation invariant sketch-based image retrieval.
The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches.
Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks.
Ranked #5 on Image Retrieval on Par6k
Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations.
Ranked #4 on Image Retrieval on Par6k
Our results show that the regular dense detector is outperformed by other methods in most situations, leading us to improve the state of the art in comparable setups on standard retrieval and fine-grained benchmarks.
Our geometric-aware aggregation strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.