2 code implementations • 5 Apr 2024 • Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias
We leverage the graph structure of the unlabeled data and introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification.
no code implementations • 14 Feb 2024 • Yannis Kalantidis, Mert Bülent Sarıyıldız, Rafael S. Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka
After expanding the training set, we propose a training approach that leverages the specificities and the underlying geometry of this mix of real and synthetic images.
no code implementations • 28 Mar 2023 • Juliette Bertrand, Yannis Kalantidis, Giorgos Tolias
Few-shot action recognition, i.e., recognizing new action classes given only a few examples, benefits from incorporating temporal information.
no code implementations • CVPR 2023 • Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis
We show that with minimal and class-agnostic prompt engineering, ImageNet clones are able to close a large part of the gap between models produced by synthetic images and models trained with real images, for the several standard classification benchmarks that we consider in this study.
no code implementations • 5 Oct 2022 • Jon Almazán, Byungsoo Ko, Geonmo Gu, Diane Larlus, Yannis Kalantidis
We address it with the proposed Grappa, an approach that starts from a strong pretrained model, and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains.
1 code implementation • 22 Aug 2022 • Fabien Baradel, Romain Brégier, Thibault Groueix, Philippe Weinzaepfel, Yannis Kalantidis, Grégory Rogez
It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model leveraging temporal information.
1 code implementation • 30 Jun 2022 • Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus
We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at both the training task and other (future) transfer tasks.
1 code implementation • ICLR 2022 • Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis
Second, they are typically trained with a global loss that only acts on top of an aggregation of local features; by contrast, testing is based on local feature matching, which creates a discrepancy between training and testing.
Ranked #3 on Image Retrieval on ROxford (Medium)
1 code implementation • 18 Oct 2021 • Yannis Kalantidis, Carlos Lassance, Jon Almazan, Diane Larlus
Dimensionality reduction methods are unsupervised approaches which learn low-dimensional spaces where some properties of the initial space, typically the notion of "neighborhood", are preserved.
1 code implementation • 18 Oct 2021 • Fabien Baradel, Thibault Groueix, Philippe Weinzaepfel, Romain Brégier, Yannis Kalantidis, Grégory Rogez
In fact, we show that simply fine-tuning the batch normalization layers of the model is enough to achieve large gains.
Ranked #65 on 3D Human Pose Estimation on 3DPW
4 code implementations • CVPR 2021 • Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus
Instead, we propose to use Probabilistic Cross-Modal Embedding (PCME), where samples from the different modalities are represented as probabilistic distributions in the common embedding space.
1 code implementation • ICCV 2021 • Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari
In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way.
1 code implementation • NeurIPS 2020 • Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, Diane Larlus
Based on these observations, and motivated by the success of data mixing, we propose hard negative mixing strategies at the feature level that can be computed on the fly with minimal computational overhead.
no code implementations • 23 Apr 2020 • Yannis Kalantidis, Laura Sevilla-Lara, Ernest Mwebaze, Dina Machuve, Hamed Alemohammad, David Guerena
The workshop was held in conjunction with the International Conference on Learning Representations (ICLR) 2020.
4 code implementations • ICLR 2020 • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem.
Ranked #3 on Long-tail learning with class descriptors on CUB-LT
2 code implementations • 1 Jun 2019 • Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is, whether the model uses the correct image regions to output particular words or is hallucinating based on priors in the dataset and/or the language model.
28 code implementations • ICCV 2019 • Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng
Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.
Ranked #147 on Action Classification on Kinetics-400
no code implementations • CVPR 2019 • Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman
Highlight detection has the potential to significantly ease video browsing, but existing methods often suffer from expensive supervision requirements, where human viewers must manually identify highlights in training videos.
no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan
Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow.
Ranked #1 on Action Recognition on UCF-101
2 code implementations • CVPR 2019 • Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach
Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, Alexander Hauptmann
In addition to a text answer, a few grounding photos are also given to justify the answer.
Ranked #1 on Memex Question Answering on MemexQA
2 code implementations • NeurIPS 2018 • Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
Learning to capture long-range relations is fundamental to image/video recognition.
9 code implementations • CVPR 2019 • Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis
In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed.
no code implementations • 27 Oct 2018 • Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
Learning to capture long-range relations is fundamental to image/video recognition.
Ranked #35 on Action Recognition on UCF101
no code implementations • ECCV 2018 • Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
In this paper, we aim to reduce the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while preserving state-of-the-art accuracy on video recognition benchmarks.
Ranked #36 on Action Recognition on UCF101 (using extra training data)
2 code implementations • 27 Apr 2018 • Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny
Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of <subject, relation, object> triples.
1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.
no code implementations • 6 Dec 2016 • Kofi Boakye, Sachin Farfade, Hamid Izadinia, Yannis Kalantidis, Pierre Garrigues
Our results demonstrate that, for real-world datasets, training exclusively with this noisy data yields performance on par with the standard paradigm of first pre-training on clean data and then fine-tuning.
no code implementations • 21 Apr 2016 • Yannis Kalantidis, Lyndon Kennedy, Huy Nguyen, Clayton Mellina, David A. Shamma
We propose a novel hashing-based matching scheme, called Locally Optimized Hashing (LOH), based on a state-of-the-art quantization algorithm that can be used for efficient, large-scale search, recommendation, clustering, and deduplication.
no code implementations • 21 Apr 2016 • Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma
The quality of user experience online is affected by the relevance and placement of advertisements.
1 code implementation • 23 Feb 2016 • Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering.
1 code implementation • 13 Dec 2015 • Yannis Kalantidis, Clayton Mellina, Simon Osindero
We propose a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation of deep convolutional neural network layer outputs.
Ranked #13 on Image Retrieval on RParis (Medium)
1 code implementation • ICCV 2015 • Yannis Avrithis, Yannis Kalantidis, Evangelos Anagnostopoulos, Ioannis Z. Emiris
Large-scale duplicate detection, clustering and mining of documents or images have conventionally been treated with seed detection via hashing, followed by seed-growing heuristics using fast search.
no code implementations • CVPR 2014 • Yannis Kalantidis, Yannis Avrithis
We present a simple vector quantizer that combines low distortion with fast search and apply it to approximate nearest neighbor (ANN) search in high dimensional spaces.