no code implementations • 10 Oct 2023 • Lisa Alazraki, Lluis Castrejon, Mostafa Dehghani, Fantine Huot, Jasper Uijlings, Thomas Mensink
So it is a trivial exercise to create an ensemble with substantial real gains.
1 code implementation • ICCV 2023 • Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari
Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.
1 code implementation • 17 May 2023 • Thomas Mensink, Pascal Mettes
To make optimisation tractable, we propose a dual-contrastive Infinite Class Mixup loss, where we contrast the classifier of a mixed pair to both the classifiers and the predicted outputs of other mixed pairs in a batch.
1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet
no code implementations • 9 Jun 2022 • Jasper Uijlings, Thomas Mensink, Vittorio Ferrari
To find relations between labels across datasets, we propose methods based on language, on vision, and on their combination.
no code implementations • 4 Apr 2022 • Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari
Transferability metrics are a maturing field with increasing interest; they aim at providing heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all.
no code implementations • CVPR 2022 • Andrea Agostinelli, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari
We address the problem of ensemble selection in transfer learning: Given a large pool of source models we want to select an ensemble of models which, after fine-tuning on the target training set, yields the best performance on the target test set.
no code implementations • CVPR 2022 • Michal Pándy, Andrea Agostinelli, Jasper Uijlings, Vittorio Ferrari, Thomas Mensink
Then, we estimate their pairwise class separability using the Bhattacharyya coefficient, yielding a simple and effective measure of how well the source model transfers to the target task.
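As an illustration of the idea in this snippet (not the paper's implementation), the Bhattacharyya coefficient between two 1-D Gaussians gives a closed-form overlap measure: fit a Gaussian per class on extracted features and average the pairwise coefficients. Function names and the 1-D simplification are assumptions for the sketch.

```python
import math

def bhattacharyya_coefficient(mu1, var1, mu2, var2):
    """Bhattacharyya coefficient between two 1-D Gaussians
    (1 = identical distributions, -> 0 = well separated)."""
    bd = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2) \
         + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return math.exp(-bd)

def class_separability(features_by_class):
    """Fit a 1-D Gaussian per class and average the pairwise Bhattacharyya
    coefficients; lower values indicate better-separated classes."""
    stats = {}
    for label, feats in features_by_class.items():
        mu = sum(feats) / len(feats)
        var = sum((f - mu) ** 2 for f in feats) / len(feats)
        stats[label] = (mu, max(var, 1e-12))  # guard against zero variance
    labels = sorted(stats)
    coeffs = [bhattacharyya_coefficient(*stats[a], *stats[b])
              for i, a in enumerate(labels) for b in labels[i + 1:]]
    return sum(coeffs) / len(coeffs)
```

In the paper this kind of overlap score is computed on source-model features of the target data, so a low average coefficient suggests the source model will transfer well.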
no code implementations • 24 Mar 2021 • Thomas Mensink, Jasper Uijlings, Alina Kuznetsova, Michael Gygli, Vittorio Ferrari
Our study leads to several insights and concrete recommendations: (1) for most tasks there exists a source which significantly outperforms ILSVRC'12 pre-training; (2) the image domain is the most important factor for achieving positive transfer; (3) the source dataset should \emph{include} the image domain of the target dataset to achieve best results; (4) at the same time, we observe only small negative effects when the image domain of the source task is much broader than that of the target; (5) transfer across task types can be beneficial, but its success is heavily dependent on both the source and target task types.
1 code implementation • 9 Nov 2020 • Hoang-An Le, Thomas Mensink, Partha Das, Sezer Karaoglu, Theo Gevers
Multimodal large-scale datasets for outdoor scenes are mostly designed for urban driving problems.
1 code implementation • 17 Sep 2020 • Hoang-An Le, Thomas Mensink, Partha Das, Theo Gevers
In this paper the argument is made that for true novel view synthesis of objects, where the object can be synthesized from any viewpoint, an explicit 3D shape representation is desired.
1 code implementation • 3 Sep 2020 • Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink
In this paper, we propose a weighting scheme based on the coefficient of variation and set the weights based on properties observed while training the model.
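A minimal sketch of coefficient-of-variation loss weighting, under the assumption that each loss term is weighted by its observed relative variability (std / mean) during training; the class and normalisation below are illustrative, not the paper's exact formulation.

```python
class CoVWeighting:
    """Toy multi-loss weighting: each loss term is weighted by its
    coefficient of variation (std / mean) tracked over training steps,
    normalised so the weights sum to 1."""

    def __init__(self, num_losses):
        self.n = 0
        self.mean = [0.0] * num_losses
        self.m2 = [0.0] * num_losses  # running squared deviations (Welford)

    def weights(self, losses):
        self.n += 1
        for i, loss in enumerate(losses):
            delta = loss - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (loss - self.mean[i])
        if self.n < 2:
            return [1.0 / len(losses)] * len(losses)
        covs = [(self.m2[i] / (self.n - 1)) ** 0.5 / max(self.mean[i], 1e-12)
                for i in range(len(losses))]
        total = sum(covs)
        return [c / total for c in covs]
```

The intuition matches the snippet: a loss that still fluctuates a lot relative to its magnitude is "unsolved" and receives more weight than one that has flattened out.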
1 code implementation • ECCV 2020 • Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes, Pengwan Yang, Cees G. M. Snoek
In this paper, we define data augmentation between point clouds as a shortest path linear interpolation.
Ranked #6 on 3D Point Cloud Classification on ModelNet40-C
3D Point Cloud Classification • 3D Point Cloud Data Augmentation +2
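The shortest-path interpolation above can be sketched as an optimal one-to-one assignment between the two clouds followed by linear interpolation of matched points. This toy version uses brute-force assignment (so only tiny 2-D clouds) and is an illustration of the idea, not the paper's code.

```python
import itertools

def pointmixup(cloud_a, cloud_b, lam):
    """Interpolate two equal-sized 2-D point clouds along the shortest path:
    find the one-to-one matching minimising total squared distance
    (brute force over permutations), then linearly mix matched pairs."""
    n = len(cloud_a)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(n)):
        cost = sum((ax - cloud_b[j][0]) ** 2 + (ay - cloud_b[j][1]) ** 2
                   for (ax, ay), j in zip(cloud_a, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [((1 - lam) * ax + lam * cloud_b[j][0],
             (1 - lam) * ay + lam * cloud_b[j][1])
            for (ax, ay), j in zip(cloud_a, best_perm)]
```

For realistic cloud sizes the brute-force matching would be replaced by a proper assignment solver (e.g. the Hungarian algorithm); the interpolation step is unchanged.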
1 code implementation • ICLR 2021 • Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley
From this, by approximating the empirical cumulative distribution using a differentiable function via splines, we obtain a recalibration function, which maps the network outputs to actual (calibrated) class assignment probabilities.
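To make the recalibration-function idea concrete, here is a crude piecewise-linear stand-in for the spline fit described above: bin the sorted confidences, record empirical accuracy per bin, and interpolate between bin centres. The interface and knot placement are assumptions for illustration, not the authors' method.

```python
import bisect

def fit_recalibrator(confidences, correct, num_knots=5):
    """Fit a piecewise-linear map from raw confidence to empirical accuracy.
    `correct` holds 0/1 indicators of whether each prediction was right."""
    pairs = sorted(zip(confidences, correct))
    bin_size = max(1, len(pairs) // num_knots)
    xs, ys = [], []
    for start in range(0, len(pairs), bin_size):
        chunk = pairs[start:start + bin_size]
        xs.append(sum(c for c, _ in chunk) / len(chunk))  # mean confidence
        ys.append(sum(k for _, k in chunk) / len(chunk))  # empirical accuracy

    def recalibrate(conf):
        if conf <= xs[0]:
            return ys[0]
        if conf >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, conf)
        t = (conf - xs[i - 1]) / (xs[i] - xs[i - 1])
        return (1 - t) * ys[i - 1] + t * ys[i]

    return recalibrate
```

A spline (as in the paper) gives a smooth, differentiable version of this same confidence-to-accuracy mapping.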
no code implementations • 23 Jun 2020 • Amir Rahimi, Thomas Mensink, Kartik Gupta, Thalaiyasingam Ajanthan, Cristian Sminchisescu, Richard Hartley
Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems, where the confidence of decisions is as important as the decisions themselves.
1 code implementation • 20 May 2020 • Alex Bewley, Pei Sun, Thomas Mensink, Dragomir Anguelov, Cristian Sminchisescu
This paper presents a novel 3D object detection framework that processes LiDAR data directly on its native representation: range images.
1 code implementation • 29 Oct 2019 • Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink
For the quality of the image reconstruction and disparity prediction, a combination of different losses is used, including L1 image reconstruction losses and left-right disparity smoothness.
no code implementations • 3 Oct 2019 • Yunlu Chen, Thomas Mensink, Efstratios Gavves
We propose to model the effective receptive field of 2D convolution based on the scale and locality from the 3D neighborhood.
1 code implementation • 5 Dec 2018 • Hoàng-Ân Lê, Tushar Nimbhorkar, Thomas Mensink, Anil S. Baslamisli, Sezer Karaoglu, Theo Gevers
There hardly exist any large-scale datasets with dense optical flow of non-rigid motion from real-world imagery as of today.
1 code implementation • 19 Jul 2018 • Hoang-An Le, Anil S. Baslamisli, Thomas Mensink, Theo Gevers
Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they bring better cues for scene understanding problems.
1 code implementation • 16 Apr 2018 • Ysbrand Galama, Thomas Mensink
Our models learn a visual representation that can be used for objects seen in training, but also for never seen objects.
no code implementations • 30 Jan 2018 • Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek
Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages.
no code implementations • 8 Apr 2016 • Rocco De Rosa, Thomas Mensink, Barbara Caputo
Recent attempts, like the open world recognition framework, tried to inject dynamics into the system by detecting new unknown classes and adding them incrementally, while at the same time continuously updating the models for the known classes.
no code implementations • 8 Nov 2015 • Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek
In our proposed embedding, which we call VideoStory, the correlations between the terms are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the VideoStory using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation.
no code implementations • ICCV 2015 • Mihir Jain, Jan C. van Gemert, Thomas Mensink, Cees G. M. Snoek
Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories.
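A toy sketch of the objects2action scoring idea: embed a video as the detection-score-weighted average of its top-scoring object word vectors, then rank actions by cosine similarity to each action's word vector. The tiny 2-D "word vectors" below are made up for illustration; the paper uses skip-gram embeddings over thousands of categories.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def zero_shot_action_score(object_scores, object_vecs, action_vec, top_k=2):
    """Score an unseen action for a video: aggregate the word vectors of the
    video's top-k detected objects, weighted by detection score, and compare
    the aggregate to the action's word vector."""
    top = sorted(object_scores.items(), key=lambda kv: -kv[1])[:top_k]
    dim = len(action_vec)
    video_vec = [0.0] * dim
    for name, score in top:
        for d in range(dim):
            video_vec[d] += score * object_vecs[name][d]
    return cosine(video_vec, action_vec)
```

No action-labelled video is needed: only object detectors and a shared word-embedding space, which is what makes the approach zero-shot.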
Ranked #21 on Zero-Shot Action Recognition on UCF101
no code implementations • ICCV 2015 • Efstratios Gavves, Thomas Mensink, Tatiana Tommasi, Cees G. M. Snoek, Tinne Tuytelaars
How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data?
no code implementations • CVPR 2014 • Thomas Mensink, Efstratios Gavves, Cees G. M. Snoek
In this paper we aim for zero-shot classification, that is, visual recognition of an unseen class by using knowledge transfer from known classes.