Search Results for author: Thomas Mensink

Found 28 papers, 13 papers with code

HAMMR: HierArchical MultiModal React agents for generic VQA

no code implementations 8 Apr 2024 Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Optical Character Recognition (OCR) Question Answering +1

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

1 code implementation ICCV 2023 Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.

Question Answering Retrieval +1

Infinite Class Mixup

1 code implementation 17 May 2023 Thomas Mensink, Pascal Mettes

To make optimisation tractable, we propose a dual-contrastive Infinite Class Mixup loss, where we contrast the classifier of a mixed pair to both the classifiers and the predicted outputs of other mixed pairs in a batch.

The Missing Link: Finding label relations across datasets

no code implementations 9 Jun 2022 Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

To find relations between labels across datasets, we propose methods based on language, on vision, and on their combination.

Specificity Transfer Learning

How stable are Transferability Metrics evaluations?

no code implementations 4 Apr 2022 Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

Transferability metrics are a maturing field of increasing interest, aiming to provide heuristics for selecting the most suitable source models to transfer to a given target dataset without fine-tuning them all.

Image Classification Semantic Segmentation

Transferability Metrics for Selecting Source Model Ensembles

no code implementations CVPR 2022 Andrea Agostinelli, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

We address the problem of ensemble selection in transfer learning: Given a large pool of source models, we want to select an ensemble of models which, after fine-tuning on the target training set, yields the best performance on the target test set.

Semantic Segmentation Transfer Learning

Transferability Estimation using Bhattacharyya Class Separability

no code implementations CVPR 2022 Michal Pándy, Andrea Agostinelli, Jasper Uijlings, Vittorio Ferrari, Thomas Mensink

Then, we estimate their pairwise class separability using the Bhattacharyya coefficient, yielding a simple and effective measure of how well the source model transfers to the target task.

Classification Image Classification +2
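For intuition, the Bhattacharyya coefficient between two Gaussian class-conditional feature distributions has a closed form. A minimal sketch, assuming target classes are modelled as multivariate Gaussians in the source model's feature space (function names illustrative; the paper's exact estimator may differ):

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)          # averaged covariance
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2

def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    # BC = exp(-D_B); values near 1 mean heavy class overlap
    # (poor separability), values near 0 mean well-separated classes.
    return np.exp(-bhattacharyya_distance(mu1, cov1, mu2, cov2))
```

Averaging the pairwise coefficients over all target-class pairs gives one scalar per source model: lower average overlap suggests that model's features separate the target classes better, hence better expected transfer.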

Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types

no code implementations 24 Mar 2021 Thomas Mensink, Jasper Uijlings, Alina Kuznetsova, Michael Gygli, Vittorio Ferrari

Our study leads to several insights and concrete recommendations: (1) for most tasks there exists a source which significantly outperforms ILSVRC'12 pre-training; (2) the image domain is the most important factor for achieving positive transfer; (3) the source dataset should include the image domain of the target dataset to achieve best results; (4) at the same time, we observe only small negative effects when the image domain of the source task is much broader than that of the target; (5) transfer across task types can be beneficial, but its success is heavily dependent on both the source and target task types.

Autonomous Driving Depth Estimation +6

Novel View Synthesis from Single Images via Point Cloud Transformation

1 code implementation 17 Sep 2020 Hoang-An Le, Thomas Mensink, Partha Das, Theo Gevers

In this paper the argument is made that for true novel view synthesis of objects, where the object can be synthesized from any viewpoint, an explicit 3D shape representation is desired.

3D Shape Representation Novel View Synthesis

Multi-Loss Weighting with Coefficient of Variations

1 code implementation 3 Sep 2020 Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink

In this paper, we propose a weighting scheme based on the coefficient of variations and set the weights based on properties observed while training the model.

Monocular Depth Estimation Multi-Task Learning +1
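The scheme can be sketched as weighting each loss term by the coefficient of variation (σ/μ) of its observed values, normalised so the weights sum to one. This is an illustrative reading only; the paper's exact statistic (e.g. computed over loss ratios rather than raw values) differs in detail:

```python
import numpy as np

def cov_weights(loss_history):
    """Weight each loss term by the coefficient of variation (sigma/mu)
    of its recent values, normalised so the weights sum to 1.
    loss_history: (steps, num_losses) array of observed loss values."""
    hist = np.asarray(loss_history, dtype=float)
    mu = hist.mean(axis=0)
    sigma = hist.std(axis=0)
    cov = sigma / np.maximum(mu, 1e-12)  # coefficient of variation per loss
    return cov / cov.sum()
```

A loss that still varies a lot relative to its magnitude receives a larger weight, on the intuition that it still has something to learn; a loss that has flattened out is down-weighted automatically, with no hand-tuned coefficients.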

Calibration of Neural Networks using Splines

1 code implementation ICLR 2021 Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

From this, by approximating the empirical cumulative distribution using a differentiable function via splines, we obtain a recalibration function, which maps the network outputs to actual (calibrated) class assignment probabilities.

Decision Making Image Classification
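As an illustrative sketch of that recipe: sort samples by confidence, build the cumulative (score mass, fraction correct) curve, fit a monotone spline to it, and use the spline's derivative as the recalibration map. SciPy's `PchipInterpolator` (a monotone cubic spline) stands in here for the paper's spline construction, which differs in detail:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def spline_recalibrator(conf, correct):
    """Fit a monotone spline to the cumulative (score mass, accuracy)
    curve; its derivative maps a confidence score to a calibrated
    probability. Illustrative sketch only.
    conf: (n,) predicted confidences; correct: (n,) 0/1 correctness."""
    order = np.argsort(conf)
    s = conf[order]
    # Cumulative fraction of correct predictions up to each score.
    cum_correct = np.cumsum(correct[order]) / len(conf)
    cum_mass = np.arange(1, len(conf) + 1) / len(conf)
    deriv = PchipInterpolator(cum_mass, cum_correct).derivative()

    def recalibrate(score):
        # Map a score to its quantile, then read off the local slope.
        q = np.searchsorted(s, score) / len(s)
        return float(np.clip(deriv(np.clip(q, 0.01, 1.0)), 0.0, 1.0))

    return recalibrate
```

On a held-out calibration set, the slope of this curve at a given quantile estimates the actual accuracy of predictions with that confidence, which is exactly what a calibrated probability should report.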

Post-hoc Calibration of Neural Networks by g-Layers

no code implementations 23 Jun 2020 Amir Rahimi, Thomas Mensink, Kartik Gupta, Thalaiyasingam Ajanthan, Cristian Sminchisescu, Richard Hartley

Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems, where the confidence of a decision is as important as the decision itself.

Decision Making Image Classification

Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection

1 code implementation 20 May 2020 Alex Bewley, Pei Sun, Thomas Mensink, Dragomir Anguelov, Cristian Sminchisescu

This paper presents a novel 3D object detection framework that processes LiDAR data directly on its native representation: range images.

3D Object Detection Autonomous Driving +1

On the Benefit of Adversarial Training for Monocular Depth Estimation

1 code implementation 29 Oct 2019 Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink

For the quality of the image reconstruction and disparity prediction, a combination of different losses is used, including L1 image reconstruction losses and left-right disparity smoothness.

Depth Prediction Generative Adversarial Network +2

3D Neighborhood Convolution: Learning Depth-Aware Features for RGB-D and RGB Semantic Segmentation

no code implementations 3 Oct 2019 Yunlu Chen, Thomas Mensink, Efstratios Gavves

We propose to model the effective receptive field of 2D convolution based on the scale and locality from the 3D neighborhood.

Segmentation Semantic Segmentation

Automatic Generation of Dense Non-rigid Optical Flow

1 code implementation 5 Dec 2018 Hoàng-Ân Lê, Tushar Nimbhorkar, Thomas Mensink, Anil S. Baslamisli, Sezer Karaoglu, Theo Gevers

As of today, hardly any large-scale datasets exist with dense optical flow of non-rigid motion from real-world imagery.

Optical Flow Estimation

Three for one and one for three: Flow, Segmentation, and Surface Normals

1 code implementation 19 Jul 2018 Hoang-An Le, Anil S. Baslamisli, Thomas Mensink, Theo Gevers

Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they bring better cues for scene understanding problems.

Optical Flow Estimation Scene Understanding +2

IterGANs: Iterative GANs to Learn and Control 3D Object Transformation

1 code implementation 16 Apr 2018 Ysbrand Galama, Thomas Mensink

Our models learn a visual representation that can be used for objects seen in training, but also for never seen objects.


The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

no code implementations 30 Jan 2018 Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek

Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages.


Online Open World Recognition

no code implementations 8 Apr 2016 Rocco De Rosa, Thomas Mensink, Barbara Caputo

Recent attempts, like the open world recognition framework, tried to inject dynamics into the system by detecting new unknown classes and adding them incrementally, while at the same time continuously updating the models for the known classes.

Incremental Learning Metric Learning

VideoStory Embeddings Recognize Events when Examples are Scarce

no code implementations 8 Nov 2015 Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek

In our proposed embedding, which we call VideoStory, the correlations between the terms are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the VideoStory using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation.

Attribute Event Detection

Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks

no code implementations ICCV 2015 Efstratios Gavves, Thomas Mensink, Tatiana Tommasi, Cees G. M. Snoek, Tinne Tuytelaars

How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data?

Active Learning General Classification +2

COSTA: Co-Occurrence Statistics for Zero-Shot Classification

no code implementations CVPR 2014 Thomas Mensink, Efstratios Gavves, Cees G. M. Snoek

In this paper we aim for zero-shot classification, that is visual recognition of an unseen class by using knowledge transfer from known classes.

Classification Few-Shot Learning +3
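The underlying idea is that the classifier of an unseen class can be composed as a weighted combination of the classifiers of seen classes, with weights given by co-occurrence statistics between the unseen and seen labels. A minimal sketch (the normalisation and function name are illustrative; COSTA's exact weighting scheme differs in detail):

```python
import numpy as np

def zero_shot_classifier(seen_weights, cooccurrence):
    """Build a linear classifier for an unseen class as a
    co-occurrence-weighted combination of seen-class classifiers.
    seen_weights: (num_seen, dim) array, one classifier per seen class.
    cooccurrence: (num_seen,) co-occurrence statistics between the
    unseen class and each seen class (e.g. estimated from text)."""
    s = np.asarray(cooccurrence, dtype=float)
    s = s / max(s.sum(), 1e-12)  # normalise the transfer weights
    return s @ np.asarray(seen_weights, dtype=float)
```

No visual examples of the unseen class are needed: all visual knowledge comes from the seen-class classifiers, and the co-occurrence statistics only decide how much each one contributes.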
