Search Results for author: Michal Drozdzal

Found 38 papers, 26 papers with code

Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

no code implementations 13 Sep 2024 Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, Ricky T. Q. Chen

Dynamical generative models that produce samples through an iterative process, such as Flow Matching and denoising diffusion models, have seen widespread use, but there have not been many theoretically sound methods for improving these models with reward fine-tuning.

Denoising Diversity

Consistency-diversity-realism Pareto fronts of conditional image generative models

no code implementations 14 Jun 2024 Pietro Astolfi, Marlene Careil, Melissa Hall, Oscar Mañas, Matthew Muckley, Jakob Verbeek, Adriana Romero Soriano, Michal Drozdzal

Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative models as it would enable their use as world simulators.

Diversity

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

1 code implementation 6 Jun 2024 Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano

With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases.

Diversity

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

1 code implementation 7 May 2024 Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano

We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity.

Diversity

Improving Text-to-Image Consistency via Automatic Prompt Optimization

no code implementations 26 Mar 2024 Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.

Language Modelling Large Language Model

InCoRo: In-Context Learning for Robotics Control with Feedback Loops

no code implementations 7 Feb 2024 Jiaqiang Ye Zhu, Carla Gomez Cano, David Vazquez Bermudez, Michal Drozdzal

We highlight the generalization capabilities of our system and show that (1) in-context learning in combination with the current state-of-the-art LLMs is an effective way to implement a robotic controller; (2) in static environments, InCoRo surpasses the prior art in terms of the success rate; (3) in dynamic environments, we establish new state-of-the-art for the SCARA and DELTA units, respectively.

In-Context Learning Scene Understanding +1

Feedback-guided Data Synthesis for Imbalanced Classification

1 code implementation 29 Sep 2023 Reyhane Askari Hemmat, Mohammad Pezeshki, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano

In this work, we introduce a framework for augmenting static datasets with useful synthetic samples, which leverages one-shot feedback from the classifier to drive the sampling of the generative model.

Classification imbalanced classification

DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

1 code implementation 11 Aug 2023 Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano

The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases.

Benchmarking Diversity +1

Improved baselines for vision-language pre-training

1 code implementation 15 May 2023 Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields.

Contrastive Learning Data Augmentation +1

Controllable Image Generation via Collage Representations

no code implementations 26 Apr 2023 Arantxa Casanova, Marlène Careil, Adriana Romero-Soriano, Christopher J. Pal, Jakob Verbeek, Michal Drozdzal

Our experiments on the OI dataset show that M&Ms outperforms baselines in terms of fine-grained scene controllability while being very competitive in terms of image quality and sample diversity.

Attribute Image Generation

Instance-Conditioned GAN Data Augmentation for Representation Learning

no code implementations 16 Mar 2023 Pietro Astolfi, Arantxa Casanova, Jakob Verbeek, Pascal Vincent, Adriana Romero-Soriano, Michal Drozdzal

We showcase the benefits of DA_IC-GAN by plugging it out-of-the-box into the supervised training of ResNets and DeiT models on the ImageNet dataset, achieving accuracy boosts of between 1%p and 2%p with the highest-capacity models.

Data Augmentation Few-Shot Learning +1

Learning to Substitute Ingredients in Recipes

1 code implementation 15 Feb 2023 Bahare Fatemi, Quentin Duval, Rohit Girdhar, Michal Drozdzal, Adriana Romero-Soriano

Recipe personalization through ingredient substitution has the potential to help people meet their dietary needs and preferences, avoid potential allergens, and ease culinary exploration in everyone's kitchen.

Recipe Generation

ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

no code implementations 3 Nov 2022 Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim

Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of a model's (1) architecture, e.g., transformer vs. convolutional, (2) learning paradigm, e.g., supervised vs. self-supervised, and (3) training procedures, e.g., data augmentation.

Data Augmentation

Uncertainty-Driven Active Vision for Implicit Scene Reconstruction

1 code implementation 3 Oct 2022 Edward J. Smith, Michal Drozdzal, Derek Nowrouzezahrai, David Meger, Adriana Romero-Soriano

We evaluate our proposed approach on the ABC dataset and the in-the-wild CO3D dataset, and show that: (1) we are able to obtain high-quality state-of-the-art occupancy reconstructions; (2) our perspective-conditioned uncertainty definition is effective in driving improvements in next-best-view selection and outperforms strong baseline approaches; and (3) we can further improve shape understanding by performing a gradient-based search on the view selection candidates.

Scene Understanding

On learning adaptive acquisition policies for undersampled multi-coil MRI reconstruction

1 code implementation 30 Mar 2022 Tim Bakker, Matthew Muckley, Adriana Romero-Soriano, Michal Drozdzal, Luis Pineda

Most current approaches to undersampled multi-coil MRI reconstruction focus on learning the reconstruction model for a fixed, equidistant acquisition trajectory.

MRI Reconstruction SSIM

Parameter Prediction for Unseen Deep Architectures

1 code implementation NeurIPS 2021 Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

We introduce a large-scale dataset of diverse computational graphs of neural architectures - DeepNets-1M - and use it to explore parameter prediction on CIFAR-10 and ImageNet.

Parameter Prediction

Active 3D Shape Reconstruction from Vision and Touch

2 code implementations NeurIPS 2021 Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration.

3D Reconstruction 3D Shape Reconstruction

Generating unseen complex scenes: are we there yet?

no code implementations 7 Dec 2020 Arantxa Casanova, Michal Drozdzal, Adriana Romero-Soriano

In this paper, we propose a methodology to compare complex scene conditional generation models, and provide an in-depth analysis that assesses the ability of each model to (1) fit the training distribution and hence perform well on seen conditionings, (2) generalize to unseen conditionings composed of seen object combinations, and (3) generalize to unseen conditionings composed of unseen object combinations.

Object

Instance Selection for GANs

2 code implementations NeurIPS 2020 Terrance DeVries, Michal Drozdzal, Graham W. Taylor

By refining the empirical data distribution before training, we redirect model capacity towards high-density regions, which ultimately improves sample fidelity, lowers model capacity requirements, and significantly reduces training time.

Conditional Image Generation

Active MR k-space Sampling with Reinforcement Learning

2 code implementations 20 Jul 2020 Luis Pineda, Sumana Basu, Adriana Romero, Roberto Calandra, Michal Drozdzal

Deep learning approaches have recently shown great promise in accelerating magnetic resonance image (MRI) acquisition.

Image Reconstruction reinforcement-learning +1

3D Shape Reconstruction from Vision and Touch

1 code implementation NeurIPS 2020 Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal

When a toddler is presented a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.

3D Shape Reconstruction

Extending Unsupervised Neural Image Compression With Supervised Multitask Learning

no code implementations MIDL 2019 David Tellez, Diederik Hoppener, Cornelis Verhoef, Dirk Grunhagen, Pieter Nierop, Michal Drozdzal, Jeroen van der Laak, Francesco Ciompi

Additionally, we trained multiple encoders with different training objectives, e.g., unsupervised and variants of MTL, and observed a positive correlation between the number of tasks in MTL and the system performance on the TUPAC16 dataset.

Image Compression

Needles in Haystacks: On Classifying Tiny Objects in Large Images

1 code implementation 16 Aug 2019 Nick Pawlowski, Suvrat Bhooshan, Nicolas Ballas, Francesco Ciompi, Ben Glocker, Michal Drozdzal

In some important computer vision domains, such as medical or hyperspectral imaging, we care about the classification of tiny objects in large images.

Classification General Classification +2

On the Evaluation of Conditional GANs

1 code implementation 11 Jul 2019 Terrance DeVries, Adriana Romero, Luis Pineda, Graham W. Taylor, Michal Drozdzal

We show that FJD can be used as a promising single metric for cGAN benchmarking and model selection.

Benchmarking Diversity +1

Elucidating image-to-set prediction: An analysis of models, losses and datasets

1 code implementation 11 Apr 2019 Luis Pineda, Amaia Salvador, Michal Drozdzal, Adriana Romero

In this paper, we identify an important reproducibility challenge in the image-to-set prediction literature that impedes proper comparisons among published methods, namely, researchers use different evaluation protocols to assess their contributions.

Multi-Label Classification

The Liver Tumor Segmentation Benchmark (LiTS)

6 code implementations 13 Jan 2019 Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, Fabian Lohöfer, Julian Walter Holch, Wieland Sommer, Felix Hofmann, Alexandre Hostettler, Naama Lev-Cohain, Michal Drozdzal, Michal Marianne Amitai, Refael Vivantik, Jacob Sosna, Ivan Ezhov, Anjany Sekuboyina, Fernando Navarro, Florian Kofler, Johannes C. Paetzold, Suprosanna Shit, Xiaobin Hu, Jana Lipková, Markus Rempfler, Marie Piraud, Jan Kirschke, Benedikt Wiestler, Zhiheng Zhang, Christian Hülsemeyer, Marcel Beetz, Florian Ettlinger, Michela Antonelli, Woong Bae, Míriam Bellver, Lei Bi, Hao Chen, Grzegorz Chlebus, Erik B. Dam, Qi Dou, Chi-Wing Fu, Bogdan Georgescu, Xavier Giró-i-Nieto, Felix Gruen, Xu Han, Pheng-Ann Heng, Jürgen Hesser, Jan Hendrik Moltz, Christian Igel, Fabian Isensee, Paul Jäger, Fucang Jia, Krishna Chaitanya Kaluva, Mahendra Khened, Ildoo Kim, Jae-Hun Kim, Sungwoong Kim, Simon Kohl, Tomasz Konopczynski, Avinash Kori, Ganapathy Krishnamurthi, Fan Li, Hongchao Li, Junbo Li, Xiaomeng Li, John Lowengrub, Jun Ma, Klaus Maier-Hein, Kevis-Kokitsi Maninis, Hans Meine, Dorit Merhof, Akshay Pai, Mathias Perslev, Jens Petersen, Jordi Pont-Tuset, Jin Qi, Xiaojuan Qi, Oliver Rippel, Karsten Roth, Ignacio Sarasua, Andrea Schenk, Zengming Shen, Jordi Torres, Christian Wachinger, Chunliang Wang, Leon Weninger, Jianrong Wu, Daguang Xu, Xiaoping Yang, Simon Chun-Ho Yu, Yading Yuan, Miao Yu, Liping Zhang, Jorge Cardoso, Spyridon Bakas, Rickmer Braren, Volker Heinemann, Christopher Pal, An Tang, Samuel Kadoury, Luc Soler, Bram van Ginneken, Hayit Greenspan, Leo Joskowicz, Bjoern Menze

In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018.

Benchmarking Computed Tomography (CT) +3

Inverse Cooking: Recipe Generation from Food Images

4 code implementations CVPR 2019 Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero

Our system predicts ingredients as sets by means of a novel architecture, modeling their dependencies without imposing any order, and then generates cooking instructions by attending to both image and its inferred ingredients simultaneously.

Recipe Generation Retrieval

On the iterative refinement of densely connected representation levels for semantic segmentation

1 code implementation 30 Apr 2018 Arantxa Casanova, Guillem Cucurull, Michal Drozdzal, Adriana Romero, Yoshua Bengio

State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions.

Image Segmentation Scene Understanding +1

Learnable Explicit Density for Continuous Latent Space and Variational Inference

no code implementations 6 Oct 2017 Chin-wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior.

Density Estimation Variational Inference

Image Segmentation by Iterative Inference from Conditional Score Estimation

1 code implementation ICLR 2018 Adriana Romero, Michal Drozdzal, Akram Erraqabi, Simon Jégou, Yoshua Bengio

We experimentally find that the proposed iterative inference from conditional score estimation by conditional denoising autoencoders performs better than comparable models based on CRFs or those not using any explicit modeling of the conditional joint distribution of outputs.

Denoising Image Segmentation +1

Learning Normalized Inputs for Iterative Estimation in Medical Image Segmentation

no code implementations 16 Feb 2017 Michal Drozdzal, Gabriel Chartrand, Eugene Vorontsov, Lisa Di Jorio, An Tang, Adriana Romero, Yoshua Bengio, Chris Pal, Samuel Kadoury

Moreover, when applying our 2D pipeline to a challenging 3D MRI prostate segmentation task, we reach results that are competitive even when compared to 3D methods.

Image Segmentation Medical Image Segmentation +2

The Importance of Skip Connections in Biomedical Image Segmentation

1 code implementation 14 Aug 2016 Michal Drozdzal, Eugene Vorontsov, Gabriel Chartrand, Samuel Kadoury, Chris Pal

In this paper, we study the influence of both long and short skip connections on Fully Convolutional Networks (FCN) for biomedical image segmentation.

Image Segmentation Semantic Segmentation

Generic Feature Learning for Wireless Capsule Endoscopy Analysis

no code implementations 26 Jul 2016 Santi Seguí, Michal Drozdzal, Guillem Pascual, Petia Radeva, Carolina Malagelada, Fernando Azpiroz, Jordi Vitrià

Most CAD systems in capsule endoscopy share a common system design, but use very different image and video representations.
