Search Results for author: Alessio Tonioni

Found 28 papers, 11 papers with code

BRAVE: Broadening the visual encoding of vision-language models

no code implementations 10 Apr 2024 Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari

Our results highlight the potential of incorporating different visual biases for a broader and more contextualized visual understanding of VLMs.

Hallucination Language Modelling +1

Text-Conditioned Resampler For Long Form Video Understanding

no code implementations 19 Dec 2023 Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari

In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task.

Language Modelling Large Language Model +2

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

no code implementations 14 Dec 2023 Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation.

Denoising Semantic Segmentation +1

TouchSDF: A DeepSDF Approach for 3D Shape Reconstruction using Vision-Based Tactile Sensing

no code implementations 21 Nov 2023 Mauro Comi, Yijiong Lin, Alex Church, Alessio Tonioni, Laurence Aitchison, Nathan F. Lepora

To address these challenges, we propose TouchSDF, a Deep Learning approach for tactile 3D shape reconstruction that leverages the rich information provided by a vision-based tactile sensor and the expressivity of the implicit neural representation DeepSDF.

3D Shape Reconstruction

TextMesh: Generation of Realistic 3D Meshes From Text Prompts

1 code implementation 24 Apr 2023 Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari

In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.

NeRF-Supervised Deep Stereo

2 code implementations CVPR 2023 Fabio Tosi, Alessio Tonioni, Daniele De Gregorio, Matteo Poggi

We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth.

Neural Rendering Zero-shot Generalization

NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions

1 code implementation 22 Mar 2023 Mohamad Shahbazi, Evangelos Ntavelis, Alessio Tonioni, Edo Collins, Danda Pani Paudel, Martin Danelljan, Luc van Gool

Pose-conditioned convolutional generative models struggle with high-quality 3D-consistent image generation from single-view datasets, due to their lack of sufficient 3D priors.

Image Generation Inductive Bias

Learning Good Features to Transfer Across Tasks and Domains

no code implementations 26 Jan 2023 Pierluigi Zama Ramirez, Adriano Cardace, Luca De Luigi, Alessio Tonioni, Samuele Salti, Luigi Di Stefano

Besides, we propose a set of strategies to constrain the learned feature spaces, to ease learning and increase the generalization capability of the mapping network, thereby considerably improving the final performance of our framework.

Monocular Depth Estimation Semantic Segmentation

LatentSwap3D: Semantic Edits on 3D Image GANs

no code implementations 2 Dec 2022 Enis Simsar, Alessio Tonioni, Evin Pınar Örnek, Federico Tombari

3D GANs have the ability to generate latent codes for entire 3D volumes rather than only 2D images.

Feature Importance

ParGAN: Learning Real Parametrizable Transformations

no code implementations 9 Nov 2022 Diego Martin Arroyo, Alessio Tonioni, Federico Tombari

Current methods for image-to-image translation produce compelling results; however, the applied transformation is difficult to control, since existing mechanisms are often limited and non-intuitive.

Image-to-Image Translation Translation

LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction

1 code implementation 23 Jun 2021 Farid Yagubbayli, Yida Wang, Alessio Tonioni, Federico Tombari

Most modern deep learning-based multi-view 3D reconstruction techniques use RNNs or fusion modules to combine information from multiple images after independently encoding them.

3D Reconstruction Multi-View 3D Reconstruction +1

Unsupervised Novel View Synthesis from a Single Image

no code implementations 5 Feb 2021 Pierluigi Zama Ramirez, Alessio Tonioni, Federico Tombari

Novel view synthesis from a single image aims at generating novel views from a single input image of an object.

Decoder Novel View Synthesis

Batch Normalization Embeddings for Deep Domain Generalization

no code implementations 25 Nov 2020 Mattia Segu, Alessio Tonioni, Federico Tombari

Several recent methods use multiple datasets to train models to extract domain-invariant features, hoping to generalize to unseen domains.

Domain Generalization

A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views

no code implementations 17 Nov 2020 Riccardo Spezialetti, David Joseph Tan, Alessio Tonioni, Keisuke Tateno, Federico Tombari

Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.

3D Shape Reconstruction Object +1

Continual Adaptation for Deep Stereo

1 code implementation 10 Jul 2020 Matteo Poggi, Alessio Tonioni, Fabio Tosi, Stefano Mattoccia, Luigi Di Stefano

Thus, our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system and pave the way for a new paradigm that can facilitate practical deployment of end-to-end architectures for dense disparity regression.

Depth Estimation

Unsupervised Domain Adaptation for Depth Prediction from Images

1 code implementation 9 Sep 2019 Alessio Tonioni, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano

Extensive experimental results based on standard datasets and evaluation protocols prove that our technique can effectively address the domain shift issue with both stereo and monocular depth prediction architectures and outperforms other state-of-the-art unsupervised loss functions that may be alternatively deployed to pursue domain adaptation.

Depth Estimation Depth Prediction +1

Semi-Automatic Labeling for Deep Learning in Robotics

1 code implementation 5 Aug 2019 Daniele De Gregorio, Alessio Tonioni, Gianluca Palli, Luigi Di Stefano

In this paper, we propose Augmented Reality Semi-automatic labeling (ARS), a semi-automatic method that leverages a 2D camera moved by a robot, providing precise camera tracking, and an augmented reality pen to define the initial object bounding box, to create large labeled datasets with minimal human intervention.

Object object-detection +1

Real-Time Highly Accurate Dense Depth on a Power Budget using an FPGA-CPU Hybrid SoC

no code implementations 17 Jul 2019 Oscar Rahnama, Tommaso Cavallari, Stuart Golodetz, Alessio Tonioni, Thomas Joy, Luigi Di Stefano, Simon Walker, Philip H. S. Torr

Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs.

Learning to Adapt for Stereo

1 code implementation CVPR 2019 Alessio Tonioni, Oscar Rahnama, Thomas Joy, Luigi Di Stefano, Thalaiyasingam Ajanthan, Philip H. S. Torr

Real world applications of stereo depth estimation require models that are robust to dynamic variations in the environment.

Autonomous Driving Stereo Depth Estimation

Domain invariant hierarchical embedding for grocery products recognition

no code implementations 2 Feb 2019 Alessio Tonioni, Luigi Di Stefano

Moreover, there exists a significant domain shift between the images that should be recognized at test time, taken in stores by cheap cameras, and those available for training, usually just one or a few studio-quality images per product.

Exploiting Semantics in Adversarial Training for Image-Level Domain Adaptation

no code implementations 13 Oct 2018 Pierluigi Zama Ramirez, Alessio Tonioni, Luigi Di Stefano

To prove the effectiveness of our proposal, we show how a semantic segmentation CNN trained on images from the synthetic GTA dataset adapted by our method can improve performance by more than 16% mIoU with respect to the same model trained on synthetic images.

Domain Adaptation Segmentation +2

Real-time self-adaptive deep stereo

1 code implementation CVPR 2019 Alessio Tonioni, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano

Deep convolutional neural networks trained end-to-end are the state-of-the-art methods to regress dense disparity maps from stereo pairs.

Stereo Depth Estimation

A deep learning pipeline for product recognition on store shelves

no code implementations 3 Oct 2018 Alessio Tonioni, Eugenio Serra, Luigi Di Stefano

Then, available product databases usually include just one or a few studio-quality images per product (referred to herein as reference images), whilst at test time recognition is performed on pictures displaying a portion of a shelf containing several products and taken in the store by cheap cameras (referred to as query images).

Image Retrieval object-detection +2

Unsupervised Adaptation for Deep Stereo

1 code implementation ICCV 2017 Alessio Tonioni, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano

Recent ground-breaking works have shown that deep neural networks can be trained end-to-end to regress dense disparity maps directly from image pairs.

Product recognition in store shelves as a sub-graph isomorphism problem

no code implementations 26 Jul 2017 Alessio Tonioni, Luigi Di Stefano

The arrangement of products in store shelves is carefully planned to maximize sales and keep customers happy.
