no code implementations • 10 Apr 2024 • Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari
Our results highlight the potential of incorporating different visual biases for a broader and more contextualized visual understanding of VLMs.
no code implementations • 29 Mar 2024 • Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Touch and vision go hand in hand, mutually enhancing our ability to understand the world.
no code implementations • 10 Jan 2024 • Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc van Gool, Federico Tombari
We introduce InseRF, a novel method for generative object insertion in the NeRF reconstructions of 3D scenes.
no code implementations • 19 Dec 2023 • Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari
In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task.
Ranked #5 on Video Question Answering on NExT-QA
no code implementations • 14 Dec 2023 • Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation.
no code implementations • 21 Nov 2023 • Mauro Comi, Yijiong Lin, Alex Church, Alessio Tonioni, Laurence Aitchison, Nathan F. Lepora
To address these challenges, we propose TouchSDF, a Deep Learning approach for tactile 3D shape reconstruction that leverages the rich information provided by a vision-based tactile sensor and the expressivity of the implicit neural representation DeepSDF.
1 code implementation • 24 Apr 2023 • Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari
In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.
2 code implementations • CVPR 2023 • Fabio Tosi, Alessio Tonioni, Daniele De Gregorio, Matteo Poggi
We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth.
1 code implementation • 22 Mar 2023 • Mohamad Shahbazi, Evangelos Ntavelis, Alessio Tonioni, Edo Collins, Danda Pani Paudel, Martin Danelljan, Luc van Gool
Pose-conditioned convolutional generative models struggle with high-quality 3D-consistent image generation from single-view datasets, due to their lack of sufficient 3D priors.
no code implementations • 26 Jan 2023 • Pierluigi Zama Ramirez, Adriano Cardace, Luca De Luigi, Alessio Tonioni, Samuele Salti, Luigi Di Stefano
Besides, we propose a set of strategies to constrain the learned feature spaces, to ease learning and increase the generalization capability of the mapping network, thereby considerably improving the final performance of our framework.
no code implementations • 2 Dec 2022 • Enis Simsar, Alessio Tonioni, Evin Pınar Örnek, Federico Tombari
3D GANs have the ability to generate latent codes for entire 3D volumes rather than only 2D images.
no code implementations • 9 Nov 2022 • Diego Martin Arroyo, Alessio Tonioni, Federico Tombari
Current methods for image-to-image translation produce compelling results; however, the applied transformation is difficult to control, since existing mechanisms are often limited and non-intuitive.
1 code implementation • 23 Jun 2021 • Farid Yagubbayli, Yida Wang, Alessio Tonioni, Federico Tombari
Most modern deep learning-based multi-view 3D reconstruction techniques use RNNs or fusion modules to combine information from multiple images after independently encoding them.
no code implementations • 5 Feb 2021 • Pierluigi Zama Ramirez, Alessio Tonioni, Federico Tombari
Novel view synthesis from a single image aims at generating novel views from a single input image of an object.
no code implementations • 25 Nov 2020 • Mattia Segu, Alessio Tonioni, Federico Tombari
Several recent methods use multiple datasets to train models to extract domain-invariant features, hoping to generalize to unseen domains.
Ranked #63 on Domain Generalization on PACS
no code implementations • 17 Nov 2020 • Riccardo Spezialetti, David Joseph Tan, Alessio Tonioni, Keisuke Tateno, Federico Tombari
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.
1 code implementation • 10 Jul 2020 • Matteo Poggi, Alessio Tonioni, Fabio Tosi, Stefano Mattoccia, Luigi Di Stefano
Thus, our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system and pave the way for a new paradigm that can facilitate practical deployment of end-to-end architectures for dense disparity regression.
1 code implementation • 9 Sep 2019 • Alessio Tonioni, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano
Extensive experimental results based on standard datasets and evaluation protocols prove that our technique can effectively address the domain shift issue with both stereo and monocular depth prediction architectures and outperforms other state-of-the-art unsupervised loss functions that may be alternatively deployed to pursue domain adaptation.
1 code implementation • 5 Aug 2019 • Daniele De Gregorio, Alessio Tonioni, Gianluca Palli, Luigi Di Stefano
In this paper, we propose Augmented Reality Semi-automatic labeling (ARS), a semi-automatic method that leverages a 2D camera moved by a robot, providing precise camera tracking, and an augmented reality pen to define the initial object bounding box, to create large labeled datasets with minimal human intervention.
no code implementations • 17 Jul 2019 • Oscar Rahnama, Tommaso Cavallari, Stuart Golodetz, Alessio Tonioni, Thomas Joy, Luigi Di Stefano, Simon Walker, Philip H. S. Torr
Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs.
2 code implementations • ICCV 2019 • Pierluigi Zama Ramirez, Alessio Tonioni, Samuele Salti, Luigi Di Stefano
Recent works have proven that many relevant visual tasks are closely related to one another.
1 code implementation • CVPR 2019 • Alessio Tonioni, Oscar Rahnama, Thomas Joy, Luigi Di Stefano, Thalaiyasingam Ajanthan, Philip H. S. Torr
Real world applications of stereo depth estimation require models that are robust to dynamic variations in the environment.
no code implementations • 2 Feb 2019 • Alessio Tonioni, Luigi Di Stefano
Moreover, there exists a significant domain shift between the images that should be recognized at test time, taken in stores by cheap cameras, and those available for training, usually just one or a few studio-quality images per product.
no code implementations • 13 Oct 2018 • Pierluigi Zama Ramirez, Alessio Tonioni, Luigi Di Stefano
To prove the effectiveness of our proposal, we show how a semantic segmentation CNN trained on images from the synthetic GTA dataset adapted by our method can improve performance by more than 16% mIoU with respect to the same model trained on synthetic images.
1 code implementation • CVPR 2019 • Alessio Tonioni, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano
Deep convolutional neural networks trained end-to-end are the state-of-the-art methods to regress dense disparity maps from stereo pairs.
no code implementations • 3 Oct 2018 • Alessio Tonioni, Eugenio Serra, Luigi Di Stefano
Then, available product databases usually include just one or a few studio-quality images per product (referred to herein as reference images), whilst at test time recognition is performed on pictures displaying a portion of a shelf containing several products and taken in the store by cheap cameras (referred to as query images).
1 code implementation • ICCV 2017 • Alessio Tonioni, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano
Recent ground-breaking works have shown that deep neural networks can be trained end-to-end to regress dense disparity maps directly from image pairs.
no code implementations • 26 Jul 2017 • Alessio Tonioni, Luigi Di Stefano
The arrangement of products on store shelves is carefully planned to maximize sales and keep customers happy.