Search Results for author: Federico Tombari

Found 168 papers, 57 papers with code

BRAVE: Broadening the visual encoding of vision-language models

no code implementations 10 Apr 2024 Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari

Our results highlight the potential of incorporating different visual biases for a more broad and contextualized visual understanding of VLMs.

Hallucination Language Modelling +1

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

no code implementations 5 Apr 2024 Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

This marks a significant advancement towards modeling photorealistic digital humans using physically based inverse rendering with physics in the loop.

Inverse Rendering

OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views

no code implementations 4 Apr 2024 Francis Engelmann, Fabian Manhardt, Michael Niemeyer, Keisuke Tateno, Marc Pollefeys, Federico Tombari

Our OpenNeRF further leverages NeRF's ability to render novel views and extract open-set VLM features from areas that are not well observed in the initial posed images.

Image Segmentation Point Cloud Segmentation +2

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

1 code implementation 4 Apr 2024 Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc van Gool, Federico Tombari

We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density.

3D Scene Reconstruction Depth Estimation +2

CLoRA: A Contrastive Approach to Compose Multiple LoRA Models

no code implementations 28 Mar 2024 Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag

Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation, offering a highly effective way to adapt and refine pre-trained deep learning models for specific tasks without the need for comprehensive retraining.

Image Generation

RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS

no code implementations 20 Mar 2024 Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari

First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization.

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

no code implementations 17 Mar 2024 Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces.

Novel View Synthesis

KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation

1 code implementation 15 Mar 2024 Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji

In this paper, we present KP-RED, a unified KeyPoint-driven REtrieval and Deformation framework that takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models from a pre-processed database to tightly match the target.

3D Shape Retrieval Retrieval

FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

no code implementations 11 Mar 2024 Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc van Gool, Didier Stricker, Muhammad Zeshan Afzal

We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks.

Activity Recognition Age Classification +1

Denoising Diffusion via Image-Based Rendering

no code implementations 5 Feb 2024 Titas Anciukevičius, Fabian Manhardt, Federico Tombari, Paul Henderson

In this work, we introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.

3D Reconstruction Denoising +1

Learning to Prompt with Text Only Supervision for Vision-Language Models

1 code implementation 4 Jan 2024 Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari

While effective, most of these works require labeled data which is not practical, and often struggle to generalize towards new datasets due to over-fitting on the source data.

Prompt Engineering

Text-Conditioned Resampler For Long Form Video Understanding

no code implementations 19 Dec 2023 Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari

In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task.

Language Modelling Large Language Model +2

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

no code implementations 14 Dec 2023 Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation.

Denoising Semantic Segmentation +1

CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models

no code implementations 11 Dec 2023 Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag

Images produced by text-to-image diffusion models might not always faithfully represent the semantic intent of the provided text prompt, where the model might overlook or entirely fail to produce certain objects.

Re-Nerfing: Improving Novel Views Synthesis through Novel Views Synthesis

no code implementations 4 Dec 2023 Felix Tristram, Stefano Gasperini, Nassir Navab, Federico Tombari

With Re-Nerfing, we enhance the geometric consistency of novel views as follows: First, we train a NeRF with the available views.

Data Augmentation Novel View Synthesis

DNS SLAM: Dense Neural Semantic-Informed SLAM

no code implementations 30 Nov 2023 Kunyi Li, Michael Niemeyer, Nassir Navab, Federico Tombari

In this work, we introduce DNS SLAM, a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.

Semantic SLAM

SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

1 code implementation 27 Nov 2023 Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc van Gool, Federico Tombari

In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries.

Segmentation Semi-Supervised Semantic Segmentation

D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

no code implementations 23 Nov 2023 Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

Second, we introduce a dual-stream denoiser to semantically and geometrically model hand-object interactions with a novel unified hand-object semantic embedding, enhancing the reconstruction performance of the hand-occluded region of the object.

Denoising Object +1

3D Compression Using Neural Fields

no code implementations 21 Nov 2023 Janis Postels, Yannick Strümpler, Klara Reichard, Luc van Gool, Federico Tombari

Neural Fields (NFs) have gained momentum as a tool for compressing various data modalities - e.g. images and videos.

Attribute

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

1 code implementation 18 Nov 2023 Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam

Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information.

Object Pose Estimation

SILC: Improving Vision Language Pretraining with Self-Distillation

no code implementations 20 Oct 2023 Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc van Gool, Federico Tombari

However, the contrastive objective used by these models only focuses on image-text alignment and does not incentivise image feature learning for dense prediction tasks.

Classification Contrastive Learning +8

MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

no code implementations 18 Oct 2023 Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji

Previous works concerning single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in real world.

Object Object Reconstruction

SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs

no code implementations 21 Sep 2023 Guangyao Zhai, Xiaoni Cai, Dianye Huang, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam

In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation.

Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction

no code implementations ICCV 2023 Zhiying Leng, Shun-Cheng Wu, Mahdi Saleh, Antonio Montanaro, Hao Yu, Yin Wang, Nassir Navab, Xiaohui Liang, Federico Tombari

In this work, we propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet), which leverages intrinsic properties of hyperbolic space to learn representative features.

Object Object Reconstruction

Introducing Language Guidance in Prompt-based Continual Learning

1 code implementation ICCV 2023 Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Didier Stricker, Federico Tombari, Muhammad Zeshan Afzal

While the model faces a disjoint set of classes in each task in this setting, we argue that these classes can be encoded to the same embedding space of a pre-trained language encoder.

Continual Learning

3D Adversarial Augmentations for Robust Out-of-Domain Predictions

no code implementations 29 Aug 2023 Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Nassir Navab, Benjamin Busam, Federico Tombari

We conduct extensive experiments across a variety of scenarios on data from KITTI, Waymo, and CrashD for 3D object detection, and on data from SemanticKITTI, Waymo, and nuScenes for 3D semantic segmentation.

3D Object Detection 3D Semantic Segmentation +2

Robust Monocular Depth Estimation under Challenging Conditions

no code implementations ICCV 2023 Stefano Gasperini, Nils Morbitzer, HyunJun Jung, Nassir Navab, Federico Tombari

While state-of-the-art monocular depth estimation approaches achieve impressive results in ideal settings, they are highly unreliable under challenging illumination and weather conditions, such as at nighttime or in the presence of rain.

Monocular Depth Estimation valid

CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction

no code implementations 15 Aug 2023 Yan Di, Chenyangguang Zhang, Pengyuan Wang, Guangyao Zhai, Ruida Zhang, Fabian Manhardt, Benjamin Busam, Xiangyang Ji, Federico Tombari

However, such strategies fail to consistently align the denoised point cloud with the given image, leading to unstable conditioning and inferior performance.

3D Reconstruction

U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds

1 code implementation ICCV 2023 Yan Di, Chenyangguang Zhang, Ruida Zhang, Fabian Manhardt, Yongzhi Su, Jason Rambach, Didier Stricker, Xiangyang Ji, Federico Tombari

In this paper, we propose U-RED, an Unsupervised shape REtrieval and Deformation pipeline that takes an arbitrary object observation as input, typically captured by RGB images or scans, and jointly retrieves and deforms the geometrically similar CAD models from a pre-established database to tightly match the target.

3D Shape Retrieval Retrieval

View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

no code implementations 29 May 2023 Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

For autonomous vehicles, driving safely is highly dependent on the capability to correctly perceive the environment in 3D space, hence the task of 3D object detection represents a fundamental aspect of perception.

3D Object Detection Autonomous Vehicles +1

Incremental 3D Semantic Scene Graph Prediction from RGB Sequences

no code implementations CVPR 2023 Shun-Cheng Wu, Keisuke Tateno, Nassir Navab, Federico Tombari

Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.

TextMesh: Generation of Realistic 3D Meshes From Text Prompts

1 code implementation 24 Apr 2023 Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari

In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.

NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM

no code implementations 23 Mar 2023 Hidenobu Matsuki, Keisuke Tateno, Michael Niemeyer, Federico Tombari

However, in real-time and on-the-fly scene capture applications, this prior knowledge cannot be assumed as fixed or static, since it dynamically changes and it is subject to significant updates based on run-time observations.

NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes

no code implementations 16 Mar 2023 Marie-Julie Rakotosaona, Fabian Manhardt, Diego Martin Arroyo, Michael Niemeyer, Abhijit Kundu, Federico Tombari

Obtaining 3D meshes from neural radiance fields still remains an open challenge since NeRFs are optimized for view synthesis, not enforcing an accurate underlying geometry on the radiance field.

Novel View Synthesis Surface Reconstruction

Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

no code implementations 15 Mar 2023 Artem Savkin, Rachid Ellouze, Nassir Navab, Federico Tombari

Image synthesis driven by computer graphics achieved recently a remarkable realism, yet synthetic image data generated this way reveals a significant domain gap with respect to real-world data.

Autonomous Driving Image Generation +1

SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments

1 code implementation 22 Dec 2022 Evin Pınar Örnek, Aravindhan K Krishnan, Shreekant Gayaka, Cheng-Hao Kuo, Arnie Sen, Nassir Navab, Federico Tombari

We introduce a zero-shot split for Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the "objectness" of pixels and generalize to unseen object categories in cluttered indoor environments.

Instance Segmentation Object +2

LatentSwap3D: Semantic Edits on 3D Image GANs

no code implementations 2 Dec 2022 Enis Simsar, Alessio Tonioni, Evin Pınar Örnek, Federico Tombari

3D GANs have the ability to generate latent codes for entire 3D volumes rather than only 2D images.

Feature Importance

Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

1 code implementation CVPR 2023 Dario Pavllo, David Joseph Tan, Marie-Julie Rakotosaona, Federico Tombari

Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction in the area of 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies.

3D Reconstruction Pose Estimation

SPARF: Neural Radiance Fields from Sparse and Noisy Poses

1 code implementation CVPR 2023 Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari

Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views.

Novel View Synthesis

DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation

no code implementations 10 Nov 2022 Azade Farshad, Yousef Yeganeh, Helisa Dhamo, Federico Tombari, Nassir Navab

Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or the edges in the graph.

Disentanglement Image Manipulation

ParGAN: Learning Real Parametrizable Transformations

no code implementations 9 Nov 2022 Diego Martin Arroyo, Alessio Tonioni, Federico Tombari

Current methods for image-to-image translation produce compelling results, however, the applied transformation is difficult to control, since existing mechanisms are often limited and non-intuitive.

Image-to-Image Translation Translation

OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

no code implementations 2 Nov 2022 Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari

Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box.

Monocular 3D Object Detection Object +1

I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

no code implementations 21 Sep 2022 Muhammad Ferjad Naeem, Yongqin Xian, Luc van Gool, Federico Tombari

In order to distill discriminative visual words from noisy documents, we introduce a new cross-modal attention module that learns fine-grained interactions between image patches and document words.

Generalized Zero-Shot Learning Image Classification +2

Segmenting Known Objects and Unseen Unknowns without Prior Knowledge

no code implementations ICCV 2023 Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Nassir Navab, Benjamin Busam, Federico Tombari

By doing so, for the first time in panoptic segmentation with unknown objects, our U3HS is trained without unknown categories, reducing assumptions and leaving the settings as unconstrained as in real-life scenarios.

Panoptic Segmentation Scene Understanding +1

ManiFlow: Implicitly Representing Manifolds with Normalizing Flows

no code implementations 18 Aug 2022 Janis Postels, Martin Danelljan, Luc van Gool, Federico Tombari

In contrast to prior work, we approach this problem by generating samples from the original data distribution given full knowledge about the perturbed distribution and the noise model.

Surface Reconstruction

SC-Explorer: Incremental 3D Scene Completion for Safe and Efficient Exploration Mapping and Planning

1 code implementation 17 Aug 2022 Lukas Schmid, Mansoor Nasir Cheema, Victor Reijgwart, Roland Siegwart, Federico Tombari, Cesar Cadena

We further present an informative path planning method, leveraging the capabilities of our mapping approach and a novel scene-completion-aware information gain.

Efficient Exploration

SSP-Pose: Symmetry-Aware Shape Prior Deformation for Direct Category-Level Object Pose Estimation

no code implementations 13 Aug 2022 Ruida Zhang, Yan Di, Fabian Manhardt, Federico Tombari, Xiangyang Ji

In this paper, to handle these shortcomings, we propose an end-to-end trainable network SSP-Pose for category-level pose estimation, which integrates shape priors into a direct pose regression network.

Pose Estimation regression

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

no code implementations 31 Jul 2022 Mahdi Saleh, Yige Wang, Nassir Navab, Benjamin Busam, Federico Tombari

The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods while requiring significantly fewer computations.

Scene Segmentation Segmentation

RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation

1 code implementation 30 Jul 2022 Ruida Zhang, Yan Di, Zhiqiang Lou, Fabian Manhardt, Federico Tombari, Xiangyang Ji

Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories.

Object Pose Estimation

E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs

no code implementations 20 Jul 2022 Yanyan Li, Federico Tombari

Minimal solutions for relative rotation and translation estimation tasks have been explored in different scenarios, typically relying on the so-called co-visibility graph.

Visual Odometry

GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning

1 code implementation 20 Jul 2022 Huseyin Coskun, Alireza Zareian, Joshua L. Moore, Federico Tombari, Chen Wang

Specifically, we outperform the state of the art by 7% on UCF and 4% on HMDB for video retrieval, and 5% on UCF and 6% on HMDB for video classification.

Action Recognition Clustering +6

4D-OR: Semantic Scene Graphs for OR Domain Modeling

1 code implementation 22 Mar 2022 Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Tobias Czempiel, Federico Tombari, Nassir Navab

Towards this goal, for the first time, we propose using semantic scene graphs (SSG) to describe and summarize the surgical scene.

Scene Graph Generation

From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction

no code implementations 15 Mar 2022 Evin Pınar Örnek, Shristi Mudgal, Johanna Wald, Yida Wang, Nassir Navab, Federico Tombari

There have been numerous recently proposed methods for monocular depth prediction (MDP) coupled with the equally rapid evolution of benchmarking tools.

Benchmarking Depth Estimation +1

GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting

3 code implementations CVPR 2022 Yan Di, Ruida Zhang, Zhiqiang Lou, Fabian Manhardt, Xiangyang Ji, Nassir Navab, Federico Tombari

While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications.

Ranked #1 on 6D Pose Estimation on LineMOD (Mean ADD-S metric)

6D Pose Estimation 6D Pose Estimation using RGB +3

Transformers in Action: Weakly Supervised Action Segmentation

no code implementations 14 Jan 2022 John Ridley, Huseyin Coskun, David Joseph Tan, Nassir Navab, Federico Tombari

The video action segmentation task is regularly explored under weaker forms of supervision, such as transcript supervision, where a list of actions is easier to obtain than dense frame-wise labels.

Action Segmentation

Implicit Neural Representations for Image Compression

no code implementations 8 Dec 2021 Yannick Strümpler, Janis Postels, Ren Yang, Luc van Gool, Federico Tombari

Recently Implicit Neural Representations (INRs) gained attention as a novel and effective representation for various data types.

Image Compression Quantization

Object-aware Monocular Depth Prediction with Instance Convolutions

1 code implementation 2 Dec 2021 Enis Simsar, Evin Pınar Örnek, Fabian Manhardt, Helisa Dhamo, Nassir Navab, Federico Tombari

With the advent of deep learning, estimating depth from a single RGB image has recently received a lot of attention, being capable of empowering many different applications ranging from path planning for robotics to computational cinematography.

Depth Estimation Depth Prediction +2

Neural Fields in Visual Computing and Beyond

1 code implementation 22 Nov 2021 Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, Srinath Sridhar

Recent advances in machine learning have created increasing interest in solving visual computing problems using a class of coordinate-based neural networks that parametrize physical properties of scenes or objects across space and time.

3D Reconstruction Image Animation +1

Semantic Image Alignment for Vehicle Localization

no code implementations 8 Oct 2021 Markus Herb, Matthias Lemberger, Marcel M. Schmitt, Alexander Kurz, Tobias Weiherer, Nassir Navab, Federico Tombari

Accurate and reliable localization is a fundamental requirement for autonomous vehicles to use map information in higher-level tasks such as navigation or planning.

Autonomous Vehicles Semantic Segmentation +1

Semantic Dense Reconstruction with Consistent Scene Segments

no code implementations 30 Sep 2021 Yingcai Wan, Yanyan Li, Yingxuan You, Cheng Guo, Lijin Fang, Federico Tombari

In this paper, a method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.

3D Scene Reconstruction Scene Understanding +1

Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation

no code implementations 24 Sep 2021 Mert Asim Karaoglu, Nikolas Brasch, Marijn Stollenga, Wolfgang Wein, Nassir Navab, Federico Tombari, Alexander Ladikos

The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin and can be employed in 3D reconstruction pipelines.

3D Reconstruction Depth Estimation

Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs

1 code implementation ICCV 2021 Helisa Dhamo, Fabian Manhardt, Nassir Navab, Federico Tombari

Scene graphs are representations of a scene, composed of objects (nodes) and inter-object relationships (edges), proven to be particularly suited for this task, as they allow for semantic control on the generated content.

Object

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

2 code implementations ICCV 2021 Yan Di, Fabian Manhardt, Gu Wang, Xiangyang Ji, Nassir Navab, Federico Tombari

Directly regressing all 6 degrees-of-freedom (6DoF) for the object pose (e.g. the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem.

6D Pose Estimation 6D Pose Estimation using RGB +1

Unconditional Scene Graph Generation

no code implementations ICCV 2021 Sarthak Garg, Helisa Dhamo, Azade Farshad, Sabrina Musatian, Nassir Navab, Federico Tombari

Scene graphs, composed of nodes as objects and directed-edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images.

Anomaly Detection Graph Generation +3

R4Dyn: Exploring Radar for Self-Supervised Monocular Depth Estimation of Dynamic Scenes

no code implementations 10 Aug 2021 Stefano Gasperini, Patrick Koch, Vinzenz Dallabetta, Nassir Navab, Benjamin Busam, Federico Tombari

While self-supervised monocular depth estimation in driving scenarios has achieved comparable performance to supervised approaches, violations of the static world assumption can still lead to erroneous depth predictions of traffic participants, posing a potential safety issue.

Autonomous Vehicles Monocular Depth Estimation

Attention-based Adversarial Appearance Learning of Augmented Pedestrians

no code implementations 6 Jul 2021 Kevin Strauss, Artem Savkin, Federico Tombari

Synthetic data became already an essential component of machine learning-based perception in the field of autonomous driving.

Autonomous Driving

On the Practicality of Deterministic Epistemic Uncertainty

2 code implementations 1 Jul 2021 Janis Postels, Mattia Segu, Tao Sun, Luca Sieber, Luc van Gool, Fisher Yu, Federico Tombari

We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts.

Out of Distribution (OOD) Detection Semantic Segmentation +1

LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction

1 code implementation 23 Jun 2021 Farid Yagubbayli, Yida Wang, Alessio Tonioni, Federico Tombari

Most modern deep learning-based multi-view 3D reconstruction techniques use RNNs or fusion modules to combine information from multiple images after independently encoding them.

3D Reconstruction Multi-View 3D Reconstruction +1

Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures

no code implementations 9 Jun 2021 Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Federico Tombari, Nassir Navab

We then use MSSG to introduce a dynamically generated graphical user interface tool for surgical procedure analysis which could be used for many applications including process optimization, OR design and automatic report generation.

Go with the Flows: Mixtures of Normalizing Flows for Point Cloud Generation and Reconstruction

no code implementations 6 Jun 2021 Janis Postels, Mengya Liu, Riccardo Spezialetti, Luc van Gool, Federico Tombari

Recently normalizing flows (NFs) have demonstrated state-of-the-art performance on modeling 3D point clouds while allowing sampling with arbitrary resolution at inference time.

Data Augmentation Point Cloud Generation

SRH-Net: Stacked Recurrent Hourglass Network for Stereo Matching

1 code implementation 25 May 2021 Hongzhi Du, Yanyan Li, Yanbiao Sun, Jigui Zhu, Federico Tombari

The cost aggregation strategy shows a crucial role in learning-based stereo matching tasks, where 3D convolutional filters obtain state of the art but require intensive computation resources, while 2D operations need less GPU memory but are sensitive to domain shift.

Stereo Matching

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

1 code implementation 16 May 2021 Margarita Grinvald, Federico Tombari, Roland Siegwart, Juan Nieto

The ability to simultaneously track and reconstruct multiple objects moving in the scene is of the utmost importance for robotic tasks such as autonomous navigation and interaction.

Autonomous Navigation Object +2

Variational Transformer Networks for Layout Generation

no code implementations CVPR 2021 Diego Martin Arroyo, Janis Postels, Federico Tombari

Generative models able to synthesize layouts of different kinds (e.g. documents, user interfaces or furniture arrangements) are a useful tool to aid design processes and as a first step in the generation of synthetic data, among other tasks.

ManhattanSLAM: Robust Planar Tracking and Mapping Leveraging Mixture of Manhattan Frames

1 code implementation 28 Mar 2021 Raza Yunus, Yanyan Li, Federico Tombari

In this paper, a robust RGB-D SLAM system is proposed to utilize the structural information in indoor scenes, allowing for accurate tracking and efficient dense mapping on a CPU.

Pose Estimation Superpixels

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

1 code implementation CVPR 2021 Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji

In this work, we perform an in-depth investigation on both direct and indirect methods, and propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations.

6D Pose Estimation 6D Pose Estimation using RGB +1

Unsupervised Novel View Synthesis from a Single Image

no code implementations 5 Feb 2021 Pierluigi Zama Ramirez, Alessio Tonioni, Federico Tombari

Novel view synthesis from a single image aims at generating novel views from a single input image of an object.

Novel View Synthesis

Learning Graph Embeddings for Compositional Zero-shot Learning

1 code implementation CVPR 2021 Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata

In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g. old dog) of observed visual primitives states (e.g. old, cute) and objects (e.g. car, dog) in the training set.

Compositional Zero-Shot Learning Graph Embedding +1

The Hidden Uncertainty in a Neural Networks Activations

no code implementations 5 Dec 2020 Janis Postels, Hermann Blum, Yannick Strümpler, Cesar Cadena, Roland Siegwart, Luc van Gool, Federico Tombari

We find that this leads to improved OOD detection of epistemic uncertainty at the cost of ambiguous calibration close to the data distribution.

Density Estimation Out of Distribution (OOD) Detection

3DSNet: Unsupervised Shape-to-Shape 3D Style Transfer

1 code implementation 26 Nov 2020 Mattia Segu, Margarita Grinvald, Roland Siegwart, Federico Tombari

Transferring the style from one image onto another is a popular and widely studied task in computer vision.

Style Transfer

Batch Normalization Embeddings for Deep Domain Generalization

no code implementations 25 Nov 2020 Mattia Segu, Alessio Tonioni, Federico Tombari

Several recent methods use multiple datasets to train models to extract domain-invariant features, hoping to generalize to unseen domains.

Domain Generalization

Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis

no code implementations ECCV 2020 Ruixuan Yu, Xin Wei, Federico Tombari, Jian Sun

In this work, we propose a novel deep network for point clouds by incorporating positional information of points as inputs while yielding rotation-invariance.

A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views

no code implementations 17 Nov 2020 Riccardo Spezialetti, David Joseph Tan, Alessio Tonioni, Keisuke Tateno, Federico Tombari

Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.

3D Shape Reconstruction Object +1

Panoster: End-to-end Panoptic Segmentation of LiDAR Point Clouds

no code implementations 28 Oct 2020 Stefano Gasperini, Mohammad-Ali Nikouei Mahani, Alvaro Marcos-Ramiro, Nassir Navab, Federico Tombari

Panoptic segmentation has recently unified semantic and instance segmentation, previously addressed separately, thus taking a step further towards creating more comprehensive and efficient perception systems.

Clustering Instance Segmentation +2

SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion

2 code implementations 26 Oct 2020 Shun-Cheng Wu, Keisuke Tateno, Nassir Navab, Federico Tombari

We propose a framework that ameliorates this issue by performing scene reconstruction and semantic scene completion jointly in an incremental and real-time manner, based on an input sequence of depth maps.

3D Semantic Scene Completion

RGB-D SLAM with Structural Regularities

1 code implementation 15 Oct 2020 Yanyan Li, Raza Yunus, Nikolas Brasch, Nassir Navab, Federico Tombari

This work proposes an RGB-D SLAM system specifically designed for structured environments and aimed at improved tracking and mapping accuracy by relying on geometric features that are extracted from the surroundings.

Robotics

Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes

1 code implementation ECCV 2020 Johanna Wald, Torsten Sattler, Stuart Golodetz, Tommaso Cavallari, Federico Tombari

In this paper, we adapt 3RScan - a recently introduced indoor RGB-D dataset designed for object instance re-localization - to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes.

Camera Relocalization

Structure-SLAM: Low-Drift Monocular SLAM in Indoor Environments

1 code implementation 5 Aug 2020 Yanyan Li, Nikolas Brasch, Yida Wang, Nassir Navab, Federico Tombari

In this paper a low-drift monocular SLAM method is proposed targeting indoor scenarios, where monocular SLAM often fails due to the lack of textured surfaces.

Robotics

Explicit Domain Adaptation with Loosely Coupled Samples

no code implementations 24 Apr 2020 Oliver Scheel, Loren Schwarz, Nassir Navab, Federico Tombari

In this work we propose a transfer learning framework, the core of which is learning an explicit mapping between domains.

Autonomous Driving Domain Adaptation +4

Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions

no code implementations CVPR 2020 Johanna Wald, Helisa Dhamo, Nassir Navab, Federico Tombari

In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships modeled as edges.

3d scene graph generation 3D Semantic Segmentation +2

Semantic Image Manipulation Using Scene Graphs

1 code implementation CVPR 2020 Helisa Dhamo, Azade Farshad, Iro Laina, Nassir Navab, Gregory D. Hager, Federico Tombari, Christian Rupprecht

In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes in the nodes or edges of a semantic graph that is generated from the image.

Image Inpainting Image Manipulation +1

Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models

no code implementations 10 Mar 2020 Alessandro Berlati, Oliver Scheel, Luigi Di Stefano, Federico Tombari

Ambiguity is inherently present in many machine learning tasks but is seldom accounted for, especially in sequential models, as most output only a single prediction.

Time Series Time Series Analysis +1

Restricting the Flow: Information Bottlenecks for Attribution

4 code implementations ICLR 2020 Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf

Attribution methods provide insights into the decision-making of machine learning models like artificial neural networks.

Decision Making

Quaternion Equivariant Capsule Networks for 3D Point Clouds

2 code implementations ECCV 2020 Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, Federico Tombari

We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points.

Pose Estimation

ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image

no code implementations ICCV 2019 Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space.

Ranked #7 on 3D Semantic Scene Completion on NYUv2 (using extra training data)

3D Semantic Scene Completion Attribute

Object-Driven Multi-Layer Scene Decomposition From a Single Image

no code implementations ICCV 2019 Helisa Dhamo, Nassir Navab, Federico Tombari

Our approach aims at building up a Layered Depth Image (LDI) from a single RGB input, which is an efficient representation that arranges the scene in layers, including originally occluded regions.

Hallucination

RIO: 3D Object Instance Re-Localization in Changing Indoor Environments

1 code implementation ICCV 2019 Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, Matthias Nießner

In this work, we introduce the task of 3D object instance re-localization (RIO): given one or multiple objects in an RGB-D scan, we want to estimate their corresponding 6DoF poses in another 3D scan of the same environment taken at a later point in time.

Object Scene Understanding

Query-guided End-to-End Person Search

1 code implementation CVPR 2019 Bharti Munjal, Sikandar Amin, Federico Tombari, Fabio Galasso

We extend this with (i) a query-guided Siamese squeeze-and-excitation network (QSSE-Net) that uses global context from both the query and gallery images, (ii) …

Human Detection Person Search +1

Attention-based Lane Change Prediction

no code implementations 4 Mar 2019 Oliver Scheel, Naveen Shankar Nagaraja, Loren Schwarz, Nassir Navab, Federico Tombari

Lane change prediction of surrounding vehicles is a key building block of path planning.

3D Point Capsule Networks

2 code implementations CVPR 2019 Yongheng Zhao, Tolga Birdal, Haowen Deng, Federico Tombari

In this paper, we propose 3D point-capsule networks, an auto-encoder designed to process sparse 3D point clouds while preserving spatial arrangements of the input data.

3D Feature Matching 3D Geometry Perception +8

Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data

no code implementations ICCV 2019 Fabian Manhardt, Diego Martin Arroyo, Christian Rupprecht, Benjamin Busam, Tolga Birdal, Nassir Navab, Federico Tombari

For each object instance we predict multiple pose and class outcomes to estimate the specific pose distribution generated by symmetries and repetitive textures.

3D Object Detection Object +3

Dealing with Ambiguity in Robotic Grasping via Multiple Predictions

no code implementations 2 Nov 2018 Ghazal Ghazaei, Iro Laina, Christian Rupprecht, Federico Tombari, Nassir Navab, Kianoush Nazarpour

Further, we reformulate the problem of robotic grasping by replacing conventional grasp rectangles with grasp belief maps, which hold more precise location information than a rectangle and account for the uncertainty inherent to the task.

Robotic Grasping

Adversarial Semantic Scene Completion from a Single Depth Image

no code implementations 25 Oct 2018 Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

We propose a method to reconstruct, complete and semantically label a 3D scene from a single input depth image.

Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images

no code implementations ECCV 2018 Keisuke Tateno, Nassir Navab, Federico Tombari

There is a high demand of 3D data for 360° panoramic images and videos, pushed by the growing availability on the market of specialized hardware for both capturing (e.g., omnidirectional cameras) as well as visualizing in 3D (e.g., head mounted displays) panoramic images and videos.

Depth Estimation Semantic Segmentation +1

Fully-Convolutional Point Networks for Large-Scale Point Clouds

1 code implementation ECCV 2018 Dario Rethage, Johanna Wald, Jürgen Sturm, Nassir Navab, Federico Tombari

This work proposes a general-purpose, fully-convolutional network architecture for efficiently processing large-scale 3D data.

Semantic Segmentation

Human Motion Analysis with Deep Metric Learning

2 code implementations ECCV 2018 Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari

Nevertheless, we believe that traditional approaches such as L2 distance or Dynamic Time Warping based on hand-crafted local pose metrics fail to appropriately capture the semantic relationship across motions and, as such, are not suitable for being employed as metrics within these tasks.

Dynamic Time Warping Metric Learning +1

Peeking Behind Objects: Layered Depth Prediction from a Single Image

no code implementations 23 Jul 2018 Helisa Dhamo, Keisuke Tateno, Iro Laina, Nassir Navab, Federico Tombari

While conventional depth estimation can infer the geometry of a scene from a single RGB image, it fails to estimate scene regions that are occluded by foreground objects.

Depth Estimation Depth Prediction

Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction

no code implementations 17 May 2018 Oliver Scheel, Loren Schwarz, Nassir Navab, Federico Tombari

One of the greatest challenges towards fully autonomous cars is the understanding of complex and dynamic scenes.

Webly Supervised Learning for Skin Lesion Classification

no code implementations 31 Mar 2018 Fernando Navarro, Sailesh Conjeti, Federico Tombari, Nassir Navab

Within medical imaging, manual curation of sufficient well-labeled samples is cost-, time- and scale-prohibitive.

Classification General Classification +4

Guide Me: Interacting with Deep Networks

no code implementations CVPR 2018 Christian Rupprecht, Iro Laina, Nassir Navab, Gregory D. Hager, Federico Tombari

Interaction and collaboration between humans and intelligent machines has become increasingly important as machine learning methods move into real-world applications that involve end users.

Image Captioning Image Generation

Fast and Accurate Semantic Mapping through Geometric-based Incremental Segmentation

no code implementations 7 Mar 2018 Yoshikatsu Nakajima, Keisuke Tateno, Federico Tombari, Hideo Saito

We propose an efficient and scalable method for incrementally building a dense, semantically annotated 3D map in real-time.

Computational Efficiency Segmentation

Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization

no code implementations ICCV 2017 Huseyin Coskun, Felix Achilles, Robert DiPietro, Nassir Navab, Federico Tombari

One-shot pose estimation for tasks such as body joint localization, camera pose estimation, and object tracking is generally noisy, and temporal filters have been extensively used for regularization.

Object Tracking Pose Estimation

6D Object Pose Estimation with Depth Images: A Seamless Approach for Robotic Interaction and Augmented Reality

no code implementations 5 Sep 2017 David Joseph Tan, Nassir Navab, Federico Tombari

To determine the 3D orientation and 3D location of objects in the surroundings of a camera mounted on a robot or mobile device, we developed two powerful algorithms in object detection and temporal tracking that are combined seamlessly for robotic perception and interaction as well as Augmented Reality (AR).

6D Pose Estimation using RGB Object +2

Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization

no code implementations 6 Aug 2017 Huseyin Coskun, Felix Achilles, Robert DiPietro, Nassir Navab, Federico Tombari

One-shot pose estimation for tasks such as body joint localization, camera pose estimation, and object tracking is generally noisy, and temporal filters have been extensively used for regularization.

Object Tracking Pose Estimation

CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

1 code implementation CVPR 2017 Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab

Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction.

Depth Estimation Depth Prediction +1

An Octree-Based Approach towards Efficient Variational Range Data Fusion

no code implementations 26 Aug 2016 Wadim Kehl, Tobias Holl, Federico Tombari, Slobodan Ilic, Nassir Navab

Volume-based reconstruction is usually expensive both in terms of memory consumption and runtime.

A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks

no code implementations 24 Jun 2016 Felix Grün, Christian Rupprecht, Nassir Navab, Federico Tombari

Over the last decade, Convolutional Neural Networks (CNN) saw a tremendous surge in performance.

A Versatile Learning-Based 3D Temporal Tracker: Scalable, Robust, Online

no code implementations ICCV 2015 David Joseph Tan, Federico Tombari, Slobodan Ilic, Nassir Navab

This paper proposes a temporal tracking algorithm based on Random Forest that uses depth images to estimate and track the 3D pose of a rigid object in real-time.

Occlusion Handling

Learning a Descriptor-Specific 3D Keypoint Detector

no code implementations ICCV 2015 Samuele Salti, Federico Tombari, Riccardo Spezialetti, Luigi Di Stefano

Keypoint detection represents the first stage in the majority of modern computer vision pipelines based on automatically established correspondences between local descriptors.

Binary Classification Keypoint Detection
