no code implementations • 30 Jan 2025 • Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich, Rares Ambrus
Our method uses raymap conditioning both to augment visual features with spatial information from different viewpoints and to guide the generation of images and depth maps from novel views.
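As an illustration of the conditioning signal, a per-pixel raymap (camera origin plus unit viewing direction in the world frame) can be computed from intrinsics and pose roughly as below. The function name and the 6-channel origin-direction layout are assumptions for this sketch, not necessarily the paper's exact parameterization (some works use Plücker coordinates instead):

```python
import numpy as np

def raymap(K, T_wc, h, w):
    """Per-pixel ray origins and directions in the world frame.

    K: 3x3 intrinsics, T_wc: 4x4 camera-to-world pose.
    Returns an (h, w, 6) map concatenating origin and unit direction,
    which can be fed to a network as spatial conditioning.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs + 0.5, ys + 0.5, np.ones_like(xs)], -1)  # pixel centers, homogeneous
    dirs = pix @ np.linalg.inv(K).T                             # back-project to camera rays
    dirs = dirs @ T_wc[:3, :3].T                                # rotate into world frame
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)        # normalize to unit length
    origins = np.broadcast_to(T_wc[:3, 3], dirs.shape)          # shared camera center
    return np.concatenate([origins, dirs], axis=-1)
```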
no code implementations • 6 Dec 2024 • Keunwoo Peter Yu, Achal Dave, Rares Ambrus, Jean Mercat
Through extensive evaluations, we show that spatial and temporal compression in Espresso each have a positive impact on long-form video understanding capabilities, and that their impact increases when combined.
no code implementations • 11 Nov 2024 • Yinshuang Xu, Dian Chen, Katherine Liu, Sergey Zakharov, Rares Ambrus, Kostas Daniilidis, Vitor Guizilini
Incorporating inductive bias by embedding geometric entities (such as rays) as input has proven successful in multi-view learning.
1 code implementation • 26 Oct 2024 • Muhammad Zubair Irshad, Mauro Comi, Yen-Chen Lin, Nick Heppert, Abhinav Valada, Rares Ambrus, Zsolt Kira, Jonathan Tremblay
Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from posed 2D data.
no code implementations • 15 Sep 2024 • Vitor Guizilini, Pavel Tokmakov, Achal Dave, Rares Ambrus
3D reconstruction from a single image is a long-standing problem in computer vision.
Ranked #5 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
no code implementations • 4 Sep 2024 • Arkadeep Narayan Chaudhury, Igor Vasiljevic, Sergey Zakharov, Vitor Guizilini, Rares Ambrus, Srinivasa Narasimhan, Christopher G. Atkeson
Synthesizing accurate geometry and photo-realistic appearance of small scenes is an active area of research with compelling use cases in gaming, virtual reality, robotic manipulation, autonomous driving, convenient product capture, and consumer-level photography.
no code implementations • 6 Jun 2024 • Sergey Zakharov, Katherine Liu, Adrien Gaidon, Rares Ambrus
State-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) commonly trade modeling accuracy against memory and storage.
no code implementations • 30 Apr 2024 • Jiading Fang, Xiangshan Tan, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Hongyuan Mei, Rares Ambrus, Gregory Shakhnarovich, Matthew R Walter
We introduce Transcrib3D, an approach that brings together 3D detection methods and the emergent reasoning capabilities of large language models (LLMs).
1 code implementation • 1 Apr 2024 • Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini, Adrien Gaidon, Zsolt Kira, Rares Ambrus
Given the capabilities of neural fields in densely representing a 3D scene from 2D images, we ask the question: can we scale their self-supervised pretraining, specifically using masked autoencoders, to generate effective 3D representations from posed RGB images?
no code implementations • 21 Mar 2024 • Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
1 code implementation • 20 Feb 2024 • Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki
This paper addresses the challenging problem of category-level pose estimation.
no code implementations • CVPR 2024 • Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov
Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered.
1 code implementation • 19 Oct 2023 • Mayank Lunayach, Sergey Zakharov, Dian Chen, Rares Ambrus, Zsolt Kira, Muhammad Zubair Irshad
In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data.
no code implementations • 4 Oct 2023 • Tara Sadjadpour, Rares Ambrus, Jeannette Bohg
Our main contributions include a novel fusion approach for combining camera and LiDAR sensory signals to learn affinities, and a first-of-its-kind multimodal sequential track confidence refinement technique that fuses 2D and 3D detections.
2 code implementations • ICCV 2023 • Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus
NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as little as a single image during inference.
Ranked #1 on Generalizable Novel View Synthesis on NERDS 360
no code implementations • 4 Aug 2023 • Takayuki Kanai, Igor Vasiljevic, Vitor Guizilini, Adrien Gaidon, Rares Ambrus
Autonomous vehicles and robots need to operate over a wide variety of scenarios in order to complete tasks efficiently and safely.
2 code implementations • ICCV 2023 • Vitor Guizilini, Igor Vasiljevic, Dian Chen, Rares Ambrus, Adrien Gaidon
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions.
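One common way that scale supervision enters at evaluation time is per-image median scaling against sparse metric ground truth (e.g. LiDAR returns). A minimal sketch with hypothetical names, shown for context rather than as this paper's method (which instead aims at metric predictions without such per-image rescaling):

```python
import numpy as np

def median_scale(pred, gt_sparse, mask):
    """Rescale a scale-ambiguous depth prediction so that its median
    matches the median of sparse metric ground truth over valid pixels."""
    s = np.median(gt_sparse[mask]) / np.median(pred[mask])
    return pred * s
```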
1 code implementation • 22 May 2023 • Jiading Fang, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Adrien Gaidon, Gregory Shakhnarovich, Matthew R. Walter
A practical benefit of implicit visual representations like Neural Radiance Fields (NeRFs) is their memory efficiency: large scenes can be efficiently stored and shared as small neural nets instead of collections of images.
no code implementations • ICCV 2023 • Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Sergey Zakharov, Vincent Sitzmann, Adrien Gaidon
In this work, we propose to use the multi-view photometric objective from the self-supervised depth estimation literature as a geometric regularizer for volumetric rendering, significantly improving novel view synthesis without requiring additional information.
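A minimal sketch of such a photometric objective, in the weighted SSIM + L1 form common in the self-supervised depth literature. Real implementations compute SSIM over local windows rather than globally, and the 0.85 weight is a conventional choice, not necessarily this paper's:

```python
import numpy as np

def photometric_loss(synth, target, alpha=0.85):
    """Weighted SSIM + L1 photometric objective between a view
    synthesized via the estimated geometry and the real target image.
    Global (single-window) SSIM for brevity."""
    l1 = np.abs(synth - target).mean()
    mu_x, mu_y = synth.mean(), target.mean()
    var_x, var_y = synth.var(), target.var()
    cov = ((synth - mu_x) * (target - mu_y)).mean()
    c1, c2 = 0.01**2, 0.03**2  # stabilizing constants from the SSIM definition
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
    dssim = (1.0 - ssim) / 2.0
    return alpha * dssim + (1.0 - alpha) * l1
```

For identical images the loss is zero; it grows as the synthesized view drifts from the target, which is what lets it act as a geometric regularizer on the rendered views.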
1 code implementation • CVPR 2023 • Dian Chen, Jie Li, Vitor Guizilini, Rares Ambrus, Adrien Gaidon
We design view-conditioned queries at the output level, which enables the generation of multiple virtual frames during training to learn viewpoint equivariance by enforcing multi-view consistency.
no code implementations • 12 Dec 2022 • Sergey Zakharov, Rares Ambrus, Katherine Liu, Adrien Gaidon
Compact and accurate representations of 3D shapes are central to many perception and robotics tasks.
no code implementations • 8 Nov 2022 • Tara Sadjadpour, Jie Li, Rares Ambrus, Jeannette Bohg
To address these issues in a unified framework, we propose to learn shape and spatio-temporal affinities between tracks and detections in consecutive frames.
no code implementations • 23 Oct 2022 • Sergey Zakharov, Rares Ambrus, Vitor Guizilini, Wadim Kehl, Adrien Gaidon
In this paper, we show that the recent progress in neural rendering enables a new unified approach we call Photo-realistic Neural Domain Randomization (PNDR).
1 code implementation • 28 Jul 2022 • Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Greg Shakhnarovich, Matthew Walter, Adrien Gaidon
Modern 3D computer vision leverages learning to boost geometric reasoning, mapping image data to classical structures such as cost volumes or epipolar constraints to improve matching.
2 code implementations • 27 Jul 2022 • Muhammad Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon
A novel disentangled shape and appearance database of priors is first learned to embed objects in their respective shape and appearance space.
3D Shape Reconstruction From A Single 2D Image • 6D Pose Estimation • +4 more
no code implementations • 22 Jul 2022 • Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann
We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene.
no code implementations • 12 Jul 2022 • Colton Stearns, Davis Rempe, Jie Li, Rares Ambrus, Sergey Zakharov, Vitor Guizilini, Yanchao Yang, Leonidas J Guibas
In this work, we develop a holistic representation of traffic scenes that leverages both spatial and temporal information of the actors in the scene.
1 code implementation • 16 Jun 2022 • Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.
Autonomous Vehicles • Bird's-Eye View Semantic Segmentation • +1 more
no code implementations • CVPR 2022 • Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, Adrien Gaidon
Experiments on the KITTI and DDAD datasets show that our DepthFormer architecture establishes a new state of the art in self-supervised monocular depth estimation, and is even competitive with highly specialized supervised single-frame architectures.
no code implementations • 28 Mar 2022 • Vitor Guizilini, Kuan-Hui Lee, Rares Ambrus, Adrien Gaidon
However, the simultaneous self-supervised learning of depth and scene flow is ill-posed, as there are infinitely many combinations that result in the same 3D point.
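The ill-posedness is easy to demonstrate: scaling a point's depth while adjusting its scene flow accordingly leaves the reprojected pixel unchanged, so the photometric signal alone cannot separate the two. A toy example with identity intrinsics (values are illustrative only):

```python
import numpy as np

def reproject(p_cam, flow):
    """Project a 3D point displaced by scene flow onto the image plane
    (pinhole camera with identity intrinsics, for illustration)."""
    q = p_cam + flow
    return q[:2] / q[2]

# Two different (depth, scene flow) combinations, same observed pixel:
p1, f1 = np.array([1.0, 1.0, 2.0]), np.array([0.0, 0.0, 0.0])
p2, f2 = np.array([2.0, 2.0, 4.0]), np.array([-1.0, -1.0, -2.0])
# both map to pixel (0.5, 0.5)
```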
no code implementations • 6 Dec 2021 • Jiading Fang, Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Greg Shakhnarovich, Adrien Gaidon, Matthew R. Walter
Camera calibration is integral to robotics and computer vision algorithms that seek to infer geometric properties of the scene from visual input streams.
no code implementations • ICCV 2021 • Aditya Ganeshan, Alexis Vallet, Yasunori Kudo, Shin-ichi Maeda, Tommi Kerola, Rares Ambrus, Dennis Park, Adrien Gaidon
Deep learning models for semantic segmentation rely on expensive, large-scale, manually annotated datasets.
Ranked #40 on Semantic Segmentation on NYU Depth v2
2 code implementations • ICCV 2021 • Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, Adrien Gaidon
Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D pointclouds, turning cameras into pseudo-lidar sensors.
Ranked #1 on Monocular 3D Object Detection on KITTI Pedestrian Moderate (using extra training data)
no code implementations • 31 Mar 2021 • Vitor Guizilini, Igor Vasiljevic, Rares Ambrus, Greg Shakhnarovich, Adrien Gaidon
In this work, we extend monocular self-supervised depth and ego-motion estimation to large-baseline multi-camera rigs.
no code implementations • ICCV 2021 • Vitor Guizilini, Jie Li, Rares Ambrus, Adrien Gaidon
Simulators can efficiently generate large amounts of labeled synthetic data with perfect supervision for hard-to-label tasks like semantic segmentation.
1 code implementation • CVPR 2021 • Vitor Guizilini, Rares Ambrus, Wolfram Burgard, Adrien Gaidon
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
no code implementations • 5 Jan 2021 • Rares Ambrus, Vitor Guizilini, Naveen Kuppuswamy, Andrew Beaulieu, Adrien Gaidon, Alex Alspach
Fluid-filled soft visuotactile sensors such as the Soft-bubbles alleviate key challenges for robust manipulation, as they enable reliable grasps along with the ability to obtain high-resolution sensory feedback on contact geometry and forces.
1 code implementation • 26 Dec 2020 • Hsu-kuang Chiu, Jie Li, Rares Ambrus, Jeannette Bohg
Second, we propose to learn a metric that combines the Mahalanobis and feature distances when comparing a track and a new detection in data association.
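A sketch of such a combined association metric, with a fixed fusion weight for illustration (the paper learns the combination; names here are hypothetical):

```python
import numpy as np

def fused_distance(track_mean, track_cov, det_pos, track_feat, det_feat, w=0.5):
    """Combine a Mahalanobis distance on the track's state estimate with a
    cosine distance between appearance features, for comparing a track
    against a new detection during data association."""
    d = det_pos - track_mean
    maha = float(np.sqrt(d @ np.linalg.inv(track_cov) @ d))      # state-space distance
    cos = 1.0 - float(track_feat @ det_feat /
                      (np.linalg.norm(track_feat) * np.linalg.norm(det_feat)))
    return w * maha + (1.0 - w) * cos                            # weighted fusion
```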
no code implementations • 29 Aug 2020 • Andreas Bühler, Adrien Gaidon, Andrei Cramariuc, Rares Ambrus, Guy Rosman, Wolfram Burgard
In this work, we propose a behavioral cloning approach that can safely leverage imperfect perception without being conservative.
1 code implementation • 15 Aug 2020 • Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Wolfram Burgard, Greg Shakhnarovich, Adrien Gaidon
Self-supervised learning has emerged as a powerful tool for depth and ego-motion estimation, leading to state-of-the-art results on benchmark datasets.
1 code implementation • ICLR 2020 • Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon
Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions.
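A pixel-adaptive convolution modulates a fixed spatial kernel by per-pixel affinities computed from a guidance signal, so the pretrained segmentation features can steer where information is mixed. A naive single-channel, loop-based sketch with scalar guidance (the actual operator works on feature maps and is implemented efficiently):

```python
import numpy as np

def pixel_adaptive_conv(x, guide, kernel, sigma=1.0):
    """Pixel-adaptive convolution (sketch): the fixed kernel is reweighted
    at every location by a Gaussian affinity between guidance values, so
    neighbors with similar guidance contribute more. Replicate padding."""
    h, w = x.shape
    k = kernel.shape[0] // 2
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    ii = min(max(i + di, 0), h - 1)  # replicate-pad rows
                    jj = min(max(j + dj, 0), w - 1)  # replicate-pad cols
                    aff = np.exp(-0.5 * ((guide[i, j] - guide[ii, jj]) / sigma) ** 2)
                    acc += aff * kernel[di + k, dj + k] * x[ii, jj]
            out[i, j] = acc
    return out
```

With a constant guidance map every affinity is 1 and the operator reduces to an ordinary convolution; spatially varying guidance makes the effective kernel content-dependent.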
2 code implementations • ICLR 2020 • Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus
By making the sampling of inlier-outlier sets from point-pair correspondences fully differentiable within the keypoint learning framework, we show that we are able to simultaneously self-supervise keypoint description and improve keypoint matching.
1 code implementation • 7 Dec 2019 • Jiexiong Tang, Rares Ambrus, Vitor Guizilini, Sudeep Pillai, Hanme Kim, Patric Jensfelt, Adrien Gaidon
Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion.
no code implementations • 4 Oct 2019 • Vitor Guizilini, Jie Li, Rares Ambrus, Sudeep Pillai, Adrien Gaidon
Dense depth estimation from a single image is a key problem in computer vision, with exciting applications in a multitude of robotic tasks.
no code implementations • 4 Oct 2019 • Rares Ambrus, Vitor Guizilini, Jie Li, Sudeep Pillai, Adrien Gaidon
Learning depth and camera ego-motion from raw unlabeled RGB video streams is seeing exciting progress through self-supervision from strong geometric cues.
4 code implementations • CVPR 2020 • Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, Adrien Gaidon
Although cameras are ubiquitous, robotic platforms typically rely on active sensors like LiDAR for direct 3D perception.
no code implementations • 3 Oct 2018 • Sudeep Pillai, Rares Ambrus, Adrien Gaidon
Both contributions provide significant performance gains over the state-of-the-art in self-supervised depth and pose estimation on the public KITTI benchmark.
no code implementations • 18 Oct 2017 • Johan Ekekrantz, Nils Bore, Rares Ambrus, John Folkesson, Patric Jensfelt
In this paper we introduce a system for unsupervised object discovery and segmentation of RGBD-images.