1 code implementation • 17 Jan 2025 • Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield
However, achieving strong zero-shot generalization, a hallmark of foundation models in other computer vision tasks, remains challenging for stereo matching.
no code implementations • 25 Nov 2024 • Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield
Spatial understanding is a crucial capability for robots to make grounded decisions based on their environment.
no code implementations • 20 Oct 2024 • Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay
We introduce GRS (Generating Robotic Simulation tasks), a novel system to address the challenge of real-to-sim in robotics, computer vision, and AR/VR.
no code implementations • 25 Sep 2024 • Alexander Popov, Alperen Degirmenci, David Wehr, Shashank Hegde, Ryan Oldja, Alexey Kamenev, Bertrand Douillard, David Nistér, Urs Muller, Ruchi Bhargava, Stan Birchfield, Nikolai Smolyanskiy
By leveraging a world model during training, the driving policy effectively mitigates covariate shift without requiring an excessive amount of training data.
1 code implementation • CVPR 2024 • Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing
We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene.
no code implementations • 27 May 2024 • Zhen Qin, Xuyang Shen, Weigao Sun, Dong Li, Stan Birchfield, Richard Hartley, Yiran Zhong
Finally, the memory state is projected back to a low-dimensional space in the Shrink stage.
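The Shrink stage mentioned above projects a high-dimensional memory state back down to the model dimension. Below is a minimal, illustrative sketch of such an expand-update-shrink recurrence for a linear-complexity sequence model; the stage names, dimensions, and decay factor are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def expand_update_shrink(x, d_mem=64):
    """Toy expand/update/shrink recurrence (illustrative only).

    x: (seq_len, d_model) input sequence.
    Each step expands the input into a high-dimensional memory via an
    outer product, updates the memory recurrently, and shrinks it back
    to d_model with a projection (random weights stand in for learned ones).
    """
    seq_len, d_model = x.shape
    rng = np.random.default_rng(0)
    W_k = rng.standard_normal((d_model, d_mem)) / np.sqrt(d_model)   # expand keys
    W_v = rng.standard_normal((d_model, d_mem)) / np.sqrt(d_model)   # expand values
    W_o = rng.standard_normal((d_mem, d_model)) / np.sqrt(d_mem)     # shrink projection

    memory = np.zeros((d_mem, d_mem))           # high-dimensional memory state
    outputs = []
    for t in range(seq_len):
        k, v = x[t] @ W_k, x[t] @ W_v           # Expand stage
        memory = 0.9 * memory + np.outer(k, v)  # recurrent update with decay
        read = k @ memory                       # read from memory
        outputs.append(read @ W_o)              # Shrink stage: back to d_model
    return np.stack(outputs)

out = expand_update_shrink(np.random.randn(16, 32))
print(out.shape)  # (16, 32)
```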
1 code implementation • CVPR 2024 • Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield
We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states.
1 code implementation • CVPR 2024 • Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield
We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups.
no code implementations • 30 Nov 2023 • Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Jenny Seidenschwarz, Mike Zheng Shou, Deva Ramanan, Shuran Song, Stan Birchfield, Bowen Wen, Jeffrey Ichnowski
Just as object pose estimation and tracking have aided robots for rigid manipulation, dense 3D tracking (scene flow) of highly deformable objects will enable new applications in robotics while aiding existing approaches, such as imitation learning or creating digital twins with real2sim transfer.
no code implementations • 30 Sep 2023 • Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object.
no code implementations • 2 Aug 2023 • Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Tremblay, Stephen Tyree, Jeffrey Smith, Stan Birchfield
We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks.
no code implementations • 3 Apr 2023 • Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber
At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds.
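FINV is described as inverting the generative model from multiple initial seeds to obtain latent codes for a novel object. The sketch below shows one plausible form of that multi-seed inversion loop, with a toy linear generator standing in for the real model; the function names and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch

def invert_from_seeds(generator, target, num_seeds=8, steps=200, lr=0.05, latent_dim=32):
    """Optimize several randomly seeded latent codes against one target
    observation and keep the best-fitting codes (illustrative sketch)."""
    losses, latents = [], []
    for seed in range(num_seeds):
        torch.manual_seed(seed)
        z = torch.randn(latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(generator(z), target)
            loss.backward()
            opt.step()
        losses.append(loss.item())
        latents.append(z.detach())
    order = sorted(range(num_seeds), key=lambda i: losses[i])
    return [latents[i] for i in order[:3]]  # keep a small set of best codes

# Toy stand-in generator: a fixed random linear map from latent to "image".
torch.manual_seed(0)
W = torch.randn(64, 32)
generator = lambda z: W @ z
target = generator(torch.randn(32))
best = invert_from_seeds(generator, target)
print(len(best), best[0].shape)
```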
no code implementations • CVPR 2023 • Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon
Unlike previous unsupervised domain adaptation methods for category-level object pose estimation, our approach processes the test data in a sequential, online manner, and it does not require access to the source domain at runtime.
1 code implementation • CVPR 2023 • Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Müller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object.
no code implementations • CVPR 2023 • Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu
In contrast, in this work we focus on synthesizing complex interactions (i.e., an articulated hand) with a given object.
1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
1 code implementation • 13 Dec 2022 • Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, Josef Sivic
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
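The coarse stage described above trains a classifier to judge whether the pose error between a synthetic rendering and the observed image is small enough for the refiner to correct. A hedged sketch of how such a score could rank pose hypotheses is below; the tiny CNN and the random stand-in renderer are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoarseScorer(nn.Module):
    """Toy classifier: given a rendered crop and the observed crop,
    predict how likely it is that a refiner could correct the pose."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, rendered, observed):
        return self.net(torch.cat([rendered, observed], dim=1)).squeeze(-1)

def pick_coarse_pose(scorer, observed, pose_hypotheses, render_fn):
    """Render each hypothesis, score it against the observation, keep the best."""
    renders = torch.stack([render_fn(p) for p in pose_hypotheses])
    obs = observed.unsqueeze(0).expand_as(renders)
    scores = scorer(renders, obs)
    return pose_hypotheses[int(scores.argmax())]

# Toy usage with random "renders" standing in for a real renderer.
scorer = CoarseScorer()
observed = torch.rand(3, 64, 64)
hypotheses = [torch.eye(4) for _ in range(12)]
render_fn = lambda pose: torch.rand(3, 64, 64)
best_pose = pick_coarse_pose(scorer, observed, hypotheses, render_fn)
print(best_pose.shape)  # torch.Size([4, 4])
```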
no code implementations • 21 Oct 2022 • Zhenggang Tang, Balakumar Sundaralingam, Jonathan Tremblay, Bowen Wen, Ye Yuan, Stephen Tyree, Charles Loop, Alexander Schwing, Stan Birchfield
We present a system for collision-free control of a robot manipulator that uses only RGB views of the world.
no code implementations • 21 Oct 2022 • Valts Blukis, Taeyeop Lee, Jonathan Tremblay, Bowen Wen, In So Kweon, Kuk-Jin Yoon, Dieter Fox, Stan Birchfield
At test-time, we build the representation from a single RGB input image observing the scene from only one viewpoint.
1 code implementation • 18 Oct 2022 • Yunzhi Lin, Thomas Müller, Jonathan Tremblay, Bowen Wen, Stephen Tyree, Alex Evans, Patricio A. Vela, Stan Birchfield
We present a parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating 6-DoF pose of a camera with respect to an object or scene.
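The parallelized NeRF inversion above estimates pose by minimizing a photometric loss through the radiance field for many pose hypotheses at once. The heavily simplified sketch below optimizes several 2D rigid-transform hypotheses against a toy differentiable "scene" to illustrate parallel gradient-based pose fitting; it is not the paper's renderer or pose parameterization.

```python
import torch

def scene(xy):
    """Toy differentiable 'radiance field': two Gaussian blobs in the plane."""
    c1 = torch.tensor([0.3, -0.2])
    c2 = torch.tensor([-0.4, 0.5])
    return (torch.exp(-((xy - c1) ** 2).sum(-1) / 0.02)
            + torch.exp(-((xy - c2) ** 2).sum(-1) / 0.05))

def render(theta, t, res=32):
    """Render the toy scene under a 2D rigid transform (rotation theta, translation t)."""
    lin = torch.linspace(-1, 1, res)
    grid = torch.stack(torch.meshgrid(lin, lin, indexing="ij"), dim=-1)  # (res, res, 2)
    rot = torch.stack([torch.stack([torch.cos(theta), -torch.sin(theta)]),
                       torch.stack([torch.sin(theta),  torch.cos(theta)])])
    return scene(grid @ rot.T + t)

# Ground-truth "pose" and the observed image.
target = render(torch.tensor(0.4), torch.tensor([0.1, -0.15]))

# Optimize several pose hypotheses in parallel and keep the best one.
thetas = torch.randn(8, requires_grad=True)
trans = torch.zeros(8, 2, requires_grad=True)
opt = torch.optim.Adam([thetas, trans], lr=0.05)
for step in range(300):
    opt.zero_grad()
    losses = torch.stack([((render(thetas[i], trans[i]) - target) ** 2).mean()
                          for i in range(8)])
    losses.sum().backward()
    opt.step()
best = int(losses.argmin())
print(float(thetas[best]), trans[best].detach())
```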
2 code implementations • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
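One plausible way to realize the temporal pixel-wise audio-visual interaction described above is cross-attention in which per-pixel visual features attend over per-frame audio features, so audio semantics can guide which pixels are segmented. The sketch below is an assumption about that general wiring, not the authors' exact module.

```python
import torch
import torch.nn as nn

class AudioVisualInteraction(nn.Module):
    """Illustrative cross-attention: per-pixel visual features (queries)
    attend over per-frame audio features (keys/values). Dimensions are
    assumptions chosen only to make the example runnable."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual, audio):
        # visual: (B, T, H, W, C) pixel features; audio: (B, T, C) frame features.
        B, T, H, W, C = visual.shape
        q = visual.reshape(B, T * H * W, C)
        fused, _ = self.attn(q, audio, audio)   # audio-guided update per pixel
        fused = self.norm(q + fused)            # residual connection
        return fused.reshape(B, T, H, W, C)

module = AudioVisualInteraction()
visual = torch.randn(2, 5, 14, 14, 128)
audio = torch.randn(2, 5, 128)
print(module(visual, audio).shape)  # torch.Size([2, 5, 14, 14, 128])
```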
1 code implementation • 21 Jun 2022 • Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong
Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.
Ranked #301 on Image Classification on ImageNet
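Vicinity Attention introduces a locality bias while keeping linear complexity. The toy below illustrates only the locality-bias idea, by penalizing attention logits with token distance in an ordinary quadratic attention; achieving the same effect in linear time is the paper's contribution and is not attempted here.

```python
import numpy as np

def attention_with_locality_bias(q, k, v, decay=0.1):
    """Toy single-head attention whose logits are penalized by the
    distance between token positions (a simple locality bias).
    This quadratic version only illustrates the bias itself."""
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    positions = np.arange(n)
    distance = np.abs(positions[:, None] - positions[None, :])
    logits = logits - decay * distance            # nearer tokens get higher weight
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 16))
out = attention_with_locality_bias(x, x, x)
print(out.shape)  # (10, 16)
```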
1 code implementation • 23 May 2022 • Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield
We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category.
no code implementations • 14 May 2022 • Jonathan Tremblay, Moustafa Meshry, Alex Evans, Jan Kautz, Alexander Keller, Sameh Khamis, Thomas Müller, Charles Loop, Nathan Morrical, Koki Nagano, Towaki Takikawa, Stan Birchfield
We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 × 1600 pixels).
Ranked #1 on Novel View Synthesis on RTMV
1 code implementation • 11 Mar 2022 • Stephen Tyree, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Jeffrey Smith, Stan Birchfield
We propose a set of toy grocery objects, whose physical instantiations are readily available for purchase and are appropriately sized for robotic grasping and manipulation.
no code implementations • 23 Sep 2021 • Alexey Kamenev, Lirui Wang, Ollin Boer Bohan, Ishwar Kulkarni, Bilal Kartal, Artem Molchanov, Stan Birchfield, David Nistér, Nikolai Smolyanskiy
Predicting the future motion of traffic agents is crucial for safe and efficient autonomous driving.
1 code implementation • 13 Sep 2021 • Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, Stan Birchfield
Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected.
2 code implementations • 28 May 2021 • Nathan Morrical, Jonathan Tremblay, Yunzhi Lin, Stephen Tyree, Stan Birchfield, Valerio Pascucci, Ingo Wald
We present a Python-based renderer built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images for research in computer vision and deep learning.
2 code implementations • CVPR 2021 • Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox
We introduce DexYCB, a new dataset for capturing hand grasping of objects.
1 code implementation • CVPR 2021 • Jianyuan Wang, Yiran Zhong, Yuchao Dai, Stan Birchfield, Kaihao Zhang, Nikolai Smolyanskiy, Hongdong Li
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
Ranked #28 on Monocular Depth Estimation on KITTI Eigen split
no code implementations • 1 Dec 2020 • Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai, Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz
A common way to speed up the computation is to downsample the feature volume, but this loses high-frequency details.
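To make the trade-off above concrete, the sketch below builds a simple sum-of-absolute-differences stereo cost volume at full and at reduced resolution; the quarter-resolution volume is far cheaper but can no longer resolve the true disparity exactly. This is a generic illustration, not the paper's method.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Per-pixel matching cost for each candidate disparity (SAD over intensity)."""
    h, w = left.shape
    volume = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        volume[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return volume

def downsample(img, factor=4):
    """Average-pool the image before matching (faster, but detail is lost)."""
    h, w = img.shape
    return img[: h - h % factor, : w - w % factor].reshape(
        h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
left = rng.random((64, 96))
right = np.roll(left, shift=-5, axis=1)            # true disparity of 5 pixels
full = cost_volume(left, right, max_disp=16).argmin(axis=0)
coarse = cost_volume(downsample(left), downsample(right), max_disp=4).argmin(axis=0)
print(full.shape, coarse.shape)                    # (64, 96) vs (16, 24)
```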
no code implementations • ICCV 2021 • Aayush Prakash, Shoubhik Debnath, Jean-Francois Lafleche, Eric Cameracci, Gavriel State, Stan Birchfield, Marc T. Law
Synthetic data is emerging as a promising solution to the scalability issue of supervised deep learning, especially when real data are difficult to acquire or hard to annotate.
no code implementations • 16 Nov 2020 • Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Fabio Ramos, Animashree Anandkumar, Yuke Zhu
Deep learning-based object pose estimators are often unreliable and overconfident especially when the input image is outside the training domain, for instance, with sim2real transfer.
1 code implementation • 26 Aug 2020 • Jonathan Tremblay, Stephen Tyree, Terry Mosier, Stan Birchfield
We present a robotic grasping system that uses a single external monocular RGB camera as input.
Robotics
no code implementations • 25 Aug 2020 • Jialiang Wang, Varun Jampani, Deqing Sun, Charles Loop, Stan Birchfield, Jan Kautz
End-to-end deep learning methods have advanced stereo vision in recent years and obtained excellent results when the training and test data are similar.
no code implementations • 9 Jun 2020 • Ke Chen, Ryan Oldja, Nikolai Smolyanskiy, Stan Birchfield, Alexander Popov, David Wehr, Ibrahim Eden, Joachim Pehserl
We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time.
8 code implementations • ICCV 2019 • Zheng Tang, Milind Naphade, Stan Birchfield, Jonathan Tremblay, William Hodge, Ratnesh Kumar, Shuo Wang, Xiaodong Yang
In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention.
no code implementations • 21 Nov 2019 • Visak Kumar, Tucker Hermans, Dieter Fox, Stan Birchfield, Jonathan Tremblay
We propose a Grasping Objects Approach for Tactile (GOAT) robotic hands, learning to overcome the reality gap problem.
Robotics
2 code implementations • 21 Nov 2019 • Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, Stan Birchfield
We show experimental results for three different camera sensors, demonstrating that our approach is able to achieve accuracy with a single frame that is better than that of classic off-line hand-eye calibration using multiple frames.
Robotics
no code implementations • 7 Oct 2019 • Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan Ratliff, Dieter Fox
Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks.
no code implementations • 13 May 2019 • Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, Jan Kautz
Through extensive experimentation on the ObjectNet3D and Pascal3D+ benchmark datasets, we demonstrate that our framework, which we call MetaView, significantly outperforms fine-tuning the state-of-the-art models with few examples, and that the specific architectural innovations of our method are crucial to achieving good performance.
no code implementations • CVPR 2019 • Zheng Tang, Milind Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, David Anastasiu, Jenq-Neng Hwang
Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking.
1 code implementation • 16 Nov 2018 • Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff
We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs).
Robotics • Systems and Control
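RMPflow composes many Riemannian Motion Policies, each an acceleration paired with an importance metric, by metric-weighted averaging. The sketch below shows that resolution step in a single configuration space, a* = (Σ_i M_i)^+ Σ_i M_i a_i; the full algorithm additionally pushes policies through task-space maps (pullback/pushforward), which this toy omits.

```python
import numpy as np

def combine_rmps(accelerations, metrics):
    """Metric-weighted combination of motion policies:
    a* = (sum_i M_i)^+  sum_i (M_i a_i).
    Each policy dominates where its metric assigns high importance."""
    M_sum = sum(metrics)
    f_sum = sum(M @ a for M, a in zip(metrics, accelerations))
    return np.linalg.pinv(M_sum) @ f_sum

# Two toy policies in a 2-D configuration space:
# one pulls toward a goal along x, the other avoids an obstacle along y.
a_goal = np.array([1.0, 0.0])
M_goal = np.diag([1.0, 0.1])      # cares mostly about x
a_avoid = np.array([0.0, -2.0])
M_avoid = np.diag([0.1, 5.0])     # cares mostly about y
print(combine_rmps([a_goal, a_avoid], [M_goal, M_avoid]))
```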
no code implementations • 23 Oct 2018 • Aayush Prakash, Shaad Boochoon, Mark Brophy, David Acuna, Eric Cameracci, Gavriel State, Omer Shapira, Stan Birchfield
Moreover, synthetic SDR data combined with real KITTI data outperforms real KITTI data alone.
8 code implementations • 27 Sep 2018 • Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, Stan Birchfield
Using synthetic data generated in this manner, we introduce a one-shot deep neural network that is able to perform competitively against a state-of-the-art network trained on a combination of real and synthetic data.
Robotics
no code implementations • 4 Jul 2018 • Artem Molchanov, Karol Hausman, Stan Birchfield, Gaurav Sukhatme
In this work, we introduce a method based on region-growing that allows learning in an environment with any pair of initial and goal states.
1 code implementation • 18 May 2018 • Jonathan Tremblay, Thang To, Artem Molchanov, Stephen Tyree, Jan Kautz, Stan Birchfield
We present a system to infer and execute a human-readable program from a real-world demonstration.
Robotics
1 code implementation • 18 Apr 2018 • Jonathan Tremblay, Aayush Prakash, David Acuna, Mark Brophy, Varun Jampani, Cem Anil, Thang To, Eric Cameracci, Shaad Boochoon, Stan Birchfield
We present a system for training deep neural networks for object detection using synthetic images.
no code implementations • 18 Apr 2018 • Jonathan Tremblay, Thang To, Stan Birchfield
We present a new dataset, called Falling Things (FAT), for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics.
4 code implementations • 26 Mar 2018 • Nikolai Smolyanskiy, Alexey Kamenev, Stan Birchfield
Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large, a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving.
1 code implementation • CVPR 2014 • Steven Hickson, Stan Birchfield, Irfan Essa, Henrik Christensen
We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach.
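The method above merges pixels hierarchically using graph edges that combine depth, color, and temporal cues. The hedged sketch below shows only the most basic ingredient, greedy Felzenszwalb-style merging with an edge weight mixing color and depth differences; the weighting scheme and the hierarchical and temporal stages of the actual algorithm are not reproduced.

```python
import numpy as np

def segment_rgbd(color, depth, k=0.1, w_color=1.0, w_depth=2.0):
    """Greedy graph-based merging over a 4-connected pixel grid.
    Edge weights mix color and depth differences; edges are merged in
    increasing order of weight when below an adaptive threshold."""
    h, w = depth.shape
    idx = lambda y, x: y * w + x
    edges = []
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                if y + dy < h and x + dx < w:
                    dc = np.linalg.norm(color[y, x] - color[y + dy, x + dx])
                    dd = abs(depth[y, x] - depth[y + dy, x + dx])
                    weight = w_color * dc + w_depth * dd
                    edges.append((weight, idx(y, x), idx(y + dy, x + dx)))
    parent = list(range(h * w))
    thresh = [k] * (h * w)                      # per-component merge threshold

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for weight, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb and weight <= min(thresh[ra], thresh[rb]):
            parent[rb] = ra
            thresh[ra] = weight + k             # loosen threshold as region grows
        # (a full implementation would also scale k by component size)
    return np.array([find(i) for i in range(h * w)]).reshape(h, w)

rng = np.random.default_rng(0)
color = rng.random((20, 20, 3))
depth = np.hstack([np.zeros((20, 10)), np.ones((20, 10))])  # two depth planes
labels = segment_rgbd(color, depth)
print(len(np.unique(labels)), "segments")
```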
no code implementations • 2 Aug 2017 • Bryan Willimon, Steven Hickson, Ian Walker, Stan Birchfield
In particular, we show that our method is able to estimate the configuration of a textureless nonrigid object with no correspondences available.
4 code implementations • 7 May 2017 • Nikolai Smolyanskiy, Alexey Kamenev, Jeffrey Smith, Stan Birchfield
We present a micro aerial vehicle (MAV) system, built with inexpensive off-the-shelf hardware, for autonomously following trails in unstructured, outdoor environments such as forests.
Robotics