no code implementations • 2 May 2024 • Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation: interacting with unseen objects in novel scenes without test-time adaptation.
no code implementations • CVPR 2024 • Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani
We propose G-HOP, a denoising diffusion-based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category.
no code implementations • CVPR 2024 • Hanzhe Hu, Zhizhuo Zhou, Varun Jampani, Shubham Tulsiani
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
no code implementations • 22 Feb 2024 • Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10).
no code implementations • 11 Dec 2023 • Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani
We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information.
no code implementations • 1 Dec 2023 • Homanga Bharadhwaj, Abhinav Gupta, Vikash Kumar, Shubham Tulsiani
We pursue the goal of developing robots that can interact zero-shot with generic unseen objects via a diverse repertoire of manipulation skills and show how passive human videos can serve as a rich source of data for learning such generalist robots.
no code implementations • ICCV 2023 • Yufei Ye, Poorvi Hebbar, Abhinav Gupta, Shubham Tulsiani
We tackle the task of reconstructing hand-object interactions from short video clips.
no code implementations • 5 Sep 2023 • Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar
The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets.
no code implementations • 28 May 2023 • Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani
Motivated by the intuitive understanding humans have about the space of possible interactions, and the ease with which they can generalize this understanding to previously unseen scenes, we develop an approach for learning visual affordances for guiding robot exploration.
1 code implementation • 8 May 2023 • Amy Lin, Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).
no code implementations • 27 Apr 2023 • Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
We present Analogical Networks, a model that encodes domain knowledge both explicitly, in a collection of structured, labelled 3D scenes, and implicitly, as model parameters, and segments 3D object scenes via analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes and their corresponding part structures from memory, and then predicts analogous part structures for the input scene via an end-to-end learnable modulation mechanism.
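A minimal sketch of this retrieve-then-modulate idea (the point encoder, pooled-feature retrieval, and modulation MLP below are simplified stand-ins, not the paper's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrieveAndModulate(nn.Module):
    """Toy retrieve-then-modulate segmenter: find the most similar memory scene
    by pooled feature similarity, then condition per-point part predictions on
    the retrieved scene's part structure."""

    def __init__(self, feat_dim=64, num_parts=8):
        super().__init__()
        self.point_encoder = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                           nn.Linear(feat_dim, feat_dim))
        self.part_embed = nn.Embedding(num_parts, feat_dim)
        self.modulator = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, num_parts))

    def encode(self, points):                        # points: (N, 3)
        return self.point_encoder(points)            # (N, feat_dim)

    def forward(self, points, memory_scenes, memory_labels):
        feats = self.encode(points)                  # (N, D)
        query = feats.mean(dim=0)                    # global scene descriptor
        # Retrieve the most similar labelled scene from memory.
        mem = torch.stack([self.encode(s).mean(dim=0) for s in memory_scenes])
        idx = F.cosine_similarity(mem, query[None], dim=-1).argmax().item()
        # Pool the retrieved scene's part labels into a conditioning vector
        # and predict analogous part logits for every input point.
        cond = self.part_embed(memory_labels[idx]).mean(dim=0)
        cond = cond.expand(feats.shape[0], -1)
        return self.modulator(torch.cat([feats, cond], dim=-1))   # (N, num_parts)

model = RetrieveAndModulate()
scene = torch.randn(256, 3)
memory = [torch.randn(256, 3) for _ in range(4)]
labels = [torch.randint(0, 8, (256,)) for _ in range(4)]
part_logits = model(scene, memory, labels)            # per-point part predictions
```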
no code implementations • ICCV 2023 • Alexey Bokhovkin, Shubham Tulsiani, Angela Dai
The learned texture manifold enables effective navigation to generate an object texture for a given 3D object geometry that matches an input RGB image, and remains robust even in challenging real-world scenarios where the mesh geometry is only an inexact match to the underlying geometry in the RGB image.
no code implementations • CVPR 2023 • Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu
In contrast, in this work we focus on synthesizing complex interactions (i.e., an articulated hand) with a given object.
no code implementations • 3 Feb 2023 • Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar
Can we learn robot manipulation for everyday tasks, only by watching videos of humans doing arbitrary tasks in different unstructured settings?
no code implementations • 11 Jan 2023 • Naveen Venkat, Mayank Agarwal, Maneesh Singh, Shubham Tulsiani
While this representation yields (coarsely) accurate images corresponding to novel viewpoints, the lack of geometric reasoning limits the quality of these outputs.
no code implementations • CVPR 2023 • Zhizhuo Zhou, Shubham Tulsiani
We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation.
1 code implementation • 24 Oct 2022 • Hang Gao, RuiLong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa
We study the recent progress on dynamic view synthesis (DVS) from monocular video.
1 code implementation • 11 Aug 2022 • Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
1 code implementation • CVPR 2022 • Yufei Ye, Abhinav Gupta, Shubham Tulsiani
Our work aims to reconstruct hand-held objects given a single RGB image.
no code implementations • CVPR 2022 • Kalyan Vasudev Alwala, Abhinav Gupta, Shubham Tulsiani
Our final 3D reconstruction model is also capable of zero-shot inference on images from unseen object categories and we empirically show that increasing the number of training categories improves the reconstruction quality.
1 code implementation • CVPR 2022 • Paritosh Mittal, Yen-Chi Cheng, Maneesh Singh, Shubham Tulsiani
This enables us to represent distributions over 3D shapes conditioned on information from an arbitrary set of spatially anchored query locations, and thus perform shape completion in such arbitrary settings (e.g., generating a complete chair given only a view of the back leg).
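As a rough illustration of conditioning completion on an arbitrary set of spatially anchored observations, here is a deterministic toy sketch (the paper models distributions over shapes; the mask-channel conditioning and 3D convolutional network below are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AnchoredCompletion(nn.Module):
    """Toy shape completion conditioned on an arbitrary set of observed voxels:
    unobserved locations are masked out, and the network fills in the rest."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv3d(2, hidden, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(hidden, 1, 3, padding=1))

    def forward(self, observed_occupancy, observed_mask):
        # Channel 0: partial occupancy (zeroed where unobserved).
        # Channel 1: which locations were actually observed (the "anchors").
        x = torch.stack([observed_occupancy * observed_mask, observed_mask], dim=1)
        return torch.sigmoid(self.net(x)).squeeze(1)      # completed occupancy grid

grid = (torch.rand(1, 16, 16, 16) > 0.5).float()          # toy ground-truth shape
mask = torch.zeros_like(grid)
mask[:, :, :, :4] = 1.0                                    # only part of the shape observed
completed = AnchoredCompletion()(grid, mask)               # (1, 16, 16, 16)
```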
1 code implementation • 9 Nov 2021 • Bernardo Aceituno, Alberto Rodriguez, Shubham Tulsiani, Abhinav Gupta, Mustafa Mukadam
Specifying tasks with videos is a powerful technique for acquiring novel and general robot skills.
1 code implementation • NeurIPS 2021 • Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta
Most prior methods for learning navigation policies require access to simulation environments, as they need online policy interaction and rely on ground-truth maps for rewards.
1 code implementation • NeurIPS 2021 • Jason Y. Zhang, Gengshan Yang, Shubham Tulsiani, Deva Ramanan
NeRS learns a neural shape representation of a closed surface that is diffeomorphic to a sphere, guaranteeing water-tight reconstructions.
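A minimal sketch of representing a closed surface as a learned deformation of the unit sphere (the MLP and displacement parameterization are illustrative placeholders, not the NeRS architecture):

```python
import torch
import torch.nn as nn

class SphereDeformation(nn.Module):
    """Maps points on the unit sphere to points on the object surface.
    Because every surface point is the image of a sphere point, the
    predicted surface is closed (watertight) by construction."""

    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))

    def forward(self, sphere_pts):                  # (N, 3), unit norm
        # Predict a displacement for each sphere point, smoothly deforming
        # the sphere into the object surface.
        return sphere_pts + self.mlp(sphere_pts)

# Sample points on the unit sphere and push them through the deformation.
pts = torch.randn(1024, 3)
pts = pts / pts.norm(dim=-1, keepdim=True)
surface = SphereDeformation()(pts)                  # (1024, 3) surface samples
```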
no code implementations • 29 Mar 2021 • Shubham Tulsiani, Abhinav Gupta
We propose a generative model that can infer a distribution for the underlying spatial signal conditioned on sparse samples, e.g., plausible images given a few observed pixels.
1 code implementation • CVPR 2021 • Yufei Ye, Shubham Tulsiani, Abhinav Gupta
We first infer a volumetric representation in a canonical frame, along with the camera pose.
1 code implementation • ICCV 2021 • Kaichun Mo, Leonidas Guibas, Mustafa Mukadam, Abhinav Gupta, Shubham Tulsiani
One of the fundamental goals of visual perception is to allow agents to meaningfully interact with their environment.
no code implementations • 11 Aug 2020 • Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
no code implementations • 20 Jul 2020 • Shubham Tulsiani, Or Litany, Charles R. Qi, He Wang, Leonidas J. Guibas
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.
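A rough sketch of lifting per-view image features into a shared volumetric feature grid and pooling across views (the camera convention, bilinear sampling, and mean pooling are simplifying assumptions, not the paper's aggregation scheme):

```python
import torch
import torch.nn.functional as F

def aggregate_views(feat_maps, projections, grid_pts):
    """Lift per-view image features into a shared volumetric feature grid.

    feat_maps:   (V, C, H, W) image feature maps
    projections: (V, 3, 4) matrices assumed to map world points directly to
                 normalized [-1, 1] image coordinates (after the divide)
    grid_pts:    (N, 3) voxel-center coordinates in world space
    returns:     (N, C) per-voxel features averaged over views
    """
    V, C, _, _ = feat_maps.shape
    homog = torch.cat([grid_pts, torch.ones_like(grid_pts[:, :1])], dim=-1)  # (N, 4)
    per_view = []
    for v in range(V):
        uvw = homog @ projections[v].T                     # (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)       # normalized image coords
        grid = uv.view(1, 1, -1, 2)
        # Bilinearly sample the view's feature map at each projected voxel center.
        sampled = F.grid_sample(feat_maps[v:v + 1], grid, align_corners=True)
        per_view.append(sampled.view(C, -1).T)             # (N, C)
    return torch.stack(per_view).mean(dim=0)               # simple mean pooling

voxel_feats = aggregate_views(torch.rand(3, 16, 32, 32),   # 3 views
                              torch.rand(3, 3, 4),         # toy cameras
                              torch.rand(1000, 3))         # toy voxel centers
```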
no code implementations • 16 Jul 2020 • Shubham Tulsiani, Nilesh Kulkarni, Abhinav Gupta
We present an approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image, using only category-level image collections with foreground masks as supervision.
1 code implementation • NeurIPS 2020 • Victoria Dean, Shubham Tulsiani, Abhinav Gupta
Exploration is one of the core challenges in reinforcement learning.
1 code implementation • CVPR 2020 • Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani
We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image.
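A toy sketch of the CSM setup: a per-pixel prediction of canonical template coordinates, supervised with a reprojection cycle-consistency loss (the convolutional mapper, linear toy camera, and loss below are illustrative stand-ins, not the paper's formulation):

```python
import torch
import torch.nn as nn

class CanonicalSurfaceMapper(nn.Module):
    """Predicts, for each pixel, a 3D point on a canonical template surface."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, image):                      # (B, 3, H, W)
        # Per-pixel canonical coordinates, squashed to a unit cube here
        # as a stand-in for "points on the template surface".
        return torch.tanh(self.net(image))         # (B, 3, H, W)

def cycle_consistency_loss(canon_pts, camera, pixel_grid):
    """Reproject the predicted canonical points with the camera and require
    them to land back on the pixels they came from."""
    B, _, H, W = canon_pts.shape
    pts = canon_pts.permute(0, 2, 3, 1).reshape(B, -1, 3)          # (B, HW, 3)
    proj = torch.einsum('bij,bnj->bni', camera, pts)[..., :2]      # toy linear camera
    return ((proj - pixel_grid) ** 2).mean()

image = torch.randn(2, 3, 64, 64)
canon = CanonicalSurfaceMapper()(image)
camera = torch.randn(2, 3, 3)                      # toy projection matrices
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                        torch.linspace(-1, 1, 64), indexing='ij')
pixel_grid = torch.stack([xs, ys], dim=-1).reshape(1, -1, 2).expand(2, -1, -1)
loss = cycle_consistency_loss(canon, camera, pixel_grid)
```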
3 code implementations • CVPR 2020 • Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta
Our quantitative and qualitative results show that (a) we can predict meaningful forces from videos whose effects lead to accurate imitation of the motions observed, (b) by jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training, and (c) we can learn a representation from this model that generalizes to novel objects using few-shot examples.
no code implementations • ICLR 2020 • Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
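A minimal sketch of this intrinsic-motivation idea: reward a pair of agents when the observed joint outcome differs from what composing independent single-agent predictions would achieve (the solo dynamics model `predict_solo` and the norm-based reward are hypothetical simplifications):

```python
import numpy as np

def synergy_reward(predict_solo, observed_next_state, state, action_a, action_b):
    """Intrinsic reward for synergistic behaviour: high when the observed joint
    outcome could not be explained by the agents acting on their own.

    predict_solo(state, action) -> predicted next state for one agent acting alone.
    """
    # Compose the two solo predictions: apply agent A's predicted effect,
    # then agent B's, as if they had acted independently.
    composed = predict_solo(predict_solo(state, action_a), action_b)
    return np.linalg.norm(observed_next_state - composed)

# Toy usage with a linear "acting alone" model of the environment.
predict_solo = lambda s, a: s + 0.1 * a
state = np.zeros(4)
reward = synergy_reward(predict_solo, np.array([0.5, 0.0, 0.0, 0.0]),
                        state, np.ones(4), -np.ones(4))
```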
no code implementations • ICLR 2020 • Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta
In this paper, we present an approach to learn recomposable motor primitives across large-scale and diverse manipulation demonstrations.
1 code implementation • 8 Oct 2019 • Yufei Ye, Dhiraj Gandhi, Abhinav Gupta, Shubham Tulsiani
We present an approach to learn an object-centric forward model, and show that this allows us to plan for sequences of actions to achieve distant desired goals.
no code implementations • 30 Sep 2019 • Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta
Our insight is that for many tasks, the learning process can be decomposed into learning a state-independent task schema (a sequence of skills to execute) and a policy to choose the parameterizations of the skills in a state-dependent manner.
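A toy sketch of this decomposition: a fixed, state-independent schema of skills, with a state-dependent policy choosing each skill's continuous parameters (the skill names, linear parameter policy, and dummy environment below are illustrative, not the paper's setup):

```python
import numpy as np

# A state-independent task schema: the ordered skills to execute.
SCHEMA = ["move_to", "grasp", "lift"]

def parameter_policy(skill, state, weights):
    """State-dependent choice of a skill's continuous parameters
    (a linear policy here purely for illustration)."""
    return weights[skill] @ state

def execute(env_step, state, weights):
    """Run the schema in order, re-deciding parameters from the current state."""
    for skill in SCHEMA:
        params = parameter_policy(skill, state, weights)
        state = env_step(skill, params, state)    # environment advances the state
    return state

# Toy usage: random linear parameter policies and a dummy environment.
rng = np.random.default_rng(0)
weights = {s: rng.normal(size=(2, 4)) for s in SCHEMA}
env_step = lambda skill, params, state: state + 0.01 * rng.normal(size=4)
final_state = execute(env_step, np.zeros(4), weights)
```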
2 code implementations • ICCV 2019 • Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani
We present an approach for pixel-level future prediction given an input image of a scene.
1 code implementation • ICCV 2019 • Nilesh Kulkarni, Abhinav Gupta, Shubham Tulsiani
We explore the task of Canonical Surface Mapping (CSM).
no code implementations • ICCV 2019 • Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta
We propose an approach to predict the 3D shape and pose for the objects present in a scene.
1 code implementation • 7 May 2019 • Tejas Khot, Shubham Agrawal, Shubham Tulsiani, Christoph Mertz, Simon Lucey, Martial Hebert
We demonstrate our ability to learn MVS without 3D supervision using a real dataset, and show that each component of our proposed robust loss results in a significant improvement.
1 code implementation • ECCV 2018 • Shubham Tulsiani, Richard Tucker, Noah Snavely
We present an approach to infer a layer-structured 3D representation of a scene from a single input image.
no code implementations • ECCV 2018 • Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik
The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.
no code implementations • CVPR 2018 • Shubham Tulsiani, Alexei A. Efros, Jitendra Malik
We present a framework for learning single-view shape and pose prediction without using direct supervision for either.
no code implementations • CVPR 2018 • Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik
The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.
1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas
We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database.
no code implementations • CVPR 2017 • Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik
We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.
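One concrete instance of such a differentiable consistency term is silhouette-based ray consistency over voxel occupancies; the simplified sketch below computes per-ray termination probabilities and a foreground/background cost whose gradients flow back to the occupancies (a toy version, not the paper's general cost formulation):

```python
import torch

def ray_termination_probs(occupancy):
    """Probability that a ray terminates in each voxel it passes through,
    given per-voxel occupancy probabilities along the ray (front to back).

    occupancy: (R, D) tensor in [0, 1] for R rays and D voxels per ray.
    returns:   (R, D + 1); the last column is the probability of escaping.
    """
    free = torch.cumprod(1.0 - occupancy, dim=-1)                   # prob. of passing through
    prev_free = torch.cat([torch.ones_like(free[:, :1]), free[:, :-1]], dim=-1)
    term = prev_free * occupancy                                    # stop exactly here
    escape = free[:, -1:]                                           # never stopped
    return torch.cat([term, escape], dim=-1)

def silhouette_consistency_loss(occupancy, ray_inside_mask):
    """Expected cost of a ray's termination event against an observed silhouette:
    rays inside the mask should terminate somewhere, rays outside should escape."""
    probs = ray_termination_probs(occupancy)
    p_escape = probs[:, -1]
    # Foreground rays pay for escaping; background rays pay for terminating.
    per_ray = torch.where(ray_inside_mask, p_escape, 1.0 - p_escape)
    return per_ray.mean()

occupancy = torch.rand(8, 16, requires_grad=True)      # 8 rays, 16 voxels each
mask = torch.tensor([True] * 4 + [False] * 4)
loss = silhouette_consistency_loss(occupancy, mask)
loss.backward()                                         # gradients w.r.t. the 3D shape
```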
1 code implementation • 3 Apr 2017 • Christian Häne, Shubham Tulsiani, Jitendra Malik
A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.
4 code implementations • CVPR 2017 • Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik
We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.
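A rough sketch of predicting a fixed set of cuboid primitives from a shape code and scoring them with a coverage-style loss (only a point-to-primitive coverage term is shown; the shape code, network, and primitive count are placeholders):

```python
import torch
import torch.nn as nn

def point_to_cuboid_distance(points, centers, dims):
    """Distance from each point to each axis-aligned cuboid (0 if inside).

    points: (N, 3), centers: (P, 3), dims: (P, 3) half-extents -> (N, P)."""
    local = (points[:, None, :] - centers[None]).abs() - dims[None]   # (N, P, 3)
    return local.clamp(min=0).norm(dim=-1)

class PrimitivePredictor(nn.Module):
    """Predicts a fixed number of cuboid primitives from a global shape code."""

    def __init__(self, num_prims=6, code_dim=128):
        super().__init__()
        self.head = nn.Linear(code_dim, num_prims * 6)
        self.num_prims = num_prims

    def forward(self, code):                        # (code_dim,)
        params = self.head(code).view(self.num_prims, 6)
        centers, dims = params[:, :3], params[:, 3:].exp()   # positive extents
        return centers, dims

code = torch.randn(128)                             # toy global shape encoding
points = torch.rand(512, 3) - 0.5                   # points sampled from the target shape
centers, dims = PrimitivePredictor()(code)
# Coverage loss: every sampled point should be close to some primitive.
coverage = point_to_cuboid_distance(points, centers, dims).min(dim=1).values.mean()
```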
4 code implementations • 11 May 2016 • Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros
We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.
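A common strategy in this setting is to predict, for each output pixel, where to sample from the input image and then warp, rather than synthesizing pixels from scratch; the sketch below illustrates that idea (the flow predictor and scalar viewpoint conditioning are illustrative assumptions, not the paper's network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceFlow(nn.Module):
    """Predicts where in the input image each output pixel should sample from,
    conditioned on the desired viewpoint change, then warps the input."""

    def __init__(self):
        super().__init__()
        # Toy flow predictor: image plus a viewpoint code broadcast over pixels.
        self.net = nn.Sequential(nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, image, view_delta):           # image: (B, 3, H, W)
        B, _, H, W = image.shape
        view = view_delta.view(B, 1, 1, 1).expand(B, 1, H, W)
        flow = torch.tanh(self.net(torch.cat([image, view], dim=1)))  # in [-1, 1]
        grid = flow.permute(0, 2, 3, 1)             # (B, H, W, 2) sampling locations
        # Copy pixels from the input view instead of generating them from scratch.
        return F.grid_sample(image, grid, align_corners=True)

image = torch.rand(1, 3, 64, 64)
novel_view = AppearanceFlow()(image, torch.tensor([0.3]))   # toy viewpoint change
```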
no code implementations • 24 Nov 2015 • Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik
Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.
no code implementations • ICCV 2015 • Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik
We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.
1 code implementation • ICCV 2015 • Shubham Tulsiani, João Carreira, Jitendra Malik
We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.
no code implementations • CVPR 2015 • João Carreira, Abhishek Kar, Shubham Tulsiani, Jitendra Malik
All that structure from motion algorithms "see" are sets of 2D points.
no code implementations • CVPR 2015 • Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik
Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today.
no code implementations • CVPR 2015 • Shubham Tulsiani, Jitendra Malik
We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.
Ranked #3 on Keypoint Detection on Pascal3D+
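A minimal sketch of this viewpoint-plus-keypoints factorization: classification over discretized angle bins for the coarse viewpoint, and per-keypoint heatmaps for the finer details (the backbone, bin count, and keypoint count below are placeholders, not the paper's network):

```python
import torch
import torch.nn as nn

class ViewpointAndKeypoints(nn.Module):
    """Two heads on a shared feature extractor: coarse viewpoint via
    classification over discretized angle bins, and per-keypoint heatmaps
    capturing the finer pose details."""

    def __init__(self, num_bins=21, num_keypoints=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.view_head = nn.Linear(64, 3 * num_bins)       # azimuth, elevation, tilt
        self.kp_head = nn.Conv2d(64, num_keypoints, 1)
        self.num_bins = num_bins

    def forward(self, image):                              # (B, 3, H, W)
        feats = self.backbone(image)
        pooled = feats.mean(dim=(2, 3))                    # global average pool
        view_logits = self.view_head(pooled).view(-1, 3, self.num_bins)
        keypoint_maps = self.kp_head(feats)                # (B, K, H/4, W/4)
        return view_logits, keypoint_maps

view_logits, kp_maps = ViewpointAndKeypoints()(torch.rand(2, 3, 64, 64))
azimuth_bin = view_logits[:, 0].argmax(dim=-1)             # coarse viewpoint estimate
```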