Search Results for author: Shubham Tulsiani

Found 56 papers, 23 papers with code

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

no code implementations 2 May 2024 Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani

We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation: interacting with unseen objects in novel scenes without test-time adaptation.

Robot Manipulation Test-time Adaptation

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

no code implementations CVPR 2024 Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani

We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category.

Denoising Object

MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation

no code implementations CVPR 2024 Hanzhe Hu, Zhizhuo Zhou, Varun Jampani, Shubham Tulsiani

We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.

Denoising Depth Estimation +1

Cameras as Rays: Pose Estimation via Ray Diffusion

no code implementations 22 Feb 2024 Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani

Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10).

3D Reconstruction Camera Pose Estimation +3

UpFusion: Novel View Diffusion from Unposed Sparse View Observations

no code implementations 11 Dec 2023 Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani

We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information.

Novel View Synthesis

Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans

no code implementations 1 Dec 2023 Homanga Bharadhwaj, Abhinav Gupta, Vikash Kumar, Shubham Tulsiani

We pursue the goal of developing robots that can interact zero-shot with generic unseen objects via a diverse repertoire of manipulation skills and show how passive human videos can serve as a rich source of data for learning such generalist robots.

Robot Manipulation Translation

RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking

no code implementations 5 Sep 2023 Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar

The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets.

Chunking Robot Manipulation

Visual Affordance Prediction for Guiding Robot Exploration

no code implementations 28 May 2023 Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani

Motivated by the intuitive understanding humans have about the space of possible interactions, and the ease with which they can generalize this understanding to previously unseen scenes, we develop an approach for learning visual affordances for guiding robot exploration.

Analogy-Forming Transformers for Few-Shot 3D Parsing

no code implementations 27 Apr 2023 Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki

We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predicts analogous part structures for the input scene, via an end-to-end learnable modulation mechanism.

Few-Shot Learning

Mesh2Tex: Generating Mesh Textures from Image Queries

no code implementations ICCV 2023 Alexey Bokhovkin, Shubham Tulsiani, Angela Dai

The learned texture manifold enables effective navigation to generate an object texture for a given 3D object geometry that matches an input RGB image, which maintains robustness even under challenging real-world scenarios where the mesh geometry approximates an inexact match to the underlying geometry in the RGB image.

Object

Zero-Shot Robot Manipulation from Passive Human Videos

no code implementations 3 Feb 2023 Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar

Can we learn robot manipulation for everyday tasks, only by watching videos of humans doing arbitrary tasks in different unstructured settings?

Robot Manipulation

Geometry-biased Transformers for Novel View Synthesis

no code implementations 11 Jan 2023 Naveen Venkat, Mayank Agarwal, Maneesh Singh, Shubham Tulsiani

While this representation yields (coarsely) accurate images corresponding to novel viewpoints, the lack of geometric reasoning limits the quality of these outputs.

Novel View Synthesis

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

no code implementations CVPR 2023 Zhizhuo Zhou, Shubham Tulsiani

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation.

3D Reconstruction Image Generation +2

Monocular Dynamic View Synthesis: A Reality Check

1 code implementation 24 Oct 2022 Hang Gao, RuiLong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa

We study the recent progress on dynamic view synthesis (DVS) from monocular video.

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

1 code implementation 11 Aug 2022 Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani

We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.

Object Object Reconstruction

Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction

no code implementations CVPR 2022 Kalyan Vasudev Alwala, Abhinav Gupta, Shubham Tulsiani

Our final 3D reconstruction model is also capable of zero-shot inference on images from unseen object categories and we empirically show that increasing the number of training categories improves the reconstruction quality.

3D Reconstruction Single-View 3D Reconstruction

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation

1 code implementation CVPR 2022 Paritosh Mittal, Yen-Chi Cheng, Maneesh Singh, Shubham Tulsiani

This enables us to represent distributions over 3D shapes conditioned on information from an arbitrary set of spatially anchored query locations and thus perform shape completion in such arbitrary settings (e.g., generating a complete chair given only a view of the back leg).

No RL, No Simulation: Learning to Navigate without Navigating

1 code implementation NeurIPS 2021 Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta

Most prior methods for learning navigation policies require access to simulation environments, as they need online policy interaction and rely on ground-truth maps for rewards.

Navigate Reinforcement Learning (RL)

NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild

1 code implementation NeurIPS 2021 Jason Y. Zhang, Gengshan Yang, Shubham Tulsiani, Deva Ramanan

NeRS learns a neural shape representation of a closed surface that is diffeomorphic to a sphere, guaranteeing water-tight reconstructions.

3D Reconstruction Neural Rendering

PixelTransformer: Sample Conditioned Signal Generation

no code implementations 29 Mar 2021 Shubham Tulsiani, Abhinav Gupta

We propose a generative model that can infer a distribution for the underlying spatial signal conditioned on sparse samples, e.g., plausible images given a few observed pixels.

Shelf-Supervised Mesh Prediction in the Wild

1 code implementation CVPR 2021 Yufei Ye, Shubham Tulsiani, Abhinav Gupta

We first infer a volumetric representation in a canonical frame, along with the camera pose.

Where2Act: From Pixels to Actions for Articulated 3D Objects

1 code implementation ICCV 2021 Kaichun Mo, Leonidas Guibas, Mustafa Mukadam, Abhinav Gupta, Shubham Tulsiani

One of the fundamental goals of visual perception is to allow agents to meaningfully interact with their environment.

Visual Imitation Made Easy

no code implementations 11 Aug 2020 Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto

We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.

Imitation Learning

Object-Centric Multi-View Aggregation

no code implementations 20 Jul 2020 Shubham Tulsiani, Or Litany, Charles R. Qi, He Wang, Leonidas J. Guibas

We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.

Camera Pose Estimation Novel View Synthesis +2

Implicit Mesh Reconstruction from Unannotated Image Collections

no code implementations 16 Jul 2020 Shubham Tulsiani, Nilesh Kulkarni, Abhinav Gupta

We present an approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image, using only category-level image collections with foreground masks as supervision.

Articulation-aware Canonical Surface Mapping

1 code implementation CVPR 2020 Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani

We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image.

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

3 code implementations CVPR 2020 Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta

Our quantitative and qualitative results show that (a) we can predict meaningful forces from videos whose effects lead to accurate imitation of the motions observed, (b) by jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training, and (c) we can learn a representation from this model that generalizes to novel objects using few shot examples.

Human-Object Interaction Detection

Intrinsic Motivation for Encouraging Synergistic Behavior

no code implementations ICLR 2020 Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.

Discovering Motor Programs by Recomposing Demonstrations

no code implementations ICLR 2020 Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta

In this paper, we present an approach to learn recomposable motor primitives across large-scale and diverse manipulation demonstrations.

Hierarchical Reinforcement Learning

Object-centric Forward Modeling for Model Predictive Control

1 code implementation 8 Oct 2019 Yufei Ye, Dhiraj Gandhi, Abhinav Gupta, Shubham Tulsiani

We present an approach to learn an object-centric forward model, and show that this allows us to plan for sequences of actions to achieve distant desired goals.

Model Predictive Control Object

Efficient Bimanual Manipulation Using Learned Task Schemas

no code implementations 30 Sep 2019 Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

Our insight is that for many tasks, the learning process can be decomposed into learning a state-independent task schema (a sequence of skills to execute) and a policy to choose the parameterizations of the skills in a state-dependent manner.

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

1 code implementation 7 May 2019 Tejas Khot, Shubham Agrawal, Shubham Tulsiani, Christoph Mertz, Simon Lucey, Martial Hebert

We demonstrate our ability to learn MVS without 3D supervision using a real dataset, and show that each component of our proposed robust loss results in a significant improvement.

3D geometry Depth Estimation +1

Layer-structured 3D Scene Inference via View Synthesis

1 code implementation ECCV 2018 Shubham Tulsiani, Richard Tucker, Noah Snavely

We present an approach to infer a layer-structured 3D representation of a scene from a single input image.

Learning Category-Specific Mesh Reconstruction from Image Collections

no code implementations ECCV 2018 Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

The shape is represented as a deformable 3D mesh model of an object category where a shape is parameterized by a learned mean shape and per-instance predicted deformation.

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

no code implementations CVPR 2018 Shubham Tulsiani, Alexei A. Efros, Jitendra Malik

We present a framework for learning single-view shape and pose prediction without using direct supervision for either.

Pose Prediction

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

no code implementations CVPR 2018 Shubham Tulsiani, Saurabh Gupta, David Fouhey, Alexei A. Efros, Jitendra Malik

The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose.

Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

no code implementations CVPR 2017 Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view.

Hierarchical Surface Prediction for 3D Object Reconstruction

1 code implementation 3 Apr 2017 Christian Häne, Shubham Tulsiani, Jitendra Malik

A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well.

3D geometry 3D Geometry Prediction +2

Learning Shape Abstractions by Assembling Volumetric Primitives

4 code implementations CVPR 2017 Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives.

View Synthesis by Appearance Flow

4 code implementations 11 May 2016 Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints.

Novel View Synthesis

Shape and Symmetry Induction for 3D Objects

no code implementations 24 Nov 2015 Shubham Tulsiani, Abhishek Kar, Qi-Xing Huang, João Carreira, Jitendra Malik

Actions as simple as grasping an object or navigating around it require a rich understanding of that object's 3D shape from a given viewpoint.

General Classification Object

Amodal Completion and Size Constancy in Natural Scenes

no code implementations ICCV 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image.

Object object-detection +3

Pose Induction for Novel Object Categories

1 code implementation ICCV 2015 Shubham Tulsiani, João Carreira, Jitendra Malik

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes.

Object

Category-Specific Object Reconstruction from a Single Image

no code implementations CVPR 2015 Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today.

Object object-detection +2

Viewpoints and Keypoints

no code implementations CVPR 2015 Shubham Tulsiani, Jitendra Malik

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details.

Keypoint Detection
