Search Results for author: Shuran Song

Found 62 papers, 28 papers with code

RICo: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction

no code implementations · 21 Jul 2023 · Isaac Kasahara, Shubham Agrawal, Selim Engin, Nikhil Chavan-Dafle, Shuran Song, Volkan Isler

General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects.

Autonomous Navigation

XSkill: Cross Embodiment Skill Discovery

no code implementations · 19 Jul 2023 · Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song

Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior.

Imitation Learning, Robot Manipulation

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

1 code implementation · 10 Jul 2023 · Zhao Mandi, Shreeya Jain, Shuran Song

We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning.

Trajectory Planning

Rearrangement Planning for General Part Assembly

no code implementations · 1 Jul 2023 · Yulong Li, Andy Zeng, Shuran Song

Most successes in autonomous robotic assembly have been restricted to a single target or category.

REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction

no code implementations · 27 Jun 2023 · Zeyi Liu, Arpit Bahety, Shuran Song

To leverage the power of large language models (LLMs) for robot failure explanation, we introduce REFLECT, a framework that queries an LLM to identify and explain robot failures given a hierarchical summary of the robot's past experiences generated from multi-sensory data.

Common Sense Reasoning

TidyBot: Personalized Robot Assistance with Large Language Models

1 code implementation · 9 May 2023 · Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios.

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

no code implementations · 12 Mar 2023 · Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A. Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie

In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed.

Decision Making
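The entry above describes an agent that acts mostly autonomously but can request help from an external expert. The sketch below shows one way that decision could be made. It assumes, hypothetically, that uncertainty is measured as the disagreement of an ensemble of Q-value estimates on the greedy action. The function name `act_or_ask`, the ensemble, and the threshold are illustrative assumptions and are not taken from the paper:

```python
import statistics
from typing import Optional, Sequence, Tuple


def act_or_ask(q_ensemble: Sequence[Sequence[float]], threshold: float) -> Tuple[Optional[int], bool]:
    """Hypothetical human-in-the-loop rule (illustration only, not the paper's method).

    q_ensemble[k][a] is ensemble member k's Q-value estimate for action a.
    Returns (action, asked_for_help); action is None when the agent defers to the expert.
    """
    n_actions = len(q_ensemble[0])
    # Mean Q-value of each action across ensemble members.
    mean_q = [statistics.fmean(member[a] for member in q_ensemble) for a in range(n_actions)]
    greedy = max(range(n_actions), key=lambda a: mean_q[a])
    # Assumed uncertainty proxy: population std of the members' estimates for the greedy action.
    uncertainty = statistics.pstdev([member[greedy] for member in q_ensemble])
    if uncertainty > threshold:
        return None, True  # too uncertain: request help from the external expert
    return greedy, False  # confident enough to act autonomously


print(act_or_ask([[1.0, 0.2], [1.1, 0.1], [0.9, 0.3]], threshold=0.5))  # (0, False)
print(act_or_ask([[2.0, 0.0], [0.0, 0.0]], threshold=0.5))  # (None, True)
```

The threshold controls how often the expert is queried. A lower value asks for help more often and a higher value lets the agent act alone more often.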

CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning

no code implementations · 12 Dec 2022 · Zhao Mandi, Homanga Bharadhwaj, Vincent Moens, Shuran Song, Aravind Rajeswaran, Vikash Kumar

On a real robot setup, CACTI enables efficient training of a single policy that can perform 10 manipulation tasks involving kitchen objects, and is robust to varying layouts of distractors.

Data Augmentation, Image Generation (+3 more)

ASPiRe: Adaptive Skill Priors for Reinforcement Learning

no code implementations · 30 Sep 2022 · Mengda Xu, Manuela Veloso, Shuran Song

We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning.

reinforcement-learning, Reinforcement Learning (RL)

Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

no code implementations · 24 Sep 2022 · Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia Weng, Li Yi, Shuran Song, He Wang

We propose, for the first time, a point-cloud-based hand joint tracking network, HandTrackNet, to estimate inter-frame hand joint motion.

hand-object pose, Object Tracking (+1 more)

TANDEM3D: Active Tactile Exploration for 3D Object Recognition

no code implementations · 19 Sep 2022 · Jingxi Xu, Han Lin, Shuran Song, Matei Ciocarlie

In this work, we propose TANDEM3D, a method that applies a co-training framework for exploration and decision making to 3D object recognition with tactile signals.

3D Object Recognition, Decision Making

Patching open-vocabulary models by interpolating weights

1 code implementation · 10 Aug 2022 · Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate.

Image Classification
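The title of the entry above names the mechanism: interpolating between the weights of the original model and a fine-tuned copy. The sketch below shows linear weight interpolation, theta = (1 - alpha) * theta_zs + alpha * theta_ft. It assumes parameters are stored as plain dicts of float lists. The names `interpolate_weights` and `pick_alpha`, the alpha grid, and the averaged-accuracy selection rule are illustrative assumptions, not the paper's code:

```python
from typing import Callable, Dict, List, Sequence

Params = Dict[str, List[float]]  # hypothetical layout: parameter name -> flattened weights


def interpolate_weights(theta_zs: Params, theta_ft: Params, alpha: float) -> Params:
    """Return (1 - alpha) * theta_zs + alpha * theta_ft, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if theta_zs.keys() != theta_ft.keys():
        raise ValueError("models must have identical parameter names")
    patched: Params = {}
    for name, w_zs in theta_zs.items():
        w_ft = theta_ft[name]
        if len(w_zs) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        patched[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(w_zs, w_ft)]
    return patched


def pick_alpha(
    theta_zs: Params,
    theta_ft: Params,
    acc_supported: Callable[[Params], float],  # held-out accuracy on tasks already handled well
    acc_patch: Callable[[Params], float],      # held-out accuracy on the task being patched
    grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> float:
    """Assumed selection rule: maximize the average of the two held-out accuracies."""
    def score(alpha: float) -> float:
        patched = interpolate_weights(theta_zs, theta_ft, alpha)
        return (acc_supported(patched) + acc_patch(patched)) / 2

    return max(grid, key=score)


zs = {"w": [0.0, 2.0], "b": [1.0]}
ft = {"w": [1.0, 0.0], "b": [3.0]}
print(interpolate_weights(zs, ft, 0.5))  # {'w': [0.5, 1.0], 'b': [2.0]}
```

Setting alpha = 0 recovers the original model and alpha = 1 recovers the fine-tuned one. Intermediate values trade accuracy on the patched task against accuracy on tasks where performance was already adequate.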

Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models

1 code implementation · 23 Jul 2022 · Huy Ha, Shuran Song

We study open-world 3D scene understanding, a family of tasks that require agents to reason about their 3D environment with an open-set vocabulary and out-of-domain visual inputs; this is a critical skill for robots operating in the unstructured 3D world.

Scene Understanding

Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery

no code implementations · 19 Jul 2022 · Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song

We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions.

BusyBot: Learning to Interact, Reason, and Plan in a BusyBoard Environment

1 code implementation · 17 Jul 2022 · Zeyi Liu, Zhenjia Xu, Shuran Song

We introduce BusyBoard, a toy-inspired robot learning environment that leverages a diverse set of articulated objects and inter-object functional relations to provide rich visual feedback for robot interactions.

Causal Discovery, Robot Manipulation (+2 more)

Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower

1 code implementation · 5 Apr 2022 · Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle.

Continuous Scene Representations for Embodied AI

no code implementations · CVPR 2022 · Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation.

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

no code implementations · CVPR 2023 · Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song

To better evaluate language-driven zero-shot object navigation (L-ZSON), we introduce the Pasture benchmark, which considers finding uncommon objects, objects described by spatial and appearance attributes, and hidden objects described relative to visible objects.

Image Classification, Object Localization (+1 more)

TANDEM: Learning Joint Exploration and Decision Making with Tactile Sensors

no code implementations · 1 Mar 2022 · Jingxi Xu, Shuran Song, Matei Ciocarlie

Inspired by the human ability to perform complex manipulation in the complete absence of vision (like retrieving an object from a pocket), the robotic manipulation field is motivated to develop new methods for tactile-based object interaction.

Decision Making, Efficient Exploration (+1 more)

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

no code implementations · NeurIPS 2021 · Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.

Pose Estimation, Self-Supervised Learning

Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly

1 code implementation · 9 Oct 2021 · Yulong Li, Shubham Agrawal, Jen-Shuo Liu, Steven K. Feiner, Shuran Song

To make teleoperation accessible to non-expert users, we propose the framework "Scene Editing as Teleoperation" (SEaT), where the key idea is to transform the traditional "robot-centric" interface into a "scene-centric" interface -- instead of controlling the robot, users focus on specifying the task's goal by manipulating digital twins of the real-world objects.

UMPNet: Universal Manipulation Policy Network for Articulated Objects

no code implementations · 13 Sep 2021 · Zhenjia Xu, Zhanpeng He, Shuran Song

We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects.

Learning to See before Learning to Act: Visual Pre-training for Manipulation

no code implementations · 1 Jul 2021 · Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.

Transfer Learning

Leveraging SE(3) Equivariance for Self-supervised Category-Level Object Pose Estimation from Point Clouds

no code implementations · NeurIPS 2021 · Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

To reduce the large number of pose annotations needed for category-level learning, we propose, for the first time, a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.

Pose Estimation, Self-Supervised Learning

Visual Perspective Taking for Opponent Behavior Modeling

no code implementations · 11 May 2021 · Boyuan Chen, Yuhang Hu, Robert Kwiatkowski, Shuran Song, Hod Lipson

We suggest that visual behavior modeling and perspective taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities.

GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape Completion

no code implementations · ICCV 2021 · Cheng Chi, Shuran Song

By mapping the observed partial surface to the canonical space and completing it in this space, the output representation describes the garment's full configuration using a complete 3D mesh with the per-vertex canonical coordinate label.

3D Shape Representation, Pose Estimation

Spatial Intention Maps for Multi-Agent Mobile Manipulation

1 code implementation · 23 Mar 2021 · Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks.

SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation

1 code implementation · 8 Dec 2020 · Yiqing Liang, Boyuan Chen, Shuran Song

We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent's navigation planning.


Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

no code implementations · 7 Dec 2020 · Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and Systems" (R:SS) conference.

AdaGrasp: Learning an Adaptive Gripper-Aware Grasping Policy

1 code implementation · 28 Nov 2020 · Zhenjia Xu, Beichun Qi, Shubham Agrawal, Shuran Song

We propose AdaGrasp, a method to learn a single grasping policy that generalizes to novel grippers.


Fit2Form: 3D Generative Model for Robot Gripper Form Design

1 code implementation · 12 Nov 2020 · Huy Ha, Shubham Agrawal, Shuran Song

We propose Fit2Form, a 3D generative design framework that generates pairs of finger shapes to maximize design objectives (i.e., grasp success, stability, and robustness) for target grasp objects.

Learning a Decentralized Multi-arm Motion Planner

1 code implementation · 5 Nov 2020 · Huy Ha, Jingxi Xu, Shuran Song

In this paper, we tackle multi-arm motion planning with multi-agent reinforcement learning, where a decentralized policy is trained to control one robot arm in the multi-arm system to reach its target end-effector pose given observations of its workspace state and target end-effector pose.

Motion Planning, Multi-agent Reinforcement Learning (+2 more)
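In the entry above, a decentralized policy controls one arm from that arm's own observations. That setup can be sketched as running one shared policy once per arm at each step. This is a minimal illustration only. The `ArmState` fields, the observation layout (own joints, own target pose, then the other arms' joints), and the stand-in policy are hypothetical and do not describe the paper's architecture:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ArmState:
    joints: List[float]       # current joint configuration of this arm (hypothetical field)
    target_pose: List[float]  # desired end-effector pose for this arm (hypothetical field)


Policy = Callable[[List[float]], List[float]]  # observation -> action for a single arm


def build_observation(arms: Sequence[ArmState], i: int) -> List[float]:
    """Assumed layout for arm i: its own joints and target, then the other arms' joints."""
    own = arms[i]
    others = [q for j, arm in enumerate(arms) if j != i for q in arm.joints]
    return own.joints + own.target_pose + others


def decentralized_step(arms: Sequence[ArmState], policy: Policy) -> List[List[float]]:
    """Query the same policy once per arm; each arm acts only on its own observation."""
    return [policy(build_observation(arms, i)) for i in range(len(arms))]


arms = [ArmState([0.0, 1.0], [1.0, 1.0]), ArmState([2.0, 0.0], [0.0, 0.0])]
toy_policy: Policy = lambda obs: [sum(obs)]  # stand-in for a trained network
print(decentralized_step(arms, toy_policy))  # [[5.0], [3.0]]
```

Every arm is driven by the same function, so adding an arm adds one more policy call per step rather than a larger joint action space. This sketch does not model how the actual system encodes a varying number of neighboring arms.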

Learning 3D Dynamic Scene Representations for Robot Manipulation

2 code implementations · 3 Nov 2020 · Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song

3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time.

Robot Manipulation

Multitask Learning Strengthens Adversarial Robustness

1 code implementation · ECCV 2020 · Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.

Adversarial Defense, Adversarial Robustness

Spatial Action Maps for Mobile Manipulation

1 code implementation · 20 Apr 2020 · Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.)

Q-Learning, Value prediction

Category-Level Articulated Object Pose Estimation

2 code implementations · CVPR 2020 · Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song

We develop a deep network based on PointNet++ that predicts the Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH) from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space.

Pose Estimation

Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations

no code implementations · 9 Dec 2019 · Shuran Song, Andy Zeng, Johnny Lee, Thomas Funkhouser

A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions.

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

1 code implementation · 30 Oct 2019 · Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song

This formulation enables the model to acquire a broader understanding of how shapes and surfaces fit together for assembly -- allowing it to generalize to new objects and kits.

Pose Estimation

Visual Hide and Seek

no code implementations · 15 Oct 2019 · Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick

We train embodied agents to play Visual Hide and Seek, in which a prey must navigate a simulated environment to avoid capture by a predator.


ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

1 code implementation · 6 Oct 2019 · Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, Shuran Song

To address these challenges, we present ClearGrasp -- a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation.

Depth Completion, Monocular Depth Estimation (+4 more)

Neural Illumination: Lighting Prediction for Indoor Environments

no code implementations · CVPR 2019 · Shuran Song, Thomas Funkhouser

This paper addresses the task of estimating the light arriving from all directions to a 3D point observed at a selected pixel in an RGB image.

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

no code implementations · 27 Mar 2019 · Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error.


Neural Graph Matching Networks for Fewshot 3D Action Recognition

no code implementations · ECCV 2018 · Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples.

Few-Shot Learning, Graph Matching (+1 more)

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations · CVPR 2018 · Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤50%) in the form of an RGB-D image.

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

4 code implementations · 27 Mar 2018 · Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g., pushing) and prehensile (e.g., grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free.

Q-Learning, reinforcement-learning (+1 more)

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations · 12 Dec 2017 · Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤50%) in the form of an RGB-D image.

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

no code implementations · CVPR 2017 · Yinda Zhang, Shuran Song, Ersin Yumer, Manolis Savva, Joon-Young Lee, Hailin Jin, Thomas Funkhouser

One of the bottlenecks in training for better representations is the amount of available per-pixel ground truth data that is required for core scene understanding tasks such as semantic segmentation, normal prediction, and object edge detection.

Boundary Detection, Edge Detection (+4 more)

Semantic Scene Completion from a Single Depth Image

3 code implementations · CVPR 2017 · Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.

3D Semantic Scene Completion

Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge

2 code implementations · 29 Sep 2016 · Andy Zeng, Kuan-Ting Yu, Shuran Song, Daniel Suo, Ed Walker Jr., Alberto Rodriguez, Jianxiong Xiao

The approach was part of the MIT-Princeton Team system that took 3rd and 4th place in the stowing and picking tasks, respectively, at the Amazon Picking Challenge (APC) 2016.

6D Pose Estimation, 6D Pose Estimation using RGBD

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

2 code implementations · CVPR 2017 · Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser

To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.

3D Reconstruction, Point Cloud Registration

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

no code implementations · CVPR 2016 · Shuran Song, Jianxiong Xiao

We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent.

Ranked #6 on 3D Object Detection on SUN-RGBD val (Inference Speed (s) metric)

3D Object Detection, object-detection (+2 more)

Robot In a Room: Toward Perfect Object Recognition in Closed Environments

no code implementations · 9 Jul 2015 · Shuran Song, Linguang Zhang, Jianxiong Xiao

By constraining a robot to stay in a limited territory, we can ensure that the robot has seen most objects before and that new objects are introduced slowly.

Object Recognition

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

4 code implementations · 10 Jun 2015 · Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao

While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.

SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

no code implementations · CVPR 2015 · Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding.

3D Reconstruction, Scene Understanding

3D ShapeNets: A Deep Representation for Volumetric Shapes

no code implementations · CVPR 2015 · Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao

Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically.

Ranked #34 on 3D Point Cloud Classification on ModelNet40 (Mean Accuracy metric)

3D Point Cloud Classification, 3D Shape Representation (+1 more)
