Search Results for author: Shuran Song

Found 46 papers, 21 papers with code

Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower

1 code implementation • 5 Apr 2022 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle.

Continuous Scene Representations for Embodied AI

no code implementations • CVPR 2022 • Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation.

CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration

no code implementations • 20 Mar 2022 • Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song

Employing this philosophy, we design CLIP on Wheels (CoW) baselines for the task and evaluate each zero-shot model in both Habitat and RoboTHOR simulators.

Image Classification • Object Localization

TANDEM: Learning Joint Exploration and Decision Making with Tactile Sensors

no code implementations • 1 Mar 2022 • Jingxi Xu, Shuran Song, Matei Ciocarlie

Inspired by the human ability to perform complex manipulation in the complete absence of vision (like retrieving an object from a pocket), the robotic manipulation field is motivated to develop new methods for tactile-based object interaction.

Decision Making • Efficient Exploration +1

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

no code implementations • NeurIPS 2021 • Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.

Pose Estimation • Self-Supervised Learning

Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly

no code implementations • 9 Oct 2021 • Shubham Agrawal, Yulong Li, Jen-Shuo Liu, Steven K. Feiner, Shuran Song

To make teleoperation accessible to non-expert users, we propose the framework "Scene Editing as Teleoperation" (SEaT), where the key idea is to transform the traditional "robot-centric" interface into a "scene-centric" interface -- instead of controlling the robot, users focus on specifying the task's goal by manipulating digital twins of the real-world objects.

UMPNet: Universal Manipulation Policy Network for Articulated Objects

no code implementations • 13 Sep 2021 • Zhenjia Xu, Zhanpeng He, Shuran Song

We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects.

Learning to See before Learning to Act: Visual Pre-training for Manipulation

no code implementations • 1 Jul 2021 • Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.

Transfer Learning

Leveraging SE(3) Equivariance for Self-supervised Category-Level Object Pose Estimation from Point Clouds

no code implementations • NeurIPS 2021 • Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

To reduce the huge amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.

Pose Estimation • Self-Supervised Learning

Visual Perspective Taking for Opponent Behavior Modeling

no code implementations • 11 May 2021 • Boyuan Chen, Yuhang Hu, Robert Kwiatkowski, Shuran Song, Hod Lipson

We suggest that visual behavior modeling and perspective taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities.

GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape Completion

no code implementations • ICCV 2021 • Cheng Chi, Shuran Song

By mapping the observed partial surface to the canonical space and completing it in this space, the output representation describes the garment's full configuration using a complete 3D mesh with the per-vertex canonical coordinate label.

3D Shape Representation • Pose Estimation

Spatial Intention Maps for Multi-Agent Mobile Manipulation

1 code implementation • 23 Mar 2021 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks.


SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation

1 code implementation • 8 Dec 2020 • Yiqing Liang, Boyuan Chen, Shuran Song

We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent's navigation planning.

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

no code implementations • 7 Dec 2020 • Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and System" conference.

AdaGrasp: Learning an Adaptive Gripper-Aware Grasping Policy

1 code implementation • 28 Nov 2020 • Zhenjia Xu, Beichun Qi, Shubham Agrawal, Shuran Song

We propose AdaGrasp, a method to learn a single grasping policy that generalizes to novel grippers.


Fit2Form: 3D Generative Model for Robot Gripper Form Design

1 code implementation • 12 Nov 2020 • Huy Ha, Shubham Agrawal, Shuran Song

We propose Fit2Form, a 3D generative design framework that generates pairs of finger shapes to maximize design objectives (i.e., grasp success, stability, and robustness) for target grasp objects.

Learning a Decentralized Multi-arm Motion Planner

1 code implementation • 5 Nov 2020 • Huy Ha, Jingxi Xu, Shuran Song

In this paper, we tackle this problem with multi-agent reinforcement learning, where a decentralized policy is trained to control one robot arm in the multi-arm system to reach its target end-effector pose given observations of its workspace state and target end-effector pose.

Motion Planning • Multi-agent Reinforcement Learning +1

Learning 3D Dynamic Scene Representations for Robot Manipulation

2 code implementations • 3 Nov 2020 • Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song

3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time.

Multitask Learning Strengthens Adversarial Robustness

1 code implementation • ECCV 2020 • Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.

Adversarial Defense • Adversarial Robustness +1

Spatial Action Maps for Mobile Manipulation

1 code implementation • 20 Apr 2020 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.).

Q-Learning • Value prediction

Category-Level Articulated Object Pose Estimation

2 code implementations • CVPR 2020 • Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song

We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space.

Pose Estimation

Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations

no code implementations • 9 Dec 2019 • Shuran Song, Andy Zeng, Johnny Lee, Thomas Funkhouser

A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions.

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

1 code implementation • 30 Oct 2019 • Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song

This formulation enables the model to acquire a broader understanding of how shapes and surfaces fit together for assembly -- allowing it to generalize to new objects and kits.

Pose Estimation

Visual Hide and Seek

no code implementations • 15 Oct 2019 • Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick

We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator.

ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

1 code implementation • 6 Oct 2019 • Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, Shuran Song

To address these challenges, we present ClearGrasp -- a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation.

Depth Completion • Monocular Depth Estimation +4

Neural Illumination: Lighting Prediction for Indoor Environments

no code implementations • CVPR 2019 • Shuran Song, Thomas Funkhouser

This paper addresses the task of estimating the light arriving from all directions to a 3D point observed at a selected pixel in an RGB image.

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

no code implementations • 27 Mar 2019 • Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error.

Neural Graph Matching Networks for Fewshot 3D Action Recognition

no code implementations • ECCV 2018 • Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples.

Few-Shot Learning • Graph Matching +1

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations • CVPR 2018 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

4 code implementations • 27 Mar 2018 • Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g., pushing) and prehensile (e.g., grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free.

Q-Learning • reinforcement-learning

Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

no code implementations • 12 Dec 2017 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

no code implementations • CVPR 2017 • Yinda Zhang, Shuran Song, Ersin Yumer, Manolis Savva, Joon-Young Lee, Hailin Jin, Thomas Funkhouser

One of the bottlenecks in training for better representations is the amount of available per-pixel ground truth data that is required for core scene understanding tasks such as semantic segmentation, normal prediction, and object edge detection.

Boundary Detection • Computer Vision +5

Semantic Scene Completion from a Single Depth Image

3 code implementations • CVPR 2017 • Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.

3D Semantic Scene Completion

Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge

2 code implementations • 29 Sep 2016 • Andy Zeng, Kuan-Ting Yu, Shuran Song, Daniel Suo, Ed Walker Jr., Alberto Rodriguez, Jianxiong Xiao

The approach was part of the MIT-Princeton Team system that took 3rd and 4th place in the stowing and picking tasks, respectively, at APC 2016.

6D Pose Estimation • 6D Pose Estimation using RGBD

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

2 code implementations • CVPR 2017 • Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser

To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.

3D Reconstruction • Point Cloud Registration

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

no code implementations • CVPR 2016 • Shuran Song, Jianxiong Xiao

We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent.

3D Object Detection • object-detection +2

Robot In a Room: Toward Perfect Object Recognition in Closed Environments

no code implementations • 9 Jul 2015 • Shuran Song, Linguang Zhang, Jianxiong Xiao

By constraining a robot to stay in a limited territory, we can ensure that the robot has seen most objects before and the speed of introducing a new object is slow.

Object Recognition

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

4 code implementations • 10 Jun 2015 • Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao

While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.

SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

no code implementations • CVPR 2015 • Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding.

3D Reconstruction • Benchmark +1

3D ShapeNets: A Deep Representation for Volumetric Shapes

no code implementations • CVPR 2015 • Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao

Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically.

Ranked #23 on 3D Point Cloud Classification on ModelNet40 (Mean Accuracy metric)

3D Point Cloud Classification • 3D Shape Representation +2
