Search Results for author: Shuran Song

Found 67 papers, 32 papers with code

3D ShapeNets: A Deep Representation for Volumetric Shapes

no code implementations CVPR 2015 Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao

Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically.

Ranked #35 on 3D Point Cloud Classification on ModelNet40 (Mean Accuracy metric)

3D Point Cloud Classification, 3D Shape Representation +2
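
A rough illustration of the volumetric input that 3D ShapeNets-style models consume: a point cloud mapped into a binary occupancy grid. The 30^3 resolution follows the paper; the normalization scheme and the NumPy sketch below are simplifying assumptions, not the authors' code.

```python
import numpy as np

def voxelize(points, grid=30):
    """Map an (N, 3) point cloud into a (grid, grid, grid) binary occupancy volume."""
    points = np.asarray(points, dtype=np.float64)
    # Normalize the shape into the unit cube [0, 1]^3.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scaled = (points - mins) / np.maximum(maxs - mins, 1e-9)
    # Quantize each point to a voxel index and mark that cell occupied.
    idx = np.clip((scaled * grid).astype(int), 0, grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=np.uint8)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vox
```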

SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

no code implementations CVPR 2015 Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding.

3D Reconstruction, Scene Understanding

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

4 code implementations 10 Jun 2015 Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao

While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.

Robot In a Room: Toward Perfect Object Recognition in Closed Environments

no code implementations 9 Jul 2015 Shuran Song, Linguang Zhang, Jianxiong Xiao

By constraining a robot to stay in a limited territory, we can ensure that the robot has seen most objects before and that new objects are introduced only slowly.

Object, Object Recognition

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

no code implementations CVPR 2016 Shuran Song, Jianxiong Xiao

We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent.

Ranked #6 on 3D Object Detection on SUN-RGBD val (Inference Speed (s) metric)

3D Object Detection, Object +3

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

2 code implementations CVPR 2017 Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser

To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.

3D Reconstruction, Point Cloud Registration

Semantic Scene Completion from a Single Depth Image

3 code implementations CVPR 2017 Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.

3D Semantic Scene Completion

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

no code implementations CVPR 2017 Yinda Zhang, Shuran Song, Ersin Yumer, Manolis Savva, Joon-Young Lee, Hailin Jin, Thomas Funkhouser

One of the bottlenecks in training for better representations is the amount of available per-pixel ground truth data that is required for core scene understanding tasks such as semantic segmentation, normal prediction, and object edge detection.

Boundary Detection, Edge Detection +4

Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

no code implementations 12 Dec 2017 Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

4 code implementations 27 Mar 2018 Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free.

Q-Learning, reinforcement-learning +1
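
The synergy is learned with per-pixel Q values for the two primitives; the greedy action selection that implies can be sketched as below. This assumes q_push and q_grasp are (H, W) arrays produced by the two fully convolutional networks; it is an illustration, not the authors' implementation.

```python
import numpy as np

def select_action(q_push, q_grasp):
    """Greedily pick the motion primitive and pixel with the highest predicted Q value."""
    best = {}
    for name, q in (("push", q_push), ("grasp", q_grasp)):
        y, x = np.unravel_index(np.argmax(q), q.shape)  # best pixel for this primitive
        best[name] = (q[y, x], (y, x))
    primitive = max(best, key=lambda k: best[k][0])     # primitive with the higher peak Q
    return primitive, best[primitive][1]
```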

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations CVPR 2018 Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.

Neural Graph Matching Networks for Fewshot 3D Action Recognition

no code implementations ECCV 2018 Michelle Guo, Edward Chou, De-An Huang, Shuran Song, Serena Yeung, Li Fei-Fei

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previously unseen 3D action class with only a few examples.

Few-Shot Learning, Graph Matching +1

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

no code implementations 27 Mar 2019 Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error.

Friction
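
The "residual physics" idea can be sketched in a few lines: an analytic ballistic estimate of the release speed plus a learned per-object correction. The same-height, drag-free projectile model and the residual_net / features names are illustrative assumptions, not the paper's interfaces.

```python
import math

def ballistic_release_speed(distance, angle_rad, g=9.81):
    """Release speed for a drag-free projectile landing at its release height:
    range = v^2 * sin(2*theta) / g, solved for v."""
    return math.sqrt(distance * g / math.sin(2.0 * angle_rad))

def throwing_velocity(distance, angle_rad, residual_net, features):
    """Residual physics: physics-based estimate plus a learned correction."""
    return ballistic_release_speed(distance, angle_rad) + residual_net(features)
```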

Neural Illumination: Lighting Prediction for Indoor Environments

no code implementations CVPR 2019 Shuran Song, Thomas Funkhouser

This paper addresses the task of estimating the light arriving from all directions to a 3D point observed at a selected pixel in an RGB image.

ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

1 code implementation 6 Oct 2019 Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, Shuran Song

To address these challenges, we present ClearGrasp -- a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation.

Depth Completion, Monocular Depth Estimation +4

Visual Hide and Seek

no code implementations 15 Oct 2019 Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick

We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator.

Navigate

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

1 code implementation 30 Oct 2019 Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song

This formulation enables the model to acquire a broader understanding of how shapes and surfaces fit together for assembly -- allowing it to generalize to new objects and kits.

Object Pose Estimation

Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations

no code implementations 9 Dec 2019 Shuran Song, Andy Zeng, Johnny Lee, Thomas Funkhouser

A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions.

Category-Level Articulated Object Pose Estimation

2 code implementations CVPR 2020 Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song

We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space.

Object Pose Estimation

Spatial Action Maps for Mobile Manipulation

1 code implementation 20 Apr 2020 Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.)

Q-Learning, Value prediction

Multitask Learning Strengthens Adversarial Robustness

1 code implementation ECCV 2020 Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.

Adversarial Defense, Adversarial Robustness
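
For context on the attacks referenced above, the sketch below shows a standard single-step perturbation (FGSM) in PyTorch. It is a generic illustration of "imperceptible input perturbations", not the paper's attack suite or defense.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=8 / 255):
    """Perturb inputs x in the gradient-sign direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()     # one signed gradient step of size eps
    return x_adv.clamp(0, 1).detach()   # keep pixels in the valid [0, 1] range
```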

Learning 3D Dynamic Scene Representations for Robot Manipulation

2 code implementations 3 Nov 2020 Zhenjia Xu, Zhanpeng He, Jiajun Wu, Shuran Song

3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time.

Model Predictive Control, Robot Manipulation

Learning a Decentralized Multi-arm Motion Planner

1 code implementation 5 Nov 2020 Huy Ha, Jingxi Xu, Shuran Song

In this paper, we tackle this problem with multi-agent reinforcement learning, where a decentralized policy is trained to control one robot arm in the multi-arm system to reach its target end-effector pose given observations of its workspace state and target end-effector pose.

Motion Planning, Multi-agent Reinforcement Learning +2

Fit2Form: 3D Generative Model for Robot Gripper Form Design

1 code implementation 12 Nov 2020 Huy Ha, Shubham Agrawal, Shuran Song

We propose Fit2Form, a 3D generative design framework that generates pairs of finger shapes to maximize design objectives (i.e., grasp success, stability, and robustness) for target grasp objects.

AdaGrasp: Learning an Adaptive Gripper-Aware Grasping Policy

1 code implementation 28 Nov 2020 Zhenjia Xu, Beichun Qi, Shubham Agrawal, Shuran Song

We propose AdaGrasp, a method to learn a single grasping policy that generalizes to novel grippers.

Robotics

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

no code implementations 7 Dec 2020 Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and Systems" conference.

SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation

1 code implementation 8 Dec 2020 Yiqing Liang, Boyuan Chen, Shuran Song

We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent's navigation planning.

Navigate

Spatial Intention Maps for Multi-Agent Mobile Manipulation

1 code implementation 23 Mar 2021 Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks.

GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape Completion

no code implementations ICCV 2021 Cheng Chi, Shuran Song

By mapping the observed partial surface to the canonical space and completing it in this space, the output representation describes the garment's full configuration using a complete 3D mesh with the per-vertex canonical coordinate label.

3D Shape Representation, Pose Estimation

Visual Perspective Taking for Opponent Behavior Modeling

no code implementations 11 May 2021 Boyuan Chen, Yuhang Hu, Robert Kwiatkowski, Shuran Song, Hod Lipson

We suggest that visual behavior modeling and perspective taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities.

Leveraging SE(3) Equivariance for Self-supervised Category-Level Object Pose Estimation from Point Clouds

no code implementations NeurIPS 2021 Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

To reduce the huge amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.

Object Pose Estimation +1

Learning to See before Learning to Act: Visual Pre-training for Manipulation

no code implementations 1 Jul 2021 Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.

Transfer Learning

UMPNet: Universal Manipulation Policy Network for Articulated Objects

no code implementations 13 Sep 2021 Zhenjia Xu, Zhanpeng He, Shuran Song

We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects.

Attribute

Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly

1 code implementation 9 Oct 2021 Yulong Li, Shubham Agrawal, Jen-Shuo Liu, Steven K. Feiner, Shuran Song

To make teleoperation accessible to non-expert users, we propose the framework "Scene Editing as Teleoperation" (SEaT), where the key idea is to transform the traditional "robot-centric" interface into a "scene-centric" interface -- instead of controlling the robot, users focus on specifying the task's goal by manipulating digital twins of the real-world objects.

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

no code implementations NeurIPS 2021 Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song, He Wang

Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.

Object Pose Estimation +1

TANDEM: Learning Joint Exploration and Decision Making with Tactile Sensors

no code implementations 1 Mar 2022 Jingxi Xu, Shuran Song, Matei Ciocarlie

Inspired by the human ability to perform complex manipulation in the complete absence of vision (like retrieving an object from a pocket), the robotic manipulation field is motivated to develop new methods for tactile-based object interaction.

Decision Making, Efficient Exploration +2

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

1 code implementation CVPR 2023 Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song

To better evaluate L-ZSON, we introduce the Pasture benchmark, which considers finding uncommon objects, objects described by spatial and appearance attributes, and hidden objects described relative to visible objects.

Image Classification, Object Localization +1

Continuous Scene Representations for Embodied AI

no code implementations CVPR 2022 Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation.

Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower

1 code implementation 5 Apr 2022 Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle.

BusyBot: Learning to Interact, Reason, and Plan in a BusyBoard Environment

1 code implementation 17 Jul 2022 Zeyi Liu, Zhenjia Xu, Shuran Song

We introduce BusyBoard, a toy-inspired robot learning environment that leverages a diverse set of articulated objects and inter-object functional relations to provide rich visual feedback for robot interactions.

Causal Discovery, Robot Manipulation +2

Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery

no code implementations 19 Jul 2022 Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song

We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions.

Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models

1 code implementation 23 Jul 2022 Huy Ha, Shuran Song

We study open-world 3D scene understanding, a family of tasks that require agents to reason about their 3D environment with an open-set vocabulary and out-of-domain visual inputs - a critical skill for robots to operate in the unstructured 3D world.

Scene Understanding

Patching open-vocabulary models by interpolating weights

1 code implementation 10 Aug 2022 Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate.

Image Classification
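
The patching operation itself comes down to interpolating weights between the zero-shot and fine-tuned checkpoints of the same architecture. A minimal sketch follows; the function name and the alpha=0.5 default are illustrative, and the mixing coefficient would in practice be tuned per task.

```python
def patch_weights(zero_shot_state, fine_tuned_state, alpha=0.5):
    """Linearly interpolate two state dicts of the same architecture.

    Values are assumed to be tensors (or anything supporting + and *).
    """
    return {
        name: (1 - alpha) * zero_shot_state[name] + alpha * fine_tuned_state[name]
        for name in zero_shot_state
    }

# Usage sketch: model.load_state_dict(patch_weights(sd_zero_shot, sd_fine_tuned, alpha=0.8))
```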

TANDEM3D: Active Tactile Exploration for 3D Object Recognition

no code implementations 19 Sep 2022 Jingxi Xu, Han Lin, Shuran Song, Matei Ciocarlie

In this work, we propose TANDEM3D, a method that applies a co-training framework for exploration and decision making to 3D object recognition with tactile signals.

3D Object Recognition, Decision Making +1

Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

no code implementations 24 Sep 2022 Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia Weng, Li Yi, Shuran Song, He Wang

We propose, for the first time, a point cloud-based hand joint tracking network, HandTrackNet, to estimate inter-frame hand joint motion.

hand-object pose, Object +2

ASPiRe: Adaptive Skill Priors for Reinforcement Learning

no code implementations 30 Sep 2022 Mengda Xu, Manuela Veloso, Shuran Song

We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning.

reinforcement-learning, Reinforcement Learning (RL)

CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning

no code implementations 12 Dec 2022 Zhao Mandi, Homanga Bharadhwaj, Vincent Moens, Shuran Song, Aravind Rajeswaran, Vikash Kumar

On a real robot setup, CACTI enables efficient training of a single policy that can perform 10 manipulation tasks involving kitchen objects, and is robust to varying layouts of distractors.

Data Augmentation, Image Generation +3

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

no code implementations 12 Mar 2023 Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A. Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie

In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed.

Decision Making

TidyBot: Personalized Robot Assistance with Large Language Models

1 code implementation 9 May 2023 Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios.

REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction

1 code implementation 27 Jun 2023 Zeyi Liu, Arpit Bahety, Shuran Song

The ability to detect and analyze failed executions automatically is crucial for an explainable and robust robotic system.

Common Sense Reasoning, Large Language Model +1

Rearrangement Planning for General Part Assembly

no code implementations 1 Jul 2023 Yulong Li, Andy Zeng, Shuran Song

Most successes in autonomous robotic assembly have been restricted to a single target or category.

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

1 code implementation 10 Jul 2023 Zhao Mandi, Shreeya Jain, Shuran Song

We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning.

Trajectory Planning

XSkill: Cross Embodiment Skill Discovery

1 code implementation 19 Jul 2023 Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song

Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior.

Imitation Learning, Robot Manipulation

RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction

no code implementations 21 Jul 2023 Isaac Kasahara, Shubham Agrawal, Selim Engin, Nikhil Chavan-Dafle, Shuran Song, Volkan Isler

General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects.

Autonomous Navigation

MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes

no code implementations 30 Nov 2023 Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, Jeffrey Ichnowski

MD-Splatting builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel view synthesis.

Novel View Synthesis

Tactile-based Object Retrieval From Granular Media

no code implementations 7 Feb 2024 Jingxi Xu, Yinsen Jia, Dongxiao Yang, Patrick Meng, Xinyue Zhu, Zihan Guo, Shuran Song, Matei Ciocarlie

We also introduce a training curriculum that enables learning these behaviors in simulation, followed by zero-shot transfer to real hardware.

Object Retrieval

Dynamics-Guided Diffusion Model for Robot Manipulator Design

no code implementations 23 Feb 2024 Xiaomeng Xu, Huy Ha, Shuran Song

The design objective constructed from the target and predicted interaction profiles provides a gradient to guide the refinement of finger geometry for the task.

ContactHandover: Contact-Guided Robot-to-Human Object Handover

no code implementations 1 Apr 2024 Zixi Wang, Zeyi Liu, Nicolas Ouporov, Shuran Song

We propose ContactHandover, a robot-to-human handover system that consists of two phases: a contact-guided grasping phase and an object delivery phase.

Object
