Search Results for author: Andy Zeng

Found 44 papers, 22 papers with code

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

no code implementations • 12 Feb 2024 • Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

In each iteration, the image is annotated with a visual representation of proposals that the VLM can refer to (e.g., candidate robot actions, localizations, or trajectories).

Instruction Following Logical Reasoning +3

Real-World Robot Applications of Foundation Models: A Review

no code implementations • 8 Feb 2024 • Kento Kawaharazuka, Tatsuya Matsushima, Andrew Gambardella, Jiaxian Guo, Chris Paxton, Andy Zeng

This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems.

Motion Planning

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

no code implementations • 7 Dec 2023 • Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter

For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable).

Language Modelling

Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

1 code implementation • 17 Nov 2023 • Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh

DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives.

Language Modelling Large Language Model +1

Video Language Planning

no code implementations • 16 Oct 2023 • Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.

Large Language Models as General Pattern Machines

no code implementations • 10 Jul 2023 • Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng

We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to richer spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art.

In-Context Learning

Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

no code implementations • 4 Jul 2023 • Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions.

Conformal Prediction Language Modelling +1

Rearrangement Planning for General Part Assembly

no code implementations • 1 Jul 2023 • Yulong Li, Andy Zeng, Shuran Song

Most successes in autonomous robotic assembly have been restricted to a single target or category.

Language to Rewards for Robotic Skill Synthesis

no code implementations • 14 Jun 2023 • Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia

However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot.

In-Context Learning Logical Reasoning

Modular Visual Question Answering via Code Generation

1 code implementation • 8 Jun 2023 • Sanjay Subramanian, Medhini Narasimhan, Kushal Khangaonkar, Kevin Yang, Arsha Nagrani, Cordelia Schmid, Andy Zeng, Trevor Darrell, Dan Klein

We present a framework that formulates visual question answering as modular code generation.

Code Generation In-Context Learning +2

Financial sentiment analysis using FinBERT with application in predicting stock movement

no code implementations • 3 Jun 2023 • Tingsong Jiang, Andy Zeng

We apply sentiment analysis in a financial context using FinBERT, and build a deep neural network model based on LSTM to predict the movement of the financial market.

Sentiment Analysis

TidyBot: Personalized Robot Assistance with Large Language Models

1 code implementation • 9 May 2023 • Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios.

483

RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning

1 code implementation • 9 Apr 2023 • Kevin Zakka, Philipp Wu, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel

Replicating human-like dexterity in robot hands represents one of the largest open problems in robotics.

Benchmarking Multi-Task Learning +2

530

Audio Visual Language Maps for Robot Navigation

no code implementations • 13 Mar 2023 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments.

Navigate Robot Navigation

PaLM-E: An Embodied Multimodal Language Model

2 code implementations • 6 Mar 2023 • Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

Large language models excel at a wide range of complex tasks.

Ranked #2 on Visual Question Answering (VQA) on OK-VQA

Language Modelling Large Language Model +2

201

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

no code implementations • NeurIPS 2023 • Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter

Recent progress in large language models (LLMs) has demonstrated the ability to learn and leverage Internet-scale knowledge through pre-training with autoregressive models.

Language Modelling Text Generation

Visual Language Maps for Robot Navigation

1 code implementation • 11 Oct 2022 • Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions).

3D Reconstruction Image Captioning +1

283

Inner Monologue: Embodied Reasoning through Planning with Language Models

no code implementations • 12 Jul 2022 • Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction.

Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research

no code implementations • 21 Apr 2022 • Ryan Hoque, Kaushik Shivakumar, Shrey Aeron, Gabriel Deza, Aditya Ganapathi, Adrian Wong, Johnny Lee, Andy Zeng, Vincent Vanhoucke, Ken Goldberg

Autonomous fabric manipulation is a longstanding challenge in robotics, but evaluating progress is difficult due to the cost and diversity of robot hardware.

Benchmarking Imitation Learning

Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower

1 code implementation • 5 Apr 2022 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle.

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

3 code implementations • 4 Apr 2022 • Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, Andy Zeng

We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment.

Decision Making Language Modelling +1

159

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 code implementation • 1 Apr 2022 • Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence

Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on.

Ranked #21 on Video Retrieval on MSR-VTT-1kA (video-to-text R@1 metric)

Image Captioning Multimodal Reasoning +5

32,775

Multiscale Sensor Fusion and Continuous Control with Neural CDEs

no code implementations • 16 Mar 2022 • Sumeet Singh, Francis McCann Ramirez, Jacob Varley, Andy Zeng, Vikas Sindhwani

Though robot learning is often formulated in terms of discrete-time Markov decision processes (MDPs), physical robots require near-continuous multiscale feedback control.

Continuous Control Sensor Fusion

Hybrid Random Features

1 code implementation • ICLR 2022 • Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.

Benchmarking

Multi-Task Learning with Sequence-Conditioned Transporter Networks

no code implementations • 15 Sep 2021 • Michael H. Lim, Andy Zeng, Brian Ichter, Maryam Bandari, Erwin Coumans, Claire Tomlin, Stefan Schaal, Aleksandra Faust

Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications.

Multi-Task Learning

Implicit Behavioral Cloning

4 code implementations • 1 Sep 2021 • Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models.

D4RL

2,523

Learning to See before Learning to Act: Visual Pre-training for Manipulation

no code implementations • 1 Jul 2021 • Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.

Transfer Learning

XIRL: Cross-embodiment Inverse Reinforcement Learning

1 code implementation • 7 Jun 2021 • Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi

We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments -- shape, actions, end-effector dynamics, etc.

reinforcement-learning Reinforcement Learning (RL)

32,776

Spatial Intention Maps for Multi-Agent Mobile Manipulation

1 code implementation • 23 Mar 2021 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks.

Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks

no code implementations • 6 Dec 2020 • Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas Sindhwani, Ken Goldberg, Andy Zeng

Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as "place the item inside the bag".

Spatial Action Maps for Mobile Manipulation

1 code implementation • 20 Apr 2020 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.)

Q-Learning Value prediction

Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations

no code implementations • 9 Dec 2019 • Shuran Song, Andy Zeng, Johnny Lee, Thomas Funkhouser

A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions.

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

1 code implementation • 30 Oct 2019 • Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song

This formulation enables the model to acquire a broader understanding of how shapes and surfaces fit together for assembly -- allowing it to generalize to new objects and kits.

Object Pose Estimation

ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

1 code implementation • 6 Oct 2019 • Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, Shuran Song

To address these challenges, we present ClearGrasp -- a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation.

Ranked #1 on Semantic Segmentation on Cleargrasp (Novel)

Depth Completion Monocular Depth Estimation +4

267

DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

no code implementations • 10 Jun 2019 • Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, Shuran Song

We study the problem of learning physical object representations for robot manipulation.

Friction Object +1

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

no code implementations • 27 Mar 2019 • Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error.

Friction

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations • CVPR 2018 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<=50%) in the form of an RGB-D image.

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

4 code implementations • 27 Mar 2018 • Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g., pushing) and prehensile (e.g., grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free.

Q-Learning reinforcement-learning +1

851

Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

no code implementations • 12 Dec 2017 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

3 code implementations • 3 Oct 2017 • Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois R. Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Chavan Dafle, Rachel Holladay, Isabella Morona, Prem Qu Nair, Druck Green, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez

Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional training data.

Image Classification Robotic Grasping

289

Matterport3D: Learning from RGB-D Data in Indoor Environments

1 code implementation • 18 Sep 2017 • Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang

Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms.

General Classification Scene Understanding +1

892

Semantic Scene Completion from a Single Depth Image

3 code implementations • CVPR 2017 • Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.

Ranked #2 on 3D Semantic Scene Completion on KITTI-360

3D Semantic Scene Completion

1,177

Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge

2 code implementations • 29 Sep 2016 • Andy Zeng, Kuan-Ting Yu, Shuran Song, Daniel Suo, Ed Walker Jr., Alberto Rodriguez, Jianxiong Xiao

The approach was part of the MIT-Princeton Team system that took 3rd and 4th place in the stowing and picking tasks, respectively, at APC 2016.

6D Pose Estimation 6D Pose Estimation using RGBD +1

305

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

2 code implementations • CVPR 2017 • Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser

To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.

Ranked #2 on 3D Reconstruction on Scan2CAD

3D Reconstruction Point Cloud Registration

798
