Search Results for author: Yuke Zhu

Realistic videos of human actions exhibit rich spatiotemporal structures at multiple levels of granularity: an action can always be decomposed into multiple finer-grained elements in both space and time.

Action Parsing Action Recognition +2

Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

no code implementations • 25 Jun 2018 • Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering.

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

no code implementations • CVPR 2019 • De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model.

RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

no code implementations • 7 Nov 2018 • Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei

Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification.

Imitation Learning

Knowledge Acquisition for Visual Question Answering via Iterative Querying

no code implementations • CVPR 2017 • Yuke Zhu, Joseph J. Lim, Li Fei-Fei

Humans possess an extraordinary ability to learn new skills and new knowledge for problem solving.

Question Answering Visual Question Answering

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

no code implementations • 16 Aug 2019 • De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures.

Imitation Learning

Situational Fusion of Visual Representation for Visual Navigation

no code implementations • ICCV 2019 • Bokui Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese

A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities.

Visual Navigation

SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning

no code implementations • 27 Sep 2019 • Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei

We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL).

OpenAI Gym reinforcement-learning +2

Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation

no code implementations • 29 Oct 2019 • Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal.

Variational Inference

KETO: Learning Keypoint Representations for Tool Manipulation

no code implementations • 26 Oct 2019 • Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese

For this purpose, we present KETO, a framework of learning keypoint representations of tool-based manipulation.

Robotics

Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity

no code implementations • 11 Nov 2019 • Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

We evaluate the quality of our platform, the diversity of demonstrations in our dataset, and the utility of our dataset via quantitative and qualitative analysis.

Robot Manipulation

Adaptive Procedural Task Generation for Hard-Exploration Problems

no code implementations • ICLR 2021 • Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.

Spherical Feature Transform for Deep Metric Learning

no code implementations • ECCV 2020 • Yuke Zhu, Yan Bai, Yichen Wei

Consequently, the feature transform is performed by a rotation that respects the spherical data distributions.

Data Augmentation Metric Learning +2

Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion

no code implementations • 21 Sep 2020 • Xingye Da, Zhaoming Xie, David Hoeller, Byron Boots, Animashree Anandkumar, Yuke Zhu, Buck Babich, Animesh Garg

We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago).

reinforcement-learning Reinforcement Learning (RL)

A Coach-Player Framework for Dynamic Team Composition

no code implementations • 1 Jan 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Anima Anandkumar

The performance of our method is comparable to, or even better than, that of the setting where all players have a full view of the environment but no coach.

Zero-shot Generalization

Fast Uncertainty Quantification for Deep Object Pose Estimation

no code implementations • 16 Nov 2020 • Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Fabio Ramos, Animashree Anandkumar, Yuke Zhu

Deep learning-based object pose estimators are often unreliable and overconfident, especially when the input image is outside the training domain, for instance, with sim2real transfer.

Object Pose Estimation +1

Detect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors

no code implementations • 1 Dec 2020 • Michelle A. Lee, Matthew Tan, Yuke Zhu, Jeannette Bohg

Using sensor data from multiple modalities presents an opportunity to encode redundant and complementary features that can be useful when one modality is corrupted or noisy.

valid

Human-in-the-Loop Imitation Learning using Remote Teleoperation

no code implementations • 12 Dec 2020 • Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

We develop a simple and effective algorithm to train the policy iteratively on new data collected by the system that encourages the policy to learn how to traverse bottlenecks through the interventions.

Imitation Learning Robot Manipulation

Learning Multi-Arm Manipulation Through Collaborative Teleoperation

no code implementations • 12 Dec 2020 • Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks.

Imitation Learning

Emergent Hand Morphology and Control from Optimizing Robust Grasps of Diverse Objects

no code implementations • 22 Dec 2020 • Xinlei Pan, Animesh Garg, Animashree Anandkumar, Yuke Zhu

Through experimentation and comparative study, we demonstrate the effectiveness of our approach in discovering robust and cost-efficient hand morphologies for grasping novel objects.

Bayesian Optimization MORPH

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

no code implementations • 31 May 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.

Learning Theory Multi-agent Reinforcement Learning +3

Discovering Generalizable Skills via Automated Generation of Diverse Tasks

no code implementations • 26 Jun 2021 • Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks.

Hierarchical Reinforcement Learning reinforcement-learning +1

Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation

no code implementations • 28 Sep 2021 • Yifeng Zhu, Peter Stone, Yuke Zhu

From the task structures of multi-task demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning.

Imitation Learning Robot Manipulation

Reinforcement Learning in Factored Action Spaces using Tensor Decompositions

no code implementations • 27 Oct 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

We present an extended abstract for the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.

Multi-agent Reinforcement Learning reinforcement-learning +1

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization

no code implementations • 15 Nov 2021 • Youngwoon Lee, Joseph J. Lim, Anima Anandkumar, Yuke Zhu

However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences.

Reinforcement Learning (RL) Robot Manipulation

Ditto: Building Digital Twins of Articulated Objects from Interaction

no code implementations • CVPR 2022 • Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu

We also apply Ditto to real-world objects and deploy the recreated digital twins in physical simulation.

Mixed Reality Object

ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation

no code implementations • 14 Mar 2022 • Bokui Shen, Zhenyu Jiang, Christopher Choy, Leonidas J. Guibas, Silvio Savarese, Anima Anandkumar, Yuke Zhu

Manipulating volumetric deformable objects in the real world, like plush toys and pizza dough, brings substantial challenges due to infinite shape variations, non-rigid motions, and partial observability.

Contrastive Learning Deformable Object Manipulation

Learning and Retrieval from Prior Data for Skill-based Imitation Learning

no code implementations • 20 Oct 2022 • Soroush Nasiriany, Tian Gao, Ajay Mandlekar, Yuke Zhu

Imitation learning offers a promising path for robots to learn general-purpose behaviors, but traditionally has exhibited limited scalability due to high data supervision requirements and brittle generalization.

Data Augmentation Imitation Learning +2

Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment

no code implementations • 15 Nov 2022 • Huihan Liu, Soroush Nasiriany, Lance Zhang, Zhiyao Bao, Yuke Zhu

To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work.

Decision Making

Few-View Object Reconstruction with Unknown Categories and Camera Poses

no code implementations • 8 Dec 2022 • Hanwen Jiang, Zhenyu Jiang, Kristen Grauman, Yuke Zhu

The reconstruction results under predicted poses are comparable to the ones using ground-truth poses.

Object Object Reconstruction +1

Ditto in the House: Building Articulation Models of Indoor Scenes through Interactive Perception

no code implementations • 2 Feb 2023 • Cheng-Chun Hsu, Zhenyu Jiang, Yuke Zhu

We demonstrate the effectiveness of our approach in both simulation and real-world scenes.

Robot Navigation

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation.

Few-Shot Learning Image Captioning +3

Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids

no code implementations • CVPR 2023 • Wei Dong, Chris Choy, Charles Loop, Or Litany, Yuke Zhu, Anima Anandkumar

To apply this representation to monocular scene reconstruction, we develop a scale calibration algorithm for fast geometric initialization from monocular depth priors.

Indoor Scene Reconstruction

Granger-Causal Hierarchical Skill Discovery

no code implementations • 15 Jun 2023 • Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum

Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability.

reinforcement-learning Reinforcement Learning (RL)

Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow

no code implementations • 26 Sep 2023 • Zhenyu Jiang, Hanwen Jiang, Yuke Zhu

Incorporating semantic priors with self-supervised flow training, Doduo produces accurate dense correspondence robust to the dynamic changes of the scenes.

Learning Generalizable Manipulation Policies with Object-Centric 3D Representations

no code implementations • 22 Oct 2023 • Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu

We introduce GROOT, an imitation learning method for learning robust policies with object-centric and 3D priors.

Imitation Learning Object

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

no code implementations • 26 Oct 2023 • Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, Dieter Fox

Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents.

Imitation Learning

Interactive Robot Learning from Verbal Correction

no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng

A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.

Language Modelling Large Language Model

Model-Based Runtime Monitoring with Interactive Imitation Learning

no code implementations • 26 Oct 2023 • Huihan Liu, Shivin Dass, Roberto Martín-Martín, Yuke Zhu

Unlike prior work that cannot foresee future failures or requires failure experiences for training, our method learns a latent-space dynamics model and a failure classifier, enabling our method to simulate future action outcomes and detect out-of-distribution and high-risk states preemptively.

Imitation Learning

LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery

no code implementations • 3 Nov 2023 • Weikang Wan, Yifeng Zhu, Rutav Shah, Yuke Zhu

We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan.

Imitation Learning Robot Manipulation +1

Edge Wasserstein Distance Loss for Oriented Object Detection

no code implementations • 12 Dec 2023 • Yuke Zhu, Yumeng Ruan, Zihua Xiong, Sheng Guo

Rather than exploiting the Gaussian distribution to obtain an analytical form of the distance measure, we propose a novel oriented regression loss, the Edge Wasserstein Distance (EWD) loss, to alleviate the square-like problem.

Object object-detection +3

Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning

no code implementations • 23 Jan 2024 • Zizhao Wang, Caroline Wang, Xuesu Xiao, Yuke Zhu, Peter Stone

Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications.

reinforcement-learning Reinforcement Learning (RL)

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

no code implementations • 12 Feb 2024 • Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

In each iteration, the image is annotated with a visual representation of proposals that the VLM can refer to (e.g., candidate robot actions, localizations, or trajectories).

Instruction Following Logical Reasoning +3

PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning

no code implementations • 1 Mar 2024 • Tian Gao, Soroush Nasiriany, Huihan Liu, Quantao Yang, Yuke Zhu

Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors.

Imitation Learning

OCEAN: Online Task Inference for Compositional Tasks with Context Adaptation

1 code implementation • 17 Aug 2020 • Hongyu Ren, Yuke Zhu, Jure Leskovec, Anima Anandkumar, Animesh Garg

We propose a variational inference framework OCEAN to perform online task inference for compositional tasks.

Variational Inference

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

1 code implementation • ICLR 2018 • Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, Nicolas Heess

We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent.

Imitation Learning reinforcement-learning +1

Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

1 code implementation • 4 Oct 2017 • Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese

In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction.

Few-Shot Learning Program induction +1

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations • 16 Sep 2016 • Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.

3D Reconstruction Feature Engineering +3

DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

1 code implementation • 28 Sep 2019 • Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum

A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.

Continuous Control

Causal Induction from Visual Observations for Goal Directed Tasks

2 code implementations • 3 Oct 2019 • Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world.

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition

1 code implementation • 18 May 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar

Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.

Multi-agent Reinforcement Learning reinforcement-learning +3

Causal Dynamics Learning for Task-Independent State Abstraction

1 code implementation • 27 Jun 2022 • Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, Peter Stone

Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states.

Model-based Reinforcement Learning

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

1 code implementation • 28 Jul 2019 • Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback.

Representation Learning Self-Supervised Learning

Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales

1 code implementation • CVPR 2021 • Yifan Sun, Yuke Zhu, Yuhan Zhang, Pengkun Zheng, Xi Qiu, Chi Zhang, Yichen Wei

We argue that such flexibility is also important for deep metric learning, because different visual concepts indeed correspond to different semantic scales.

Ranked #2 on Metric Learning on DyML-Animal

Metric Learning

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

1 code implementation • 17 Jun 2021 • Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.

Autonomous Driving Image Augmentation +3

Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

1 code implementation • NeurIPS 2020 • Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Animashree Anandkumar

Inspired by the original one hundred Bongard problems (BPs), we propose a new benchmark, Bongard-LOGO, for human-level concept learning and reasoning.

Novel Concepts Representation Learning +1

Learning to Walk by Steering: Perceptive Quadrupedal Locomotion in Dynamic Environments

1 code implementation • 19 Sep 2022 • Mingyo Seo, Ryan Gupta, Yifeng Zhu, Alexy Skoutnev, Luis Sentis, Yuke Zhu

We present a hierarchical learning framework, named PRELUDE, which decomposes the problem of perceptive locomotion into high-level decision-making to predict navigation commands and low-level gait generation to realize the target commands.

Imitation Learning Reinforcement Learning (RL)

Regression Planning Networks

1 code implementation • NeurIPS 2019 • Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Recent learning-to-plan methods have shown promising results on planning directly from observation space.

regression

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

1 code implementation • CVPR 2022 • Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts.

Ranked #1 on Few-Shot Image Classification on Bongard-HOI

Benchmarking Few-Shot Image Classification +5

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents

1 code implementation • 15 Oct 2023 • Jake Grigsby, Linxi Fan, Yuke Zhu

We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning.

In-Context Learning Meta-Learning +2

RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

1 code implementation • ICLR 2022 • Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.

Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO

Human-Object Interaction Detection Object +5

Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks

1 code implementation • 7 Oct 2021 • Soroush Nasiriany, Huihan Liu, Yuke Zhu

Realistic manipulation tasks require a robot to interact with an environment with a prolonged sequence of motor actions.

reinforcement-learning Reinforcement Learning (RL)

COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

1 code implementation • CVPR 2022 • Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu

To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios.

Autonomous Driving

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

2 code implementations • 24 Oct 2018 • Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback.

Reinforcement Learning (RL) Self-Supervised Learning

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

3 code implementations • ICCV 2021 • Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar

We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision.

Ranked #1 on Weakly-supervised instance segmentation on COCO 2017 val

Box-supervised Instance Segmentation Segmentation +2

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition

1 code implementation • ECCV 2020 • Linxi Fan, Shyamal Buch, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei

We analyze the suitability of our new primitive for video action recognition and explore several novel variations of our approach to enable stronger representational flexibility while maintaining an efficient design.

Action Recognition Temporal Action Localization +1

OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation

1 code implementation • 2 Oct 2021 • Josiah Wong, Viktor Makoviychuk, Anima Anandkumar, Yuke Zhu

Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.

Robot Manipulation

103

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

1 code implementation • 4 Apr 2021 • Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yuke Zhu

The experimental results in simulation and on the real robot demonstrate that the use of implicit neural representations and the joint learning of grasp affordance and 3D reconstruction lead to state-of-the-art grasping results.

3D Reconstruction Multi-Task Learning

113

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

1 code implementation • 23 Feb 2016 • Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering.

Image Classification Question Answering

211

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints

2 code implementations • 23 Oct 2019 • Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu

We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data.

Ranked #1 on 6D Pose Estimation using RGBD on REAL275 (Rerr metric)

6D Pose Estimation 6D Pose Estimation using RGBD +2

286

Scene Graph Generation by Iterative Message Passing

5 code implementations • CVPR 2017 • Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei

In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image.

Ranked #9 on Panoptic Scene Graph Generation on PSG Dataset

Graph Generation Panoptic Scene Graph Generation

376

Pre-Trained Language Models for Interactive Decision-Making

1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Imitation Learning Language Modelling

390

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

1 code implementation • 6 Aug 2021 • Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín

Based on the study, we derive a series of lessons including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation.

Imitation Learning reinforcement-learning +2

426

VIMA: General Robot Manipulation with Multimodal Prompts

2 code implementations • 6 Oct 2022 • Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan

We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens.

Imitation Learning Language Modelling +3

680

AI2-THOR: An Interactive 3D Environment for Visual AI

2 code implementations • 14 Dec 2017 • Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi

We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org.

Imitation Learning Navigate +7

1,032

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

8 code implementations • CVPR 2019 • Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese

A key technical challenge in performing 6D object pose estimation from RGB-D images is to fully leverage the two complementary data sources.

Ranked #4 on 6D Pose Estimation on LineMOD

6D Pose Estimation 6D Pose Estimation using RGBD +1

1,041

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

6 code implementations • 25 Sep 2020 • Yuke Zhu, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Abhishek Joshi, Soroush Nasiriany, Yifeng Zhu

robosuite is a simulation framework for robot learning powered by the MuJoCo physics engine.

Gesture Generation

1,087

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

1 code implementation • 17 Jun 2022 • Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar

Autonomous agents have made great strides in specialist domains like Atari games and Go.

Atari Games

1,671

Eureka: Human-Level Reward Design via Coding Large Language Models

1 code implementation • 19 Oct 2023 • Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar

The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating.

Decision Making In-Context Learning +1

2,602

Voyager: An Open-Ended Embodied Agent with Large Language Models

1 code implementation • 25 May 2023 • Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar

We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention.

5,170

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

2 code implementations • 15 Jul 2021 • Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency

In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.

Representation Learning

5,458
