1 code implementation • ECCV 2020 • Linxi Fan, Shyamal Buch, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei
We analyze the suitability of our new primitive for video action recognition and explore several novel variations of our approach to enable stronger representational flexibility while maintaining an efficient design.
no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar
Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation.
no code implementations • 2 Feb 2023 • Cheng-Chun Hsu, Zhenyu Jiang, Yuke Zhu
We demonstrate the effectiveness of our approach in both simulation and real-world scenes.
no code implementations • 8 Dec 2022 • Hanwen Jiang, Zhenyu Jiang, Kristen Grauman, Yuke Zhu
The reconstruction results under predicted poses are comparable to the ones using ground-truth poses.
no code implementations • 15 Nov 2022 • Huihan Liu, Soroush Nasiriany, Lance Zhang, Zhiyao Bao, Yuke Zhu
To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work.
no code implementations • 20 Oct 2022 • Soroush Nasiriany, Tian Gao, Ajay Mandlekar, Yuke Zhu
Imitation learning offers a promising path for robots to learn general-purpose behaviors, but traditionally has exhibited limited scalability due to high data supervision requirements and brittle generalization.
2 code implementations • 6 Oct 2022 • Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan
This work shows that we can express a wide spectrum of robot manipulation tasks with multimodal prompts, interleaving textual and visual tokens.
no code implementations • 19 Sep 2022 • Mingyo Seo, Ryan Gupta, Yifeng Zhu, Alexy Skoutnev, Luis Sentis, Yuke Zhu
We present a hierarchical learning framework, named PRELUDE, which decomposes the problem of perceptive locomotion into high-level decision-making to predict navigation commands and low-level gait generation to realize the target commands.
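The two-level decomposition above can be illustrated with a minimal control-loop sketch (function and command names are assumptions for illustration, not PRELUDE's actual API): a high-level policy issues navigation commands at a low rate while a low-level gait generator tracks the latest command at every step.

```python
# Minimal sketch of a hierarchical control loop: the high-level policy
# updates its command every `high_level_every` steps, and the low-level
# controller tracks the most recent command at every step.
def run_hierarchy(steps, high_level_every=5):
    command, log = None, []
    for t in range(steps):
        if t % high_level_every == 0:
            command = f"cmd@{t}"      # high-level decision (e.g., target velocity)
        log.append((t, command))      # low-level gait generation tracks the command
    return log

log = run_hierarchy(10)
```

Running the loop for 10 steps with a high-level period of 5 yields two command updates, each tracked for five consecutive low-level steps.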
no code implementations • 27 Jun 2022 • Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, Peter Stone
Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states.
1 code implementation • 17 Jun 2022 • Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar
Autonomous agents have made great strides in specialist domains like Atari games and Go.
1 code implementation • CVPR 2022 • Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Anima Anandkumar
A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning of novel concepts.
Ranked #1 on Few-Shot Image Classification on Bongard-HOI
1 code implementation • CVPR 2022 • Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu
To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios.
1 code implementation • ICLR 2022 • Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar
This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.
Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO
no code implementations • 14 Mar 2022 • Bokui Shen, Zhenyu Jiang, Christopher Choy, Leonidas J. Guibas, Silvio Savarese, Anima Anandkumar, Yuke Zhu
Manipulating volumetric deformable objects in the real world, like plush toys and pizza dough, brings substantial challenges due to infinite shape variations, non-rigid motions, and partial observability.
no code implementations • CVPR 2022 • Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu
We also apply Ditto to real-world objects and deploy the recreated digital twins in physical simulation.
1 code implementation • 3 Feb 2022 • Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu
Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.
no code implementations • 15 Nov 2021 • Youngwoon Lee, Joseph J. Lim, Anima Anandkumar, Yuke Zhu
However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences.
no code implementations • 27 Oct 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
We present an extended abstract for the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.
1 code implementation • 7 Oct 2021 • Soroush Nasiriany, Huihan Liu, Yuke Zhu
Realistic manipulation tasks require a robot to interact with an environment with a prolonged sequence of motor actions.
1 code implementation • 2 Oct 2021 • Josiah Wong, Viktor Makoviychuk, Anima Anandkumar, Yuke Zhu
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.
no code implementations • 28 Sep 2021 • Yifeng Zhu, Peter Stone, Yuke Zhu
From the task structures of multi-task demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning.
1 code implementation • 6 Aug 2021 • Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín
Based on this study, we derive a series of lessons, including the sensitivity to different algorithmic design choices, the dependence on demonstration quality, and the variability with the stopping criterion due to the differing objectives of training and evaluation.
2 code implementations • 15 Jul 2021 • Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
no code implementations • 26 Jun 2021 • Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks.
1 code implementation • 17 Jun 2021 • Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.
no code implementations • 31 May 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.
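The idea of decomposing the Q-tensor across agents can be illustrated with a tiny rank-1 (CP-style) factorization in plain Python; the factor layout and names below are illustrative assumptions, not Tesseract's actual implementation. For a nonnegative rank-1 Q-tensor, the greedy joint action conveniently decomposes into independent per-agent argmaxes.

```python
# Illustrative rank-1 factorization of a 2-agent joint Q-tensor:
# Q[a1][a2] = f1[a1] * f2[a2], so storage grows linearly, not
# exponentially, in the number of agents.
def full_q(q_factors):
    """Reconstruct the dense joint Q-tensor from per-agent factors."""
    f1, f2 = q_factors
    return [[u * v for v in f2] for u in f1]

# Per-agent utility factors over 3 actions each (toy values).
factors = ([1.0, 2.0, 0.5], [0.2, 1.0, 0.8])
q = full_q(factors)

# With nonnegative rank-1 factors, the joint argmax splits per agent.
best_a1 = max(range(3), key=lambda a: factors[0][a])
best_a2 = max(range(3), key=lambda a: factors[1][a])
```

Higher-rank CP approximations generalize this by summing several such factor products, trading storage for expressiveness in modeling agent interactions.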
1 code implementation • 18 May 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar
Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.
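The adaptive-communication component (point 3 above) can be sketched with a simple drift rule, offered only as a hedged illustration, not the paper's actual mechanism: the coach re-broadcasts its strategy vector only when it has changed enough since the last broadcast.

```python
# Illustrative adaptive-communication rule: communicate only when the
# coach's current strategy vector has drifted beyond a threshold
# (L2 distance) since the last broadcast.
def should_communicate(current, last_sent, threshold=0.5):
    """Return True when the strategy drift exceeds the threshold."""
    drift = sum((c - l) ** 2 for c, l in zip(current, last_sent)) ** 0.5
    return drift > threshold

send_big = should_communicate([1.0, 0.0], [0.0, 0.0])    # large drift
send_small = should_communicate([0.1, 0.0], [0.0, 0.0])  # small drift
```

A learned variant would replace the fixed threshold with a trained gating function, but the skip-when-unchanged structure is the same.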
4 code implementations • ICCV 2021 • Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision.
1 code implementation • 4 Apr 2021 • Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yuke Zhu
Experimental results in simulation and on a real robot demonstrate that implicit neural representations and joint learning of grasp affordance and 3D reconstruction lead to state-of-the-art grasping results.
1 code implementation • CVPR 2021 • Yifan Sun, Yuke Zhu, Yuhan Zhang, Pengkun Zheng, Xi Qiu, Chi Zhang, Yichen Wei
We argue that such flexibility is also important for deep metric learning, because different visual concepts indeed correspond to different semantic scales.
Ranked #2 on Metric Learning on DyML-Animal
no code implementations • 1 Jan 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Anima Anandkumar
The performance of our method is comparable or even better than the setting where all players have a full view of the environment, but no coach.
no code implementations • 22 Dec 2020 • Xinlei Pan, Animesh Garg, Animashree Anandkumar, Yuke Zhu
Through experimentation and comparative study, we demonstrate the effectiveness of our approach in discovering robust and cost-efficient hand morphologies for grasping novel objects.
no code implementations • 12 Dec 2020 • Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese
We develop a simple and effective algorithm to train the policy iteratively on new data collected by the system that encourages the policy to learn how to traverse bottlenecks through the interventions.
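The intervention-driven data flow can be sketched as a DAgger-style aggregation step; the function and field names below are illustrative assumptions, not the paper's code. Corrective human actions taken at bottleneck states are folded into the dataset, and the policy is retrained on the aggregate.

```python
# Hedged sketch of intervention-based data aggregation: keep only the
# (state, action) pairs where the human intervened, and append them to
# the training dataset for the next policy update.
def aggregate_interventions(dataset, rollout):
    """Return the dataset extended with human-intervention transitions."""
    return dataset + [(s, a) for (s, a, intervened) in rollout if intervened]

rollout = [("s0", "a0", False), ("s1", "human_a1", True), ("s2", "human_a2", True)]
dataset = aggregate_interventions([], rollout)
```

Iterating this collect-aggregate-retrain loop concentrates supervision exactly where the policy currently fails.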
no code implementations • 12 Dec 2020 • Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese
To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks.
no code implementations • 1 Dec 2020 • Michelle A. Lee, Matthew Tan, Yuke Zhu, Jeannette Bohg
Using sensor data from multiple modalities presents an opportunity to encode redundant and complementary features that can be useful when one modality is corrupted or noisy.
no code implementations • 16 Nov 2020 • Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield, Fabio Ramos, Animashree Anandkumar, Yuke Zhu
Deep learning-based object pose estimators are often unreliable and overconfident, especially when the input image is outside the training domain, for instance with sim2real transfer.
1 code implementation • NeurIPS 2020 • Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Animashree Anandkumar
Inspired by the original one hundred BPs, we propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
6 code implementations • 25 Sep 2020 • Yuke Zhu, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Abhishek Joshi, Soroush Nasiriany, Yifeng Zhu
robosuite is a simulation framework for robot learning powered by the MuJoCo physics engine.
no code implementations • 21 Sep 2020 • Xingye Da, Zhaoming Xie, David Hoeller, Byron Boots, Animashree Anandkumar, Yuke Zhu, Buck Babich, Animesh Garg
We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago).
1 code implementation • 17 Aug 2020 • Hongyu Ren, Yuke Zhu, Jure Leskovec, Anima Anandkumar, Animesh Garg
We propose a variational inference framework OCEAN to perform online task inference for compositional tasks.
no code implementations • ECCV 2020 • Yuke Zhu, Yan Bai, Yichen Wei
Consequently, the feature transform is performed by a rotation that respects the spherical data distributions.
no code implementations • ICLR 2021 • Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
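The balance described above can be illustrated with a toy scoring function; the weighting scheme and names are assumptions for illustration, not the paper's objective. Tasks of intermediate difficulty (success rate near 0.5) score highest on the difficulty term, which is then traded off against similarity to the target tasks.

```python
# Illustrative curriculum score: balance the agent's success rate in a
# generated task (peaking at intermediate difficulty) against the task's
# similarity to the target task distribution.
def task_score(success_rate, similarity, alpha=0.5):
    """Higher for tasks that are neither trivial nor impossible and
    that stay close to the target tasks."""
    difficulty_term = success_rate * (1.0 - success_rate)  # peaks at 0.5
    return alpha * difficulty_term + (1.0 - alpha) * similarity

# Candidates as (success_rate, similarity) pairs.
candidates = [(0.95, 0.9), (0.5, 0.6), (0.05, 0.2)]
best = max(candidates, key=lambda c: task_score(*c))
```

In this toy setting, the nearly-solved but highly similar task wins; tuning `alpha` shifts the curriculum toward harder, more exploratory tasks.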
no code implementations • 11 Nov 2019 • Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei
We evaluate the quality of our platform, the diversity of demonstrations in our dataset, and the utility of our dataset via quantitative and qualitative analysis.
no code implementations • 29 Oct 2019 • Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei
The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal.
no code implementations • 26 Oct 2019 • Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese
For this purpose, we present KETO, a framework of learning keypoint representations of tool-based manipulation.
2 code implementations • 23 Oct 2019 • Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu
We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data.
Ranked #1 on 6D Pose Estimation using RGBD on REAL275 (Rerr metric)
2 code implementations • 3 Oct 2019 • Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei
Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world.
1 code implementation • NeurIPS 2019 • Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
Recent learning-to-plan methods have shown promising results on planning directly from observation space.
1 code implementation • 28 Sep 2019 • Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum
A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.
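Inferring a multi-modal distribution over unobserved states is commonly handled with a particle belief; the snippet below is a generic particle-filter reweighting step offered as a hedged illustration (the observation likelihood here is an assumption), not the paper's model.

```python
# Generic particle-belief update: reweight each particle by an
# observation likelihood, then renormalize so the weights sum to one.
def reweight(particles, weights, likelihood):
    """Return normalized posterior weights after observing evidence."""
    new_w = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(new_w)
    return [w / total for w in new_w]

particles = [-1.0, 0.0, 1.0]
weights = [1 / 3, 1 / 3, 1 / 3]
# Assumed observation likelihood favoring states near +1.
weights = reweight(particles, weights, lambda p: 1.0 / (1.0 + (p - 1.0) ** 2))
```

Because multiple particles keep nonzero weight, the belief can stay multi-modal, which is exactly what a planner under partial observability must account for.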
no code implementations • 27 Sep 2019 • Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei
We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL).
no code implementations • ICCV 2019 • Bokui Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese
A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities.
no code implementations • 16 Aug 2019 • De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles
The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures.
1 code implementation • 28 Jul 2019 • Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback.
8 code implementations • CVPR 2019 • Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese
A key technical challenge in performing 6D object pose estimation from an RGB-D image is to fully leverage the two complementary data sources.
Ranked #4 on 6D Pose Estimation on LineMOD
no code implementations • 7 Nov 2018 • Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei
Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification.
2 code implementations • 24 Oct 2018 • Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback.
no code implementations • CVPR 2019 • De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model.
no code implementations • 25 Jun 2018 • Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese
We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering.
1 code implementation • ICLR 2018 • Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, Nicolas Heess
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent.
1 code implementation • 14 Dec 2017 • Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org.
1 code implementation • 4 Oct 2017 • Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction.
no code implementations • CVPR 2017 • Yuke Zhu, Joseph J. Lim, Li Fei-Fei
Humans possess an extraordinary ability to learn new skills and new knowledge for problem solving.
no code implementations • ICCV 2017 • Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world.
3 code implementations • CVPR 2017 • Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei
In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image.
Ranked #5 on Panoptic Scene Graph Generation on PSG Dataset
2 code implementations • 16 Sep 2016 • Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi
To address the second issue, we propose AI2-THOR framework, which provides an environment with high-quality 3D scenes and physics engine.
no code implementations • 23 Feb 2016 • Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering.
no code implementations • CVPR 2016 • Yuke Zhu, Oliver Groth, Michael Bernstein, Li Fei-Fei
It enables a new type of QA with visual answers, in addition to textual answers used in previous work.
no code implementations • ICCV 2015 • Tian Lan, Yuke Zhu, Amir Roshan Zamir, Silvio Savarese
Realistic videos of human actions exhibit rich spatiotemporal structures at multiple levels of granularity: an action can always be decomposed into multiple finer-grained elements in both space and time.
no code implementations • 20 Jul 2015 • Yuke Zhu, Ce Zhang, Christopher Ré, Li Fei-Fei
The complexity of the visual world creates significant challenges for comprehensive visual understanding.