no code implementations • ICLR 2019 • Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca Dragan
Our goal is to infer reward functions from demonstrations.
no code implementations • ICML 2020 • Adam Stooke, Joshua Achiam, Pieter Abbeel
This intuition leads to our introduction of PID control for the Lagrange multiplier in constrained RL, which we cast as a dynamical system.
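As a concrete illustration, the multiplier update can be written as a small PID loop on the constraint violation; the sketch below is a minimal reading of that idea, with gains and cost limit as illustrative placeholders rather than values from the paper.

```python
# Minimal sketch of a PID-controlled Lagrange multiplier for constrained RL.
# Gains and cost_limit are illustrative placeholders, not the paper's values.
class PIDLagrangian:
    def __init__(self, kp=0.05, ki=0.005, kd=0.1, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_cost = 0.0

    def update(self, episode_cost: float) -> float:
        """Return a non-negative multiplier from the measured constraint cost."""
        error = episode_cost - self.cost_limit                 # constraint violation
        self.integral = max(0.0, self.integral + error)        # classic Lagrangian term
        derivative = max(0.0, episode_cost - self.prev_cost)   # damp rising violations
        self.prev_cost = episode_cost
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)
```

Setting kp = kd = 0 recovers integral-only control, i.e., the traditional gradient-style multiplier update, which is the dynamical-systems view the abstract alludes to.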
no code implementations • ICML 2020 • Donald Hejna, Lerrel Pinto, Pieter Abbeel
Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.
1 code implementation • ICML 2020 • Michael Laskin, Pieter Abbeel, Aravind Srinivas
CURL extracts high level features from raw pixels using a contrastive learning objective and performs off-policy control on top of the extracted features.
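A minimal sketch of such a contrastive objective is below, assuming a bilinear similarity between two augmented views of each observation (per CURL's description); shapes, names, and the handling of the positives are simplifications.

```python
import torch
import torch.nn.functional as F

def infonce_loss(z_anchor, z_positive, W):
    """Bilinear InfoNCE over a batch: matching (anchor, positive) pairs sit on
    the diagonal of the similarity matrix and act as the correct "class".

    z_anchor:   (B, D) query encodings of one augmented view
    z_positive: (B, D) key encodings of another view (momentum encoder in CURL)
    W:          (D, D) learned bilinear similarity matrix
    """
    logits = z_anchor @ W @ z_positive.t()                     # (B, B) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values   # numerical stability
    labels = torch.arange(z_anchor.size(0), device=z_anchor.device)
    return F.cross_entropy(logits, labels)                     # positives on diagonal
```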
no code implementations • ICML 2020 • Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
To solve complex tasks, intelligent agents first need to explore their environments.
no code implementations • 11 Jun 2025 • Yuxuan Kuang, Haoran Geng, Amine Elhafsi, Tan-Dzung Do, Pieter Abbeel, Jitendra Malik, Marco Pavone, Yue Wang
To that end, we introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation.
no code implementations • 4 Jun 2025 • Zhao-Heng Yin, Sherry Yang, Pieter Abbeel
However, how to extract action knowledge (or action representations) from videos for policy learning remains a key challenge.
no code implementations • 3 Jun 2025 • Jialiang Zhang, Haoran Geng, Yang You, Congyue Deng, Pieter Abbeel, Jitendra Malik, Leonidas Guibas
Understanding and predicting articulated actions is important in robot learning.
no code implementations • 2 Jun 2025 • Ademi Adeniji, Zhuoran Chen, Vincent Liu, Venkatesh Pattabiraman, Raunaq Bhirangi, Siddhant Haldar, Pieter Abbeel, Lerrel Pinto
Using a tactile glove to measure contact forces and a vision-based model to estimate hand pose, we train a closed-loop policy that continuously predicts the forces needed for manipulation.
no code implementations • 29 May 2025 • Michal Nauman, Marek Cygan, Carmelo Sferrazza, Aviral Kumar, Pieter Abbeel
Recent advances in language modeling and vision stem from training large models on diverse, multi-task data.
1 code implementation • 29 May 2025 • Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
The resulting framework, CFGRL, is trained with the simplicity of supervised learning, yet can further improve on the policies in the data.
no code implementations • 28 May 2025 • Younggyo Seo, Carmelo Sferrazza, Haoran Geng, Michal Nauman, Zhao-Heng Yin, Pieter Abbeel
Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks.
no code implementations • 26 May 2025 • Vincent Liu, Ademi Adeniji, Haotian Zhan, Siddhant Haldar, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto
Despite recent progress in general purpose robotics, robot policies still lag far behind basic human capabilities in the real world.
1 code implementation • 16 May 2025 • Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, Hao Dong
To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO).
no code implementations • 6 May 2025 • Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Angjoo Kanazawa
How can we teach humanoids to climb staircases and sit on chairs using the surrounding environment context?
1 code implementation • 31 Mar 2025 • Aly Lidayan, Yuqing Du, Eliza Kosoy, Maria Rufova, Pieter Abbeel, Alison Gopnik
What drives exploration?
no code implementations • 10 Mar 2025 • Zhao-Heng Yin, Changhao Wang, Luis Pineda, Krishna Bodduluri, Tingfan Wu, Pieter Abbeel, Mustafa Mukadam
We introduce Geometric Retargeting (GeoRT), an ultrafast and principled neural hand retargeting algorithm for teleoperation, developed as part of our recent Dexterity Gen (DexGen) system.
no code implementations • 5 Mar 2025 • Huang Huang, Fangchen Liu, Letian Fu, Tingfan Wu, Mustafa Mukadam, Jitendra Malik, Ken Goldberg, Pieter Abbeel
Vision-Language-Action (VLA) models aim to predict robotic actions based on visual observations and language instructions.
no code implementations • 14 Feb 2025 • Weirui Ye, Fangchen Liu, Zheng Ding, Yang Gao, Oleh Rybkin, Pieter Abbeel
Finally, we show that the generated simulation data can be scaled up for training a general policy, and it can be transferred back to the real robot in a Real2Sim2Real way.
no code implementations • 6 Feb 2025 • Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar
Third, this scaling behavior is enabled by first estimating predictable relationships between hyperparameters, which is used to manage effects of overfitting and plasticity loss unique to RL.
no code implementations • 6 Feb 2025 • Zhao-Heng Yin, Changhao Wang, Luis Pineda, Francois Hogan, Krishna Bodduluri, Akash Sharma, Patrick Lancaster, Ishita Prasad, Mrinal Kalakrishnan, Jitendra Malik, Mike Lambeta, Tingfan Wu, Pieter Abbeel, Mustafa Mukadam
In the real world, we use human teleoperation as a prompt to the controller to produce highly dexterous behavior.
no code implementations • 8 Jan 2025 • Joshua Jones, Oier Mees, Carmelo Sferrazza, Kyle Stachowicz, Pieter Abbeel, Sergey Levine
However, state-of-the-art generalist robot policies are typically trained on large datasets to predict robot actions solely from visual and proprioceptive observations.
no code implementations • 16 Dec 2024 • Bhavya Sukhija, Stelian Coros, Andreas Krause, Pieter Abbeel, Carmelo Sferrazza
In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
no code implementations • CVPR 2025 • Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo
Our experiments show that CoordTok can drastically reduce the number of tokens for encoding long video clips.
no code implementations • 19 Nov 2024 • Younggyo Seo, Pieter Abbeel
Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm whose critic outputs Q-values over a sequence of actions, i.e., the value function is explicitly trained to learn the consequence of executing action sequences.
no code implementations • 23 Oct 2024 • Renhao Wang, Kevin Frans, Pieter Abbeel, Sergey Levine, Alexei A. Efros
In this work, we instead propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience.
1 code implementation • 17 Oct 2024 • Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine
Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems.
2 code implementations • 16 Oct 2024 • Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps.
no code implementations • 10 Oct 2024 • Wilson Yan, Volodymyr Mnih, Aleksandra Faust, Matei Zaharia, Pieter Abbeel, Hao liu
Efficient video tokenization remains a key bottleneck in learning general purpose vision models that are capable of processing long video sequences.
no code implementations • 12 Sep 2024 • Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik
We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.
no code implementations • 12 Aug 2024 • Carmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel
In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision.
no code implementations • 9 Aug 2024 • Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel
We utilize this embedding space and the clustering it supports to self-generate pairings between trajectories in the large unpaired dataset.
no code implementations • 22 Jul 2024 • Zhao-Heng Yin, Pieter Abbeel
As a result, a robot has to learn skills from suboptimal demonstrations and unstructured interactions, which remains a key challenge.
1 code implementation • 17 Jul 2024 • Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek
Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip.
no code implementations • 11 Jun 2024 • Huiwon Jang, Dongyoung Kim, Junsu Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo
To tackle this challenge, in this paper, we revisit the idea of stochastic video generation that learns to capture uncertainty in frame prediction and explore its effectiveness for representation learning.
no code implementations • 8 May 2024 • Yide Shentu, Philipp Wu, Aravind Rajeswaran, Pieter Abbeel
This enables LLMs to flexibly communicate goals in the task plan without being entirely constrained by language limitations.
Ranked #9 on Robot Manipulation on CALVIN
1 code implementation • 15 Mar 2024 • Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel
Humanoid robots hold great promise in assisting humans in diverse environments and tasks, due to the flexibility and adaptability afforded by their human-like morphology.
1 code implementation • 7 Mar 2024 • Nikhil Mishra, Maximilian Sieb, Pieter Abbeel, Xi Chen
Deep learning methods for perception are the cornerstone of many robotic systems.
no code implementations • 5 Mar 2024 • Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.
no code implementations • 4 Mar 2024 • Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik
Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, due to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system.
no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.
1 code implementation • 27 Feb 2024 • Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner?
2 code implementations • 15 Feb 2024 • Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
To create a benchmark, researchers must choose a dataset of forbidden prompts to which a victim model will respond, along with an evaluation method that scores the harmfulness of the victim model's responses.
1 code implementation • 13 Feb 2024 • Hao liu, Wilson Yan, Matei Zaharia, Pieter Abbeel
To address these challenges, we curate a large dataset of diverse videos and books, utilize the Blockwise RingAttention technique to scalably train on long sequences, and gradually increase context size from 4K to 1M tokens.
no code implementations • 30 Jan 2024 • Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing.
no code implementations • 8 Jan 2024 • Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine
This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set.
1 code implementation • 28 Dec 2023 • Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel
Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning.
1 code implementation • 18 Dec 2023 • Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma
However, previous works fail to exploit the score-based structure of diffusion models, and instead utilize a simple behavior cloning term to train the actor, limiting their ability in the actor-critic setting.
no code implementations • 30 Nov 2023 • Wilson Yan, Andrew Brown, Pieter Abbeel, Rohit Girdhar, Samaneh Azadi
We introduce MoCA, a Motion-Conditioned Image Animation approach for video editing.
1 code implementation • NeurIPS 2023 • Daiki E. Matsunaga, Jongmin Lee, Jaeseok Yoon, Stefanos Leonardos, Pieter Abbeel, Kee-Eung Kim
To this end, we introduce AlberDICE, an offline MARL algorithm that alternately performs centralized training of individual agents based on stationary distribution optimization.
no code implementations • 2 Nov 2023 • Vint Lee, Pieter Abbeel, Youngwoon Lee
Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards.
no code implementations • 2 Nov 2023 • Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset.
no code implementations • 2 Nov 2023 • Carmelo Sferrazza, Younggyo Seo, Hao liu, Youngwoon Lee, Pieter Abbeel
For tasks requiring object manipulation, we seamlessly and effectively exploit the complementarity of our senses of vision and touch.
no code implementations • 26 Oct 2023 • Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila Mcilraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann
Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals.
no code implementations • 18 Oct 2023 • Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk
Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.
no code implementations • 16 Oct 2023 • Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson
We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.
no code implementations • 16 Oct 2023 • Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik
To tackle this, we propose a simple framework that achieves interactive task planning with language models by incorporating both high-level planning and low-level skill execution through function calling, leveraging pretrained vision models to ground the scene in language.
no code implementations • 13 Oct 2023 • Hao liu, Matei Zaharia, Pieter Abbeel
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
no code implementations • 9 Oct 2023 • Sherry Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel
Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.
no code implementations • 4 Oct 2023 • Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
8 code implementations • 3 Oct 2023 • Hao liu, Matei Zaharia, Pieter Abbeel
Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications.
no code implementations • 25 Sep 2023 • Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu
This work aims to improve unsupervised audio-visual pre-training.
1 code implementation • 31 Aug 2023 • Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James
Contact is at the core of robotic manipulation.
1 code implementation • 23 Aug 2023 • Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded steady progress in task complexity over the years.
1 code implementation • 31 Jul 2023 • Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb
Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications.
no code implementations • 31 Jul 2023 • Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language -- language like "this button turns on the TV" or "I put the bowls away" -- that conveys general knowledge, describes the state of the world, provides interactive feedback, and more.
no code implementations • 7 Jul 2023 • Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel
In this work, we present a focused study of the generalization capabilities of the pre-trained visual representations at the categorical level.
1 code implementation • 21 Jun 2023 • Joey Hejna, Pieter Abbeel, Lerrel Pinto
Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.
no code implementations • 16 Jun 2023 • Xinran Liang, Anthony Han, Wilson Yan, aditi raghunathan, Pieter Abbeel
In addition, we show that by training on actively collected data more relevant to the environment and task, our method generalizes more robustly to downstream tasks compared to models pre-trained on fixed datasets such as ImageNet.
no code implementations • 2 Jun 2023 • Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel
Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.
1 code implementation • 1 Jun 2023 • Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta
Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data.
no code implementations • NeurIPS 2023 • Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo
A promising technique for exploration is to maximize the entropy of the visited state distribution, i.e., state entropy, by encouraging uniform coverage of the visited state space.
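A common particle-based estimator for state entropy in this line of work rewards a state by its distance to its k-th nearest neighbor in a batch of visited states; the sketch below shows that generic estimator, not necessarily this paper's exact variant.

```python
import torch

def knn_entropy_reward(states, k=12):
    """Intrinsic reward proportional to log distance to the k-th nearest
    neighbor: large distances mark sparsely visited regions of state space.

    states: (N, D) batch of visited state embeddings; returns (N,) rewards.
    """
    dists = torch.cdist(states, states)                        # (N, N) pairwise
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]  # skip self (distance 0)
    return torch.log(knn_dist + 1.0)                           # +1 keeps reward finite
```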
3 code implementations • 30 May 2023 • Hao liu, Pieter Abbeel
Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications.
no code implementations • 26 May 2023 • Hao liu, Pieter Abbeel
Our method consists of relabeling the target return of each trajectory to the maximum total reward in a sequence of trajectories, and training an autoregressive model to predict actions conditioned on past states, actions, rewards, target returns, and task-completion tokens. The resulting model, Agentic Transformer (AT), can learn to improve upon itself both at training and test time.
2 code implementations • 25 May 2023 • Ying Fan, Olivia Watkins, Yuqing Du, Hao liu, MoonKyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee
We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward.
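In outline, such an update is a REINFORCE-style step on the sampler's log-likelihood weighted by the learned reward. The sketch below assumes hypothetical helpers (`sample_with_log_probs`, `reward_model`) standing in for a denoising sampler that tracks per-step log-probabilities; it is a reading of the general recipe, not the paper's exact algorithm.

```python
import torch

def policy_gradient_step(denoiser, reward_model, prompts, optimizer):
    # `sample_with_log_probs` is a hypothetical helper: it draws images and
    # returns the per-step log-probabilities of the denoising trajectory.
    images, log_probs = denoiser.sample_with_log_probs(prompts)   # (B, ...), (B, T)
    rewards = reward_model(images, prompts).detach()              # (B,) learned reward
    # REINFORCE: increase likelihood of trajectories that scored high reward.
    loss = -(rewards * log_probs.sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```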
1 code implementation • 25 May 2023 • Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao liu, Pieter Abbeel, Sergey Levine, Dawn Song
This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
no code implementations • 10 May 2023 • Yuxuan Liu, Xi Chen, Pieter Abbeel
Leveraging this insight, we learn a grasp segmentation model to segment the grasped object from before and after grasp images.
1 code implementation • 4 May 2023 • Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran
We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making.
no code implementations • 3 May 2023 • Yuxuan Liu, Nikhil Mishra, Pieter Abbeel, Xi Chen
Existing state-of-the-art methods are often unable to capture meaningful uncertainty in challenging or ambiguous scenes, and as such can cause critical errors in high-performance applications.
1 code implementation • 9 Apr 2023 • Kevin Zakka, Philipp Wu, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel
Replicating human-like dexterity in robot hands represents one of the largest open problems in robotics.
no code implementations • NeurIPS 2023 • Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average).
no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans
In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
1 code implementation • 2 Mar 2023 • Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers.
no code implementations • 23 Feb 2023 • Kimin Lee, Hao liu, MoonKyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
no code implementations • 19 Feb 2023 • Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world.
1 code implementation • 13 Feb 2023 • Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function.
3 code implementations • 10 Feb 2023 • Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel
One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision.
1 code implementation • 10 Feb 2023 • Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez
In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.
3 code implementations • 6 Feb 2023 • Hao liu, Carmelo Sferrazza, Pieter Abbeel
Applying our method to large language models, we observed that Chain of Hindsight significantly surpasses previous methods in aligning language models with human preferences.
1 code implementation • 5 Feb 2023 • Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel
In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation.
1 code implementation • NeurIPS 2023 • Hao liu, Wilson Yan, Pieter Abbeel
Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks.
1 code implementation • 23 Nov 2022 • Fangchen Liu, Hao liu, Aditya Grover, Pieter Abbeel
We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models.
no code implementations • 23 Nov 2022 • David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum
Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications.
no code implementations • CVPR 2023 • Ajay Jain, Amber Xie, Pieter Abbeel
We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics.
no code implementations • 3 Nov 2022 • Kai Chen, Stephen James, Congying Sui, Yun-hui Liu, Pieter Abbeel, Qi Dou
To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions.
1 code implementation • 25 Oct 2022 • John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, Stephen James
In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data.
1 code implementation • 24 Oct 2022 • Hao liu, Lisa Lee, Kimin Lee, Pieter Abbeel
Our method consists of a multimodal transformer that encodes visual observations and language instructions, and a transformer-based policy that predicts actions based on encoded representations.
no code implementations • 24 Oct 2022 • Hao liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel
Large language models (LLMs) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks.
1 code implementation • 24 Oct 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.
no code implementations • 23 Oct 2022 • Weirui Ye, Pieter Abbeel, Yang Gao
This paper proposes the Virtual MCTS (V-MCTS), a variant of MCTS that spends more search time on harder states and less search time on simpler states adaptively.
1 code implementation • 19 Oct 2022 • Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica
Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.
1 code implementation • 14 Oct 2022 • Ademi Adeniji, Amber Xie, Pieter Abbeel
While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning.
no code implementations • 13 Oct 2022 • Yuxuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.
2 code implementations • 5 Oct 2022 • Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel
To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world.
1 code implementation • 16 Sep 2022 • Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
no code implementations • 15 Sep 2022 • Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel
Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety.
no code implementations • 15 Sep 2022 • Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel
Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics.
1 code implementation • 3 Aug 2022 • Qiyang Li, Ajay Jain, Pieter Abbeel
Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio.
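The density estimate underlying such models is the standard chain-rule factorization, with each conditional parameterized by the network:

$$ p_\theta(x) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}), $$

so both likelihood evaluation and sampling reduce to one pass per step over the conditionals.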
1 code implementation • 29 Jun 2022 • Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg
With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time.
no code implementations • 28 Jun 2022 • Younggyo Seo, Danijar Hafner, Hao liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel
Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects.
1 code implementation • 28 Jun 2022 • Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment.
1 code implementation • 8 Jun 2022 • Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel
In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos.
1 code implementation • 8 Jun 2022 • Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
no code implementations • 7 Jun 2022 • Zhao Mandi, Pieter Abbeel, Stephen James
From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.
3 code implementations • 27 May 2022 • Xinyang Geng, Hao liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
2 code implementations • ICLR 2022 • Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
Our intuition is that disagreement in learned reward model reflects uncertainty in tailored human feedback and could be useful for exploration.
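One way to operationalize this is to train an ensemble of reward models on the same feedback and use their disagreement as an exploration bonus; a minimal sketch, assuming each ensemble member maps batched (state, action) pairs to scalar rewards:

```python
import torch

def disagreement_bonus(reward_ensemble, states, actions):
    """Exploration bonus from reward-model disagreement: the per-sample standard
    deviation across ensemble members is high exactly where human feedback has
    pinned the reward down least."""
    preds = torch.stack([r(states, actions) for r in reward_ensemble])  # (E, B)
    return preds.std(dim=0)                                             # (B,)
```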
1 code implementation • 22 May 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.
2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.
1 code implementation • 26 Apr 2022 • Stephen James, Pieter Abbeel
Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner, where the resolution gradually increases at each layer in the hierarchy.
no code implementations • 14 Apr 2022 • Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-hui Liu, Pieter Abbeel, Qi Dou
To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model.
no code implementations • 7 Apr 2022 • Carl Qi, Pieter Abbeel, Aditya Grover
The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.
1 code implementation • 4 Apr 2022 • Stephen James, Pieter Abbeel
We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose, and learns to rank a set of goal-reaching paths generated from an array of path generating methods, including path planning, Bézier curve sampling, and a learned policy.
1 code implementation • 29 Mar 2022 • Kourosh Hakhamaneshi, Marcel Nassar, Mariano Phielipp, Pieter Abbeel, Vladimir Stojanović
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties with up to 10x more sample efficiency compared to a randomly initialized model.
no code implementations • 28 Mar 2022 • Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil Iscen, Ken Goldberg, Pieter Abbeel
We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions.
2 code implementations • 25 Mar 2022 • Younggyo Seo, Kimin Lee, Stephen James, Pieter Abbeel
Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments.
1 code implementation • NeurIPS 2021 • Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta
Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.
no code implementations • ICLR 2022 • Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
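A minimal sketch of that confidence filter, assuming a predictor that outputs the probability that the first segment of a pair is preferred (the threshold is an illustrative placeholder):

```python
import torch

def pseudo_label_preferences(predictor, seg_a, seg_b, threshold=0.95):
    """Keep only unlabeled segment pairs on which the current preference
    predictor is confident, and use its verdict as a pseudo-label."""
    with torch.no_grad():
        p = predictor(seg_a, seg_b)                    # (B,) P(a preferred over b)
    confident = (p > threshold) | (p < 1.0 - threshold)
    pseudo_labels = (p > 0.5).float()                  # 1.0 means "a preferred"
    return confident, pseudo_labels
```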
no code implementations • 22 Feb 2022 • Yuqing Du, Pieter Abbeel, Aditya Grover
Training such agents efficiently requires automatic generation of a goal curriculum.
1 code implementation • 8 Feb 2022 • Stephen James, Pieter Abbeel
We propose a new policy parameterization for representing 3D rotations during reinforcement learning.
1 code implementation • 1 Feb 2022 • Michael Laskin, Hao liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors.
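The objective rests on the standard mutual-information identity,

$$ I(\tau; z) = \mathcal{H}(\tau) - \mathcal{H}(\tau \mid z), $$

which lets an algorithm pair an entropy term over state transitions $\tau$ with an estimate of the conditional term given the skill $z$; CIC's specific estimators are described in the paper.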
1 code implementation • 31 Jan 2022 • Denis Yarats, David Brandfonbrener, Hao liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto
In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
1 code implementation • 29 Jan 2022 • Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko
In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.
1 code implementation • 18 Jan 2022 • Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
However, the plans produced naively by LLMs often cannot map precisely to admissible actions.
no code implementations • 6 Dec 2021 • Yaosheng Xu, Dailin Hu, Litian Liang, Stephen Mcaleer, Pieter Abbeel, Roy Fox
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.
4 code implementations • CVPR 2022 • Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.
no code implementations • NeurIPS 2021 • Charles Packer, Pieter Abbeel, Joseph E. Gonzalez
Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks; however, current meta-RL approaches struggle to learn in sparse-reward environments.
no code implementations • 28 Nov 2021 • Dailin Hu, Pieter Abbeel, Roy Fox
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.
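The trade-off is visible directly in the soft value function these methods share; a minimal sketch, with the temperature `alpha` playing the role of the entropy weight:

```python
import torch

def soft_value(q_values, alpha):
    """MaxEnt RL soft value over discrete actions:
    V(s) = alpha * log sum_a exp(Q(s, a) / alpha).
    As alpha -> 0 this approaches max_a Q(s, a); larger alpha favors entropy."""
    return alpha * torch.logsumexp(q_values / alpha, dim=-1)
```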
no code implementations • 4 Nov 2021 • Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak
We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size.
1 code implementation • 4 Nov 2021 • Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel
However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark.
3 code implementations • NeurIPS 2021 • Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal.
Ranked #2 on Atari Games 100k on Atari 100k
1 code implementation • 28 Oct 2021 • Michael Laskin, Denis Yarats, Hao liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks.
no code implementations • 28 Oct 2021 • Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.
no code implementations • 26 Oct 2021 • Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel
We then study the multi-task setting, where multi-task training is followed by (i) one-shot imitation on variations within the training tasks, (ii) one-shot imitation on new tasks, and (iii) fine-tuning on new tasks.
1 code implementation • ICLR 2022 • Yuqing Du, Pieter Abbeel, Aditya Grover
We are interested in training general-purpose reinforcement learning agents that can solve a wide variety of goals.
no code implementations • 29 Sep 2021 • Donald Joseph Hejna III, Pieter Abbeel, Lerrel Pinto
Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.
no code implementations • 29 Sep 2021 • Aaron L Putterman, Kevin Lu, Igor Mordatch, Pieter Abbeel
We study reinforcement learning (RL) agents which can utilize language inputs.
no code implementations • 29 Sep 2021 • Catherine Cang, Kourosh Hakhamaneshi, Ryan Rudes, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
In this paper, we investigate how we can leverage large reward-free (i.e., task-agnostic) offline datasets of prior interactions to pre-train agents that can then be fine-tuned using a small reward-annotated dataset.
no code implementations • 29 Sep 2021 • Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel
Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics.
no code implementations • 31 Aug 2021 • Hao liu, Pieter Abbeel
We introduce a new unsupervised pretraining objective for reinforcement learning.
no code implementations • 11 Aug 2021 • Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin
A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.
no code implementations • 19 Jul 2021 • Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto
In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks.
1 code implementation • ICML Workshop URL 2021 • Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin
To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations.
no code implementations • 5 Jul 2021 • Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.
1 code implementation • 1 Jul 2021 • SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin
Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets.
no code implementations • 18 Jun 2021 • Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia
To showcase the benefits, we interfaced SCENIC to an existing RTS environment, the Google Research Football (GRF) simulator, and introduced a benchmark consisting of 32 realistic scenarios, encoded in SCENIC, to train RL agents and test their generalization capabilities.
no code implementations • 16 Jun 2021 • Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
When combined together, they substantially improve the performance and generalization of offline RL policies.
1 code implementation • 14 Jun 2021 • Boyuan Chen, Pieter Abbeel, Deepak Pathak
Prior works show that structured latent spaces, such as visual keypoints, often outperform unstructured representations for robotic control.
no code implementations • ICML Workshop URL 2021 • Michael Laskin, Catherine Cang, Ryan Rudes, Pieter Abbeel
To alleviate the reliance on reward engineering it is important to develop RL algorithms capable of efficiently acquiring skills with no rewards extrinsic to the agent.
2 code implementations • 9 Jun 2021 • Kimin Lee, Laura Smith, Pieter Abbeel
We also show that our method is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.
1 code implementation • 2 Jun 2021 • Kourosh Hakhamaneshi, Pieter Abbeel, Vladimir Stojanovic, Aditya Grover
Such a decomposition can dynamically control the reliability of information derived from the online and offline data and the use of pretrained neural networks permits scalability to large offline datasets.
20 code implementations • NeurIPS 2021 • Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
Ranked #3 on Offline RL on D4RL
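To make the sequence-modeling framing concrete, the sketch below interleaves per-timestep (return-to-go, state, action) embeddings into a single sequence for a causal transformer; shapes and the shared embedding dimension are illustrative assumptions.

```python
import torch

def interleave_tokens(returns_to_go, states, actions):
    """Build the (R, s, a, R, s, a, ...) token sequence used by
    return-conditioned sequence models. All inputs are (B, T, D), already
    embedded to a shared dimension D; output is (B, 3*T, D)."""
    B, T, D = states.shape
    tokens = torch.stack([returns_to_go, states, actions], dim=2)  # (B, T, 3, D)
    return tokens.reshape(B, 3 * T, D)
```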
3 code implementations • 20 Apr 2021 • Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
1 code implementation • 15 Apr 2021 • Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak
Policies trained in simulation often fail when transferred to the real world due to the 'reality gap' where the simulator is unable to accurately capture the dynamics and visual properties of the real world.
1 code implementation • ICLR 2021 • David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.
no code implementations • 7 Apr 2021 • Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin
In this work, we take advantage of these structures to build effective dynamical models that are amenable to sample-based learning.
4 code implementations • 5 Apr 2021 • Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa
Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips.
2 code implementations • ICCV 2021 • Ajay Jain, Matthew Tancik, Pieter Abbeel
We present DietNeRF, a 3D neural scene representation estimated from a few images.
no code implementations • 26 Mar 2021 • Zhongyu Li, Xuxin Cheng, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
Developing robust walking controllers for bipedal robots is a challenging endeavor.
2 code implementations • ICLR 2021 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
Reinforcement learning has been shown to be highly successful at many challenging tasks.
4 code implementations • 9 Mar 2021 • Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks.
1 code implementation • NeurIPS 2021 • Hao liu, Pieter Abbeel
We introduce a new unsupervised pre-training method for reinforcement learning called APT, which stands for Active Pre-Training.
1 code implementation • NeurIPS 2021 • Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel
Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations.
Ranked #33 on Atari Games on Atari 2600 Amidar
1 code implementation • ICLR 2021 • Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto
Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form.
2 code implementations • ICLR Workshop SSL-RL 2021 • Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
1 code implementation • 13 Feb 2021 • Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives
Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins.
13 code implementations • CVPR 2021 • Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.
Ranked #54 on Instance Segmentation on COCO minival
2 code implementations • NeurIPS 2021 • Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin
Temporal information is essential to learning effective policies with Reinforcement Learning (RL).
no code implementations • 1 Jan 2021 • Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel
In this paper, we present Latent Vector Experience Replay (LeVER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements without sacrificing the performance of RL agents.
no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.
no code implementations • 1 Jan 2021 • Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
Furthermore, since our weighted Bellman backups rely on maintaining an ensemble, we investigate how weighted Bellman backups interact with other benefits previously derived from ensembles: (a) Bootstrap; (b) UCB Exploration.
no code implementations • 1 Jan 2021 • Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel
Discrete representations have been key in enabling robots to plan at more abstract levels and solve temporally-extended tasks more efficiently for decades.
no code implementations • 1 Jan 2021 • SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin
As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift – the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning.
no code implementations • 1 Jan 2021 • Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas
We present VideoGen: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
no code implementations • 1 Jan 2021 • Hao liu, Pieter Abbeel
On the DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency, and dramatically improves performance on tasks that are extremely difficult to train from scratch.
no code implementations • 1 Jan 2021 • Carl Qi, Pieter Abbeel, Aditya Grover
The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.
no code implementations • 1 Jan 2021 • Mandi Zhao, Qiyang Li, Aravind Srinivas, Ignasi Clavera, Kimin Lee, Pieter Abbeel
Attention mechanisms are generic inductive biases that have played a critical role in improving the state-of-the-art in supervised learning, unsupervised pre-training and generative modeling for multiple domains including vision, language and speech.
no code implementations • 14 Dec 2020 • Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin
We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards.
1 code implementation • ICLR 2021 • Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills.
1 code implementation • 7 Dec 2020 • Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel
Deep learning models trained on large data sets have been widely successful in both vision and language domains.
1 code implementation • NeurIPS 2020 • Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, Pieter Abbeel
Model-based reinforcement learning (RL) has shown great potential in various control tasks in terms of both sample-efficiency and final performance.
1 code implementation • 9 Oct 2020 • Gregory Kahn, Pieter Abbeel, Sergey Levine
However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate.
3 code implementations • 14 Sep 2020 • Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin
In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning.
no code implementations • 11 Aug 2020 • Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
1 code implementation • 4 Aug 2020 • Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed.
no code implementations • 3 Aug 2020 • Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin
Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments.
1 code implementation • 17 Jul 2020 • Hao Liu, Pieter Abbeel
In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning.
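The connection rests on the standard energy-based reading of a classifier's logits $f_\theta(x)[y]$, a generic identity rather than this paper's full method:

$$ p_\theta(x, y) = \frac{\exp\big(f_\theta(x)[y]\big)}{Z(\theta)}, \qquad p_\theta(y \mid x) = \frac{\exp\big(f_\theta(x)[y]\big)}{\sum_{y'} \exp\big(f_\theta(x)[y']\big)}, \qquad \log p_\theta(x) = \log\!\sum_{y} \exp\big(f_\theta(x)[y]\big) - \log Z(\theta). $$

Cross-entropy trains the discriminative factor $p_\theta(y \mid x)$, while the unnormalized marginal supplies the generative term.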
no code implementations • ICLR 2021 • Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin
We demonstrate our solution for sample-based unsupervised stabilization on different dynamical control systems and show the advantages of our method by comparing it to the existing VLB approaches.
1 code implementation • ICML 2020 • Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen
In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
1 code implementation • 9 Jul 2020 • Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains.