1 code implementation • 1 Apr 2024 • Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman
In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS).
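The idea of representing search itself as a flat token stream can be illustrated with a toy sketch (my own minimal example, not the paper's actual serialization format): run breadth-first search over a small graph and log every expansion and frontier push as a linear string that a language model could, in principle, be trained on.

```python
from collections import deque

def flatten_bfs(graph, start, goal):
    """Run BFS and serialize the whole search process -- expansions,
    frontier pushes, and the final solution -- as one linear token stream."""
    stream = []
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        stream.append(f"expand:{node}")
        if node == goal:
            stream.append(f"goal:{'->'.join(path)}")
            break
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                stream.append(f"push:{nbr}")
                queue.append((nbr, path + [nbr]))
    return " ".join(stream)

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(flatten_bfs(graph, "A", "D"))
# -> "expand:A push:B push:C expand:B push:D expand:C expand:D goal:A->B->D"
```

The point of the flattening is that intermediate exploration (including nodes that lead nowhere) stays in the string, rather than only the final answer.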
no code implementations • 19 Mar 2024 • Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn
In this paper, we make the following observation: high-level policies that index into sufficiently rich and expressive low-level language-conditioned skills can be readily supervised with human feedback in the form of language corrections.
1 code implementation • 19 Feb 2024 • Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model.
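The two-stage recipe described here can be sketched with a toy one-parameter "policy" over two candidate responses (a hypothetical setup of mine, not the paper's experiments): an SFT stage imitates the teacher-preferred response, then a REINFORCE stage optimizes scores from a critic.

```python
import math, random

random.seed(0)

# Toy policy over two candidate responses, parameterized by a single logit.
def probs(theta):
    p0 = 1 / (1 + math.exp(-theta))
    return [p0, 1 - p0]

# Stage 1: supervised fine-tuning on teacher demonstrations
# (the teacher always demonstrates response 0).
theta, lr = 0.0, 0.5
for _ in range(50):
    p = probs(theta)
    theta += lr * (1 - p[0])  # gradient of log p(response 0) w.r.t. theta

# Stage 2: RL against a critic that scores response 0 higher.
critic = [1.0, 0.2]
for _ in range(50):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1
    reward = critic[a]
    grad = (1 - p[0]) if a == 0 else -p[0]  # REINFORCE log-prob gradient
    theta += 0.1 * reward * grad

print(round(probs(theta)[0], 3))
```

After both stages the policy concentrates on the teacher- and critic-preferred response; real RLAIF applies the same structure to full language models rather than a single logit.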
1 code implementation • 16 Feb 2024 • Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S. Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn
The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences.
no code implementations • 29 Jan 2024 • Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine
We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods.
no code implementations • 2 Nov 2023 • Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn
We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet.
no code implementations • 23 Oct 2023 • Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn
We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet.
1 code implementation • 19 Oct 2023 • Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?"
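One way to realize such decoupling is simple log-probability arithmetic over next-token distributions: add the fine-tuning delta of the small model pair to the large base model's log-probs, then renormalize. The sketch below uses hypothetical three-token distributions and is my illustration of the idea, not the paper's exact procedure.

```python
import numpy as np

def emulated_logprobs(large_base, small_ft, small_base):
    """Combine next-token log-probabilities so that large-model 'knowledge'
    is paired with the fine-tuning delta learned by the small model:
        log p ~ log p_large_base + (log p_small_ft - log p_small_base),
    renormalized into a proper distribution."""
    combined = large_base + (small_ft - small_base)
    combined -= np.log(np.sum(np.exp(combined)))  # log-softmax renormalization
    return combined

# Hypothetical 3-token vocabulary log-probability vectors.
large_base = np.log([0.5, 0.3, 0.2])
small_ft   = np.log([0.6, 0.3, 0.1])
small_base = np.log([0.4, 0.4, 0.2])

p = np.exp(emulated_logprobs(large_base, small_ft, small_base))
print(p.round(3))  # a valid distribution shifted toward the fine-tuned preferences
```

Swapping which pair supplies the base and which supplies the fine-tuning delta answers the "vice versa" direction of the question in the abstract.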
1 code implementation • 12 Oct 2023 • Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn
Can we leverage offline RL to recover better policies from online interaction?
no code implementations • 26 Jul 2023 • Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn
AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision-making horizon by up to a factor of 10.
12 code implementations • NeurIPS 2023 • Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).
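This paper's alternative to the RLHF pipeline, Direct Preference Optimization (DPO), replaces the RL loop with a classification-style loss on preference pairs. A minimal per-example sketch with hypothetical summed log-probabilities:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (d_chosen - d_rejected)),
    where d is a response's policy-vs-reference log-probability ratio."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1 / (1 + math.exp(-margin)))

# Hypothetical log-probs for a chosen (w) and rejected (l) response.
loss = dpo_loss(logp_w=-12.0, logp_l=-15.0,
                ref_logp_w=-13.0, ref_logp_l=-14.0, beta=0.1)
print(round(loss, 4))  # -> 0.5981
```

Minimizing this loss increases the chosen response's likelihood relative to the rejected one, without training a separate reward model or running policy-gradient RL.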
no code implementations • 24 May 2023 • Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions.
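Calibration in this sense is commonly summarized by expected calibration error (ECE), a standard metric rather than this paper's contribution. A minimal sketch on hypothetical predictions:

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Bin predictions by confidence, then average |accuracy - mean confidence|
    per bin, weighted by bin size. Zero means perfectly calibrated."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece, n = 0.0, len(confidences)
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# Hypothetical confidence scores and whether each answer was correct.
conf = [0.95, 0.9, 0.8, 0.6, 0.55, 0.3]
hit  = [True, True, False, True, False, False]
print(round(expected_calibration_error(conf, hit), 3))
```

A well-calibrated model would make deferral simple: route every prediction below a confidence threshold to the expert.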
no code implementations • 2 Mar 2023 • Archit Sharma, Ahmed M. Ahmed, Rehaan Ahmad, Chelsea Finn
In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
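The do/undo structure that makes such reset-free practicing possible can be illustrated on a toy 1-D chain (my own sketch with hand-coded policies and a known goal, not MEDAL++ itself, which learns both policies and infers the reward from demonstrations): a forward phase drives toward the goal state, a backward phase drives back toward the initial state, and episodes alternate without any external reset.

```python
import random

random.seed(1)

# Toy 1-D chain: state 0 is the initial state, state 5 is the goal.
def step(state, action):  # action in {-1, +1}
    return max(0, min(5, state + action))

def run_phase(state, target, horizon=20):
    """Stand-in for a learned policy: mostly step toward the phase's target."""
    for _ in range(horizon):
        if random.random() < 0.9:
            action = 1 if target > state else -1
        else:
            action = random.choice([-1, 1])
        state = step(state, action)
        if state == target:
            break
    return state

state, forward_successes = 0, 0
for episode in range(10):
    state = run_phase(state, target=5)  # forward: do the task
    forward_successes += (state == 5)
    state = run_phase(state, target=0)  # backward: undo the task
print(forward_successes)
```

Because the backward phase returns the agent to (near) the initial state, the forward policy keeps getting practice attempts without a human intervening to reset the environment.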
1 code implementation • 19 Oct 2022 • Annie Xie, Fahim Tajwar, Archit Sharma, Chelsea Finn
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.
no code implementations • 17 Oct 2022 • Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn
We formalize this problem setting, which we call single-life reinforcement learning (SLRL), where an agent must complete a task within a single episode without interventions, utilizing its prior experience while contending with some form of novelty.
1 code implementation • 11 May 2022 • Archit Sharma, Rehaan Ahmad, Chelsea Finn
Prior works have considered an alternating approach where a forward policy learns to solve the task and the backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to?
2 code implementations • ICLR 2022 • Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn
In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials.
no code implementations • NeurIPS 2021 • Archit Sharma, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
Reinforcement learning (RL) promises to enable autonomous acquisition of complex behaviors for diverse agents.
no code implementations • 2 Jun 2021 • Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Shane Gu
Learning to reach goal states and learning diverse skills through mutual information (MI) maximization have been proposed as principled frameworks for self-supervised reinforcement learning, allowing agents to acquire broadly applicable multitask policies with minimal reward engineering.
no code implementations • 24 Mar 2021 • Behzad Haghgoo, Allan Zhou, Archit Sharma, Chelsea Finn
By planning through a learned dynamics model, model-based reinforcement learning (MBRL) offers the prospect of good performance with little environment interaction.
1 code implementation • ICLR 2020 • Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman
Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.
2 code implementations • 27 Apr 2020 • Archit Sharma, Michael Ahn, Sergey Levine, Vikash Kumar, Karol Hausman, Shixiang Gu
Can we instead develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks?
3 code implementations • 2 Jul 2019 • Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman
Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.
1 code implementation • 3 May 2018 • Archit Sharma, Jasper L, Eric Zhang
In this paper we present the initial design of truechain consensus protocol and other technical details.