1 code implementation • 1 Apr 2024 • Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman
In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS).
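The idea of representing search itself as a flat token stream can be illustrated with a toy sketch (my own minimal example, not the paper's actual serialization format): run breadth-first search over a small graph and log every expansion and frontier push as a linear string that a language model could, in principle, be trained on.

```python
from collections import deque

def flatten_bfs(graph, start, goal):
    """Run BFS and serialize the whole search process -- expansions,
    frontier pushes, and the final solution -- as one linear token stream."""
    stream = []
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        stream.append(f"expand:{node}")
        if node == goal:
            stream.append(f"goal:{'->'.join(path)}")
            break
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                stream.append(f"push:{nbr}")
                queue.append((nbr, path + [nbr]))
    return " ".join(stream)

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(flatten_bfs(graph, "A", "D"))
# -> "expand:A push:B push:C expand:B push:D expand:C expand:D goal:A->B->D"
```

The point of the flattening is that intermediate exploration (including nodes that lead nowhere) stays in the string, rather than only the final answer.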
no code implementations • 19 Mar 2024 • Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn
In this paper, we make the following observation: high-level policies that index into sufficiently rich and expressive low-level language-conditioned skills can be readily supervised with human feedback in the form of language corrections.
1 code implementation • 19 Feb 2024 • Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model.
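The two-stage recipe described here can be sketched with a toy one-parameter "policy" over two candidate responses (a hypothetical setup of mine, not the paper's experiments): an SFT stage imitates the teacher-preferred response, then a REINFORCE stage optimizes scores from a critic.

```python
import math, random

random.seed(0)

# Toy policy over two candidate responses, parameterized by a single logit.
def probs(theta):
    p0 = 1 / (1 + math.exp(-theta))
    return [p0, 1 - p0]

# Stage 1: supervised fine-tuning on teacher demonstrations
# (the teacher always demonstrates response 0).
theta, lr = 0.0, 0.5
for _ in range(50):
    p = probs(theta)
    theta += lr * (1 - p[0])  # gradient of log p(response 0) w.r.t. theta

# Stage 2: RL against a critic that scores response 0 higher.
critic = [1.0, 0.2]
for _ in range(50):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1
    reward = critic[a]
    grad = (1 - p[0]) if a == 0 else -p[0]  # REINFORCE log-prob gradient
    theta += 0.1 * reward * grad

print(round(probs(theta)[0], 3))
```

After both stages the policy concentrates on the teacher- and critic-preferred response; real RLAIF applies the same structure to full language models rather than a single logit.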
1 code implementation • 16 Feb 2024 • Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S. Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn
The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences.
no code implementations • 29 Jan 2024 • Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine
We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods.
no code implementations • 2 Nov 2023 • Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn
We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet.
no code implementations • 23 Oct 2023 • Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn
We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet.
1 code implementation • 19 Oct 2023 • Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?"
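One way to realize such decoupling is simple log-probability arithmetic over next-token distributions: add the fine-tuning delta of the small model pair to the large base model's log-probs, then renormalize. The sketch below uses hypothetical three-token distributions and is my illustration of the idea, not the paper's exact procedure.

```python
import numpy as np

def emulated_logprobs(large_base, small_ft, small_base):
    """Combine next-token log-probabilities so that large-model 'knowledge'
    is paired with the fine-tuning delta learned by the small model:
        log p ~ log p_large_base + (log p_small_ft - log p_small_base),
    renormalized into a proper distribution."""
    combined = large_base + (small_ft - small_base)
    combined -= np.log(np.sum(np.exp(combined)))  # log-softmax renormalization
    return combined

# Hypothetical 3-token vocabulary log-probability vectors.
large_base = np.log([0.5, 0.3, 0.2])
small_ft   = np.log([0.6, 0.3, 0.1])
small_base = np.log([0.4, 0.4, 0.2])

p = np.exp(emulated_logprobs(large_base, small_ft, small_base))
print(p.round(3))  # a valid distribution shifted toward the fine-tuned preferences
```

Swapping which pair supplies the base and which supplies the fine-tuning delta answers the "vice versa" direction of the question in the abstract.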
1 code implementation • 12 Oct 2023 • Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn
Can we leverage offline RL to recover better policies from online interaction?
no code implementations • 26 Jul 2023 • Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn
AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision-making horizon by up to a factor of 10.
12 code implementations • NeurIPS 2023 • Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).
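This paper's alternative to the RLHF pipeline, Direct Preference Optimization (DPO), replaces the RL loop with a classification-style loss on preference pairs. A minimal per-example sketch with hypothetical summed log-probabilities:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (d_chosen - d_rejected)),
    where d is a response's policy-vs-reference log-probability ratio."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1 / (1 + math.exp(-margin)))

# Hypothetical log-probs for a chosen (w) and rejected (l) response.
loss = dpo_loss(logp_w=-12.0, logp_l=-15.0,
                ref_logp_w=-13.0, ref_logp_l=-14.0, beta=0.1)
print(round(loss, 4))  # -> 0.5981
```

Minimizing this loss increases the chosen response's likelihood relative to the rejected one, without training a separate reward model or running policy-gradient RL.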
no code implementations • 24 May 2023 • Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions.
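Calibration in this sense is commonly summarized by expected calibration error (ECE), a standard metric rather than this paper's contribution. A minimal sketch on hypothetical predictions:

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Bin predictions by confidence, then average |accuracy - mean confidence|
    per bin, weighted by bin size. Zero means perfectly calibrated."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece, n = 0.0, len(confidences)
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# Hypothetical confidence scores and whether each answer was correct.
conf = [0.95, 0.9, 0.8, 0.6, 0.55, 0.3]
hit  = [True, True, False, True, False, False]
print(round(expected_calibration_error(conf, hit), 3))
```

A well-calibrated model would make deferral simple: route every prediction below a confidence threshold to the expert.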
no code implementations • 2 Mar 2023 • Archit Sharma, Ahmed M. Ahmed, Rehaan Ahmad, Chelsea Finn
In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
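The do/undo structure that makes such reset-free practicing possible can be illustrated on a toy 1-D chain (my own sketch with hand-coded policies and a known goal, not MEDAL++ itself, which learns both policies and infers the reward from demonstrations): a forward phase drives toward the goal state, a backward phase drives back toward the initial state, and episodes alternate without any external reset.

```python
import random

random.seed(1)

# Toy 1-D chain: state 0 is the initial state, state 5 is the goal.
def step(state, action):  # action in {-1, +1}
    return max(0, min(5, state + action))

def run_phase(state, target, horizon=20):
    """Stand-in for a learned policy: mostly step toward the phase's target."""
    for _ in range(horizon):
        if random.random() < 0.9:
            action = 1 if target > state else -1
        else:
            action = random.choice([-1, 1])
        state = step(state, action)
        if state == target:
            break
    return state

state, forward_successes = 0, 0
for episode in range(10):
    state = run_phase(state, target=5)  # forward: do the task
    forward_successes += (state == 5)
    state = run_phase(state, target=0)  # backward: undo the task
print(forward_successes)
```

Because the backward phase returns the agent to (near) the initial state, the forward policy keeps getting practice attempts without a human intervening to reset the environment.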
1 code implementation • 19 Oct 2022 • Annie Xie, Fahim Tajwar, Archit Sharma, Chelsea Finn
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.
no code implementations • 17 Oct 2022 • Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn
We formalize this problem setting, which we call single-life reinforcement learning (SLRL), where an agent must complete a task within a single episode without interventions, utilizing its prior experience while contending with some form of novelty.
1 code implementation • 11 May 2022 • Archit Sharma, Rehaan Ahmad, Chelsea Finn
Prior works have considered an alternating approach where a forward policy learns to solve the task and the backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to?
2 code implementations • ICLR 2022 • Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn
In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials.
no code implementations • NeurIPS 2021 • Archit Sharma, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
Reinforcement learning (RL) promises to enable autonomous acquisition of complex behaviors for diverse agents.
no code implementations • 2 Jun 2021 • Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Shane Gu
Learning to reach goal states and learning diverse skills through mutual information (MI) maximization have been proposed as principled frameworks for self-supervised reinforcement learning, allowing agents to acquire broadly applicable multitask policies with minimal reward engineering.
no code implementations • 24 Mar 2021 • Behzad Haghgoo, Allan Zhou, Archit Sharma, Chelsea Finn
By planning through a learned dynamics model, model-based reinforcement learning (MBRL) offers the prospect of good performance with little environment interaction.
1 code implementation • ICLR 2020 • Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman
Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.
2 code implementations • 27 Apr 2020 • Archit Sharma, Michael Ahn, Sergey Levine, Vikash Kumar, Karol Hausman, Shixiang Gu
Can we instead develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks?
3 code implementations • 2 Jul 2019 • Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman
Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.
1 code implementation • 3 May 2018 • Archit Sharma, Jasper L, Eric Zhang
In this paper we present the initial design of truechain consensus protocol and other technical details.