Search Results for author: Archit Sharma

Found 24 papers, 13 papers with code

Stream of Search (SoS): Learning to Search in Language

1 code implementation • 1 Apr 2024 • Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman

In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS).

Language Modelling

Yell At Your Robot: Improving On-the-Fly from Language Corrections

no code implementations • 19 Mar 2024 • Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

In this paper, we make the following observation: high-level policies that index into sufficiently rich and expressive low-level language-conditioned skills can be readily supervised with human feedback in the form of language corrections.

A Critical Evaluation of AI Feedback for Aligning Large Language Models

1 code implementation • 19 Feb 2024 • Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model.

Instruction Following • reinforcement-learning • +1

RLVF: Learning from Verbal Feedback without Overgeneralization

1 code implementation • 16 Feb 2024 • Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn

The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences.

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

no code implementations • 29 Jan 2024 • Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine

We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods.

reinforcement-learning • Reinforcement Learning (RL)

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

no code implementations • 2 Nov 2023 • Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet.

Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

no code implementations • 23 Oct 2023 • Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn

We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet.

reinforcement-learning • Robot Manipulation

An Emulator for Fine-Tuning Large Language Models using Small Language Models

1 code implementation • 19 Oct 2023 • Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning

To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?"

Instruction Following

Waypoint-Based Imitation Learning for Robotic Manipulation

no code implementations • 26 Jul 2023 • Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn

AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10.

Imitation Learning

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

12 code implementations • NeurIPS 2023 • Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).

Language Modelling • reinforcement-learning • +1

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

no code implementations • 24 May 2023 • Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning

A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions.

TriviaQA • Unsupervised Pre-training

Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

no code implementations • 2 Mar 2023 • Archit Sharma, Ahmed M. Ahmed, Rehaan Ahmad, Chelsea Finn

In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.

reinforcement-learning • Reinforcement Learning (RL)

When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning

1 code implementation • 19 Oct 2022 • Annie Xie, Fahim Tajwar, Archit Sharma, Chelsea Finn

A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.

Continuous Control • reinforcement-learning • +1

You Only Live Once: Single-Life Reinforcement Learning

no code implementations • 17 Oct 2022 • Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn

We formalize this problem setting, which we call single-life reinforcement learning (SLRL), where an agent must complete a task within a single episode without interventions, utilizing its prior experience while contending with some form of novelty.

Continuous Control • reinforcement-learning • +1

A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning

1 code implementation • 11 May 2022 • Archit Sharma, Rehaan Ahmad, Chelsea Finn

Prior works have considered an alternating approach where a forward policy learns to solve the task and the backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to?

Continuous Control • reinforcement-learning • +1

Autonomous Reinforcement Learning: Formalism and Benchmarking

2 code implementations • ICLR 2022 • Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn

In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials.

Benchmarking • reinforcement-learning • +1

Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning

no code implementations • 2 Jun 2021 • Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Shane Gu

Learning to reach goal states and learning diverse skills through mutual information (MI) maximization have been proposed as principled frameworks for self-supervised reinforcement learning, allowing agents to acquire broadly applicable multitask policies with minimal reward engineering.

reinforcement-learning • Reinforcement Learning (RL) • +1

Discriminator Augmented Model-Based Reinforcement Learning

no code implementations • 24 Mar 2021 • Behzad Haghgoo, Allan Zhou, Archit Sharma, Chelsea Finn

By planning through a learned dynamics model, model-based reinforcement learning (MBRL) offers the prospect of good performance with little environment interaction.

Model-based Reinforcement Learning • reinforcement-learning • +1

Dynamics-Aware Unsupervised Skill Discovery

1 code implementation • ICLR 2020 • Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.

Model-based Reinforcement Learning

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

2 code implementations • 27 Apr 2020 • Archit Sharma, Michael Ahn, Sergey Levine, Vikash Kumar, Karol Hausman, Shixiang Gu

Can we instead develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks?

Model Predictive Control • reinforcement-learning • +2

Dynamics-Aware Unsupervised Discovery of Skills

3 code implementations • 2 Jul 2019 • Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.

Model-based Reinforcement Learning

TrueChain: Highly Performant Decentralized Public Ledger

1 code implementation • 3 May 2018 • Archit Sharma, Jasper L, Eric Zhang

In this paper, we present the initial design of the TrueChain consensus protocol and other technical details.

Distributed, Parallel, and Cluster Computing
