Search Results for author: Sharath Chandra Raparthy

Found 9 papers, 6 papers with code

Teaching Large Language Models to Reason with Reinforcement Learning

no code implementations • 7 Mar 2024 Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.

reinforcement-learning
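The Expert Iteration baseline mentioned in this abstract can be illustrated with a toy sketch: sample several completions per prompt, keep only those the reward function accepts, and fine-tune on the filtered set. This is a generic illustration of the algorithm family, not the paper's implementation; `sample_fn`, `reward_fn`, and `fine_tune_fn` are hypothetical stand-ins for a language model's sampler, a correctness checker, and a supervised fine-tuning step.

```python
import random

def expert_iteration(sample_fn, reward_fn, fine_tune_fn, prompts, rounds=3, k=4):
    """Toy Expert Iteration loop: per round, draw k candidates per prompt,
    keep only reward-positive (prompt, completion) pairs, and fine-tune on
    that filtered batch. Returns all filtered pairs collected across rounds."""
    dataset = []
    for _ in range(rounds):
        batch = []
        for p in prompts:
            candidates = [sample_fn(p) for _ in range(k)]
            # filter: keep only completions the reward function accepts
            batch.extend((p, c) for c in candidates if reward_fn(p, c))
        dataset.extend(batch)
        fine_tune_fn(batch)  # in practice: supervised fine-tuning on the batch
    return dataset
```

In a real setup the sampler would be the pretrained checkpoint itself, and the filtered data would grow the fine-tuning set each round; the abstract's point is that this loop and PPO reach convergence with a similar sample budget.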

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations • 26 Feb 2024 Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering

Generalization to New Sequential Decision Making Tasks with In-Context Learning

1 code implementation • 6 Dec 2023 Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.

Decision Making • In-Context Learning
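The core mechanism the abstract describes, learning new tasks "without any weight updates from just a handful of demonstrations", comes down to how the context is assembled for a pretrained sequence model. A hypothetical sketch of that prompt construction (the function name and token format below are illustrative, not the paper's actual code):

```python
def build_icl_context(demos, query_obs):
    """Interleave (observation, action) pairs from a handful of demonstration
    trajectories, then append the query observation. A frozen pretrained
    sequence model would be asked to predict the next token -- the action --
    purely in-context, with no gradient updates."""
    tokens = []
    for demo in demos:
        for obs, act in demo:
            tokens += [("obs", obs), ("act", act)]
    tokens.append(("obs", query_obs))  # model predicts the action that follows
    return tokens
```

In the paper's setting the demonstrations come from unseen MiniHack or Procgen tasks, so generalization rests entirely on the diversity of the offline pretraining data.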

Compositional Attention: Disentangling Search and Retrieval

3 code implementations • ICLR 2022 Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.

Retrieval
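The disentangling of search and retrieval in the title can be sketched in NumPy. Standard multi-head attention hard-wires each query-key (search) head to one value (retrieval) head; the sketch below instead computes all S×R search-retrieval pairings and mixes retrievals per search head with a soft selection. This is a simplified reading of the idea: in the paper the selection is itself computed by an attention mechanism, whereas here a fixed logit matrix stands in for it.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compositional_attention(X, Wq, Wk, Wv, select_logits):
    """S search heads (Wq, Wk) produce S attention maps; R retrieval heads
    (Wv) produce R value sets. Each search head takes a softmax-weighted
    mixture over ALL R retrievals instead of owning a single value head."""
    T = X.shape[0]
    S, R = len(Wq), len(Wv)
    scale = np.sqrt(Wq[0].shape[1])
    # S attention maps (search)
    attn = [softmax((X @ Wq[s]) @ (X @ Wk[s]).T / scale) for s in range(S)]
    # R value projections (retrieval)
    vals = [X @ Wv[r] for r in range(R)]
    # every (search, retrieval) pairing: shape (S, R, T, d_head)
    retrieved = np.stack([[attn[s] @ vals[r] for r in range(R)] for s in range(S)])
    # soft selection of a retrieval per search head (fixed logits here;
    # learned, input-dependent in the actual mechanism)
    sel = softmax(select_logits, axis=-1)  # (S, R)
    out = np.einsum("sr,srtd->std", sel, retrieved)
    return out.transpose(1, 0, 2).reshape(T, -1)  # concatenate heads
```

Setting `select_logits` to a diagonal-dominant matrix with S == R recovers something close to vanilla multi-head attention, which is one way to see the standard layout as a special case.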

Curriculum in Gradient-Based Meta-Reinforcement Learning

no code implementations • 19 Feb 2020 Bhairav Mehta, Tristan Deleu, Sharath Chandra Raparthy, Chris J. Pal, Liam Paull

However, specifically in the case of meta-reinforcement learning (meta-RL), we can show that gradient-based meta-learners are sensitive to task distributions.

Benchmarking • Meta-Learning +4

Generating Automatic Curricula via Self-Supervised Active Domain Randomization

1 code implementation • 18 Feb 2020 Sharath Chandra Raparthy, Bhairav Mehta, Florian Golemo, Liam Paull

Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing to the agent a real-valued reward proportional to the completion of some goal.

Reinforcement Learning (RL)

Data Efficient Stagewise Knowledge Distillation

1 code implementation • 15 Nov 2019 Akshay Kulkarni, Navid Panchi, Sharath Chandra Raparthy, Shital Chiddarwar

We show, across the tested tasks, significant performance gains even with a fraction of the data used in distillation, without compromising on the metric.

Knowledge Distillation • Model Compression +2
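The "stagewise" in the title suggests training the student one block at a time against the corresponding teacher block's output, freezing each stage once fit. The sketch below illustrates that schedule with linear stages and a plain MSE feature-matching loss; it is an assumption-laden toy (the paper distills deep network blocks, and its loss and schedule may differ).

```python
import numpy as np

def stagewise_distill(teacher_stages, student_stages, X, lr=0.1, steps=500):
    """Train student stages one at a time: each stage fits the matching
    teacher stage's output via gradient descent on MSE, then is frozen and
    its output feeds the next stage. Stages are linear maps for simplicity."""
    inp = X
    for t_W, s_W in zip(teacher_stages, student_stages):
        target = inp @ t_W  # teacher stage output on the current features
        for _ in range(steps):
            pred = inp @ s_W
            grad = 2.0 * inp.T @ (pred - target) / len(inp)  # d MSE / d s_W
            s_W -= lr * grad
        inp = inp @ s_W  # frozen student stage output feeds the next stage
    return student_stages
```

The data-efficiency claim in the abstract corresponds to running this loop with only a subset of `X`; because each stage solves a smaller matching problem, less data suffices per stage.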
