Search Results for author: Sharath Chandra Raparthy

Found 9 papers, 6 papers with code

Teaching Large Language Models to Reason with Reinforcement Learning

no code implementations • 7 Mar 2024 • Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.

reinforcement-learning

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering

Generalization to New Sequential Decision Making Tasks with In-Context Learning

1 code implementation • 6 Dec 2023 • Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.

Decision Making, In-Context Learning

Multi-Objective GFlowNets

1 code implementation • 23 Oct 2022 • Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio

We study the problem of generating diverse candidates in the context of Multi-Objective Optimization.

Active Learning, Drug Discovery

Continual Learning In Environments With Polynomial Mixing Times

1 code implementation • 13 Dec 2021 • Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel, Irina Rish

The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios.

Atari Games, Continual Learning (+1 more)

Compositional Attention: Disentangling Search and Retrieval

3 code implementations • ICLR 2022 • Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.

Retrieval

Curriculum in Gradient-Based Meta-Reinforcement Learning

no code implementations • 19 Feb 2020 • Bhairav Mehta, Tristan Deleu, Sharath Chandra Raparthy, Chris J. Pal, Liam Paull

However, specifically in the case of meta-reinforcement learning (meta-RL), we can show that gradient-based meta-learners are sensitive to task distributions.

Benchmarking, Meta-Learning (+4 more)

Generating Automatic Curricula via Self-Supervised Active Domain Randomization

1 code implementation • 18 Feb 2020 • Sharath Chandra Raparthy, Bhairav Mehta, Florian Golemo, Liam Paull

Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing a real-valued reward to an agent proportional to the completion of some goal.

Reinforcement Learning (RL)

Data Efficient Stagewise Knowledge Distillation

1 code implementation • 15 Nov 2019 • Akshay Kulkarni, Navid Panchi, Sharath Chandra Raparthy, Shital Chiddarwar

We show, across the tested tasks, significant performance gains even with a fraction of the data used in distillation, without compromising on the metric.

Knowledge Distillation, Model Compression (+2 more)
