no code implementations • 22 Nov 2023 • Yinuo Ren, Tesi Xiao, Tanmay Gangwani, Anshuka Rangi, Holakou Rahmanian, Lexing Ying, Subhajit Sanyal
Multi-objective optimization (MOO), which has widespread applications, aims to optimize multiple, possibly conflicting objectives.
no code implementations • 1 Feb 2023 • Sanath Kumar Krishnamurthy, Shrey Modi, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi
We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms.
1 code implementation • ICLR 2022 • Tanmay Gangwani, Yuan Zhou, Jian Peng
In this work, we propose an algorithm that trains an intermediary policy in the learner environment and uses it as a surrogate expert for the learner.
1 code implementation • ICLR 2022 • Michael Wan, Jian Peng, Tanmay Gangwani
Meta-reinforcement learning (meta-RL) algorithms allow agents to learn new behaviors from small amounts of experience, mitigating the sample inefficiency problem in RL.
1 code implementation • 5 Nov 2020 • Tanmay Gangwani, Jian Peng, Yuan Zhou
Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning.
2 code implementations • NeurIPS 2020 • Tanmay Gangwani, Yuan Zhou, Jian Peng
To make credit assignment easier, recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards.
1 code implementation • 12 Jun 2020 • Michael Wan, Tanmay Gangwani, Jian Peng
In this paper, we propose a new framework for transfer learning where the teacher and the student can have arbitrarily different state- and action-spaces.
1 code implementation • ICLR 2020 • Tanmay Gangwani, Jian Peng
Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior, rather than dealing with the hardships of designing a correct reward function.
1 code implementation • 22 Jun 2019 • Tanmay Gangwani, Joel Lehman, Qiang Liu, Jian Peng
We consider the problem of imitation learning from expert demonstrations in partially observable Markov decision processes (POMDPs).
no code implementations • ICLR 2019 • Tanmay Gangwani, Qiang Liu, Jian Peng
Improving the efficiency of RL algorithms in real-world problems with sparse or episodic rewards is a pressing need.
no code implementations • ICLR 2018 • Tanmay Gangwani, Jian Peng
Genetic Policy Optimization (GPO) uses imitation learning for policy crossover in the state space and applies policy gradient methods for mutation.