Search Results for author: Supratik Paul

Found 7 papers, 2 papers with code

Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

no code implementations • 2 Dec 2022 • Eli Bronstein, Sirish Srinivasan, Supratik Paul, Aman Sinha, Matthew O'Kelly, Payam Nikdel, Shimon Whiteson

However, this approach produces agents that do not perform robustly in safety-critical settings, an issue that cannot be addressed by simply adding more data to the training set: we show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset.

Autonomous Driving • Imitation Learning • +1

Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

1 code implementation • NeurIPS 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

Policy Gradient Methods
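
The excerpt above describes re-using trajectories already sampled by the policy gradient method to score a one-step improvement objective. Below is a minimal sketch of that idea, assuming a toy one-dimensional Gaussian policy, a single candidate hyperparameter (the learning rate), and weighted importance sampling as the off-policy estimator; all names and the bandit-style setup are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: score candidate learning rates on trajectories already
# collected by the current policy, using weighted importance sampling, instead
# of collecting fresh data for every candidate.
import numpy as np

rng = np.random.default_rng(0)

class GaussianPolicy:
    """1-D Gaussian policy with a learnable mean and a fixed std (toy example)."""
    def __init__(self, mean, std=1.0):
        self.mean, self.std = mean, std
    def sample(self, n):
        return rng.normal(self.mean, self.std, size=n)
    def log_prob(self, a):
        return -0.5 * ((a - self.mean) / self.std) ** 2 - np.log(self.std * np.sqrt(2 * np.pi))

def reward(a):
    return -(a - 2.0) ** 2                      # toy objective: best action is 2.0

# 1) Sample data (single-step "trajectories" here) with the current policy.
pi = GaussianPolicy(mean=0.0)
actions = pi.sample(2000)
returns = reward(actions)

# 2) Vanilla policy-gradient direction estimated from those samples.
grad = np.mean((actions - pi.mean) / pi.std ** 2 * returns)

# 3) Score each candidate learning rate on the SAME samples via weighted
#    importance sampling, then keep the best-scoring candidate.
def score_candidate(lr):
    cand = GaussianPolicy(mean=pi.mean + lr * grad)
    w = np.exp(cand.log_prob(actions) - pi.log_prob(actions))
    return np.sum(w * returns) / np.sum(w)

candidates = [0.003, 0.01, 0.03, 0.1, 0.3]
best = max(candidates, key=score_candidate)
print("chosen learning rate:", best)
```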

Fast Efficient Hyperparameter Tuning for Policy Gradients

1 code implementation • 18 Feb 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

Meta-Learning • Policy Gradient Methods

Learning from Demonstration in the Wild

no code implementations • 8 Nov 2018 • Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical.

Fingerprint Policy Optimisation for Robust Reinforcement Learning

no code implementations • 27 May 2018 • Supratik Paul, Michael A. Osborne, Shimon Whiteson

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator.

Bayesian Optimisation • Continuous Control • +3
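
The excerpt above hinges on the notion of an environment variable: a quantity fixed by nature in the real world and hidden from the agent, yet freely settable per episode in simulation. The following is an illustrative sketch of that setting only, with a made-up simulator and a hypothetical friction coefficient standing in for the environment variable; it is not the paper's method.

```python
# Illustrative sketch: an unobserved "environment variable" (here, friction)
# that the simulator lets us set per episode, while the policy never sees it.
import numpy as np

class ToySim:
    """Minimal simulator whose dynamics depend on a hidden friction value."""
    def reset(self, friction):
        self.friction = friction          # controllable in simulation, hidden from the agent
        self.x, self.v = 0.0, 1.0
        return np.array([self.x])         # observation deliberately excludes friction
    def step(self, action):
        self.v += action - self.friction * self.v
        self.x += 0.1 * self.v
        return np.array([self.x]), -abs(self.x - 1.0)

def rollout(policy, friction, horizon=50):
    sim = ToySim()
    obs = sim.reset(friction)
    total = 0.0
    for _ in range(horizon):
        obs, r = sim.step(policy(obs))
        total += r
    return total

policy = lambda obs: 0.5                  # placeholder policy

# In simulation we can deliberately choose which friction values to evaluate,
# rather than relying on whatever values random sampling happens to produce.
for f in [0.05, 0.2, 0.8]:
    print(f"friction={f}: return={rollout(policy, f):.2f}")
```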

Alternating Optimisation and Quadrature for Robust Control

no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

Bayesian Optimisation
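
To make the rare-event point in the excerpt concrete: a low-probability but catastrophic outcome can dominate the true expected return while being entirely absent from a modest batch of random rollouts. The probabilities and payoffs below are made up purely for illustration.

```python
# Illustrative numbers only: why a significant rare event can be invisible to
# naive Monte Carlo evaluation under random sampling.
import numpy as np

rng = np.random.default_rng(2)

p_rare, nominal, catastrophic = 0.005, 1.0, -500.0
true_value = (1 - p_rare) * nominal + p_rare * catastrophic   # = -1.505

n_rollouts = 200
events = rng.random(n_rollouts) < p_rare
returns = np.where(events, catastrophic, nominal)

print("true expected return:", true_value)
print("naive MC estimate:   ", returns.mean())
print("rare events observed:", events.sum())
# With p = 0.005 and 200 rollouts, the rare event is missed entirely about
# (1 - 0.005)**200 ≈ 37% of the time, in which case the estimate is exactly +1.0.
```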
