1 code implementation • 21 Dec 2021 • Zhenyang Shi, Surya P. N. Singh
To reduce computational complexity, we also introduce a decoupled policy structure that splits the Gaussian policy into one policy that learns the mean and another that learns the standard deviation, so that only the mean policy is trained by CEM.
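The following Python sketch illustrates the decoupled structure described above. It keeps the mean and the standard deviation as separate parameter sets and runs a plain cross-entropy method (CEM) over the mean parameters only. Everything concrete here is a hypothetical assumption: the class and function names, the linear 1-D mean `a = w*s + b`, and `toy_score`, which stands in for the learned critic that an actual actor-critic method would use. It is not the authors' implementation.

```python
import math
import random


class DecoupledGaussianPolicy:
    """Gaussian policy whose mean and standard deviation are separate parameter sets.

    Hypothetical toy setup: 1-D state s, 1-D action a, linear mean a = w * s + b.
    """

    def __init__(self, mean_params, log_std):
        self.mean_params = list(mean_params)  # [w, b]: the only part CEM optimizes here
        self.log_std = log_std                # separate std parameter, left to another learner

    def mean(self, s):
        w, b = self.mean_params
        return w * s + b

    def sample(self, s, rng):
        return rng.gauss(self.mean(s), math.exp(self.log_std))


def cem(score_fn, dim, iters=50, pop=100, n_elite=10, init_std=2.0, seed=0):
    """Cross-entropy method: sample candidates, keep the elites, refit a diagonal Gaussian."""
    rng = random.Random(seed)
    mu, sigma = [0.0] * dim, [init_std] * dim
    for _ in range(iters):
        cands = [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(pop)]
        elites = sorted(cands, key=score_fn, reverse=True)[:n_elite]
        mu = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        # Small floor keeps the sampling distribution from collapsing to zero width.
        sigma = [math.sqrt(sum((e[i] - mu[i]) ** 2 for e in elites) / n_elite) + 1e-3
                 for i in range(dim)]
    return mu


STATES = [-1.0, -0.5, 0.0, 0.5, 1.0]


def toy_score(params):
    """Hypothetical stand-in for a critic: higher when the mean action is near 2*s - 1."""
    w, b = params
    return -sum((w * s + b - (2 * s - 1)) ** 2 for s in STATES)


policy = DecoupledGaussianPolicy(mean_params=[0.0, 0.0], log_std=math.log(0.3))
policy.mean_params = cem(toy_score, dim=2)  # only the mean policy is trained by CEM
print([round(p, 2) for p in policy.mean_params])
```

The standard-deviation parameter is never touched by `cem`. This is the point of the decoupling: CEM searches a smaller space (the mean parameters only), and the spread of the policy can be learned by a separate, cheaper update.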
no code implementations • 3 Jun 2021 • Aaron J. Snoswell, Surya P. N. Singh, Nan Ye
Multiple-Intent Inverse Reinforcement Learning (MI-IRL) seeks to find a reward function ensemble to rationalize demonstrations of different but unlabelled intents.
1 code implementation • 1 Dec 2020 • Aaron J. Snoswell, Surya P. N. Singh, Nan Ye
This improves on the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model.
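For context, the textbook MaxEnt IRL model assigns each trajectory τ a probability P(τ) ∝ exp(θ·f(τ)), where f(τ) are the trajectory's feature counts. It fits θ by gradient ascent until the model's expected features match the demonstrations' average features. The Python sketch below shows only that classic form, over a small enumerated set of trajectories (the deterministic setting in which the model is exact). It does not reproduce the stochastic-MDP derivation or the model-free algorithm described above. The function names and the toy features are hypothetical.

```python
import math


def trajectory_probs(theta, features):
    """P(tau) proportional to exp(theta . f(tau)), normalized over the enumerated trajectories."""
    scores = [sum(t * x for t, x in zip(theta, f)) for f in features]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def maxent_irl(features, demo_idx, lr=1.0, iters=500):
    """Gradient ascent on the demo log-likelihood: grad = empirical features - model features."""
    dim = len(features[0])
    f_bar = [sum(features[i][k] for i in demo_idx) / len(demo_idx) for k in range(dim)]
    theta = [0.0] * dim
    for _ in range(iters):
        probs = trajectory_probs(theta, features)
        f_model = [sum(p * f[k] for p, f in zip(probs, features)) for k in range(dim)]
        theta = [t + lr * (fb - fm) for t, fb, fm in zip(theta, f_bar, f_model)]
    return theta


# Hypothetical toy problem: four trajectories with 2-D feature counts, four demonstrations.
FEATURES = [[1, 0], [0, 1], [0, 0], [1, 1]]
DEMOS = [0, 0, 3, 2]  # empirical feature mean = [0.75, 0.25]

theta = maxent_irl(FEATURES, DEMOS)
probs = trajectory_probs(theta, FEATURES)
print([round(t, 3) for t in theta])  # -> [1.099, -1.099]
```

Because the four feature vectors are every combination of two binary features, the model factorizes into two independent Bernoulli variables. The optimum is therefore θ = (ln 3, −ln 3), which makes the model's expected features exactly [0.75, 0.25].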