1 code implementation • ICML 2020 • Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
Learning from demonstrations can be challenging when their quality is diverse, and even more so when the quality is unknown and there is no additional information to estimate it.
1 code implementation • 12 Mar 2021 • Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, rather than using the mutual information as an unsupervised reward as in previous studies.
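A common form of this bound is $I(Z;S) \ge \mathbb{E}[\log q_\phi(z \mid s)] + H(Z)$, where $q_\phi$ is a learned variational posterior. The sketch below shows how such a bound can be optimized; it assumes a discrete latent with a uniform prior (so $H(Z)$ is constant) and is illustrative rather than the paper's actual code, with all network and variable names being assumptions.

```python
# Minimal sketch of maximizing a variational lower bound on the mutual
# information I(Z; S) between a latent Z and visited states S:
#   I(Z; S) >= E[log q(z|s)] + H(Z),
# where q(z|s) is a learned variational posterior. Assumes a discrete
# latent with a uniform prior, so H(Z) is constant and is dropped.
import torch
import torch.nn as nn

latent_dim, state_dim = 4, 8
posterior = nn.Sequential(                      # variational posterior q(z|s)
    nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim)
)
opt = torch.optim.Adam(posterior.parameters(), lr=3e-4)

def mi_lower_bound(states, latents):
    """E[log q(z|s)] term of the bound for discrete latents."""
    log_q = torch.log_softmax(posterior(states), dim=-1)
    return log_q.gather(1, latents.unsqueeze(1)).mean()

# Toy update: tighten the bound on a batch of (state, latent) pairs.
states = torch.randn(32, state_dim)
latents = torch.randint(0, latent_dim, (32,))
loss = -mi_lower_bound(states, latents)
opt.zero_grad(); loss.backward(); opt.step()
```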
1 code implementation • 20 Oct 2020 • Voot Tangkaratt, Nontawat Charoenphakdee, Masashi Sugiyama
Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning.
no code implementations • 4 Jun 2020 • Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka
Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings.
no code implementations • 15 Sep 2019 • Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
In practice, however, the quality of demonstrations can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs.
no code implementations • 27 Jan 2019 • Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama
Imitation learning (IL) aims to learn an optimal policy from demonstrations.
1 code implementation • ICLR 2019 • Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task.
1 code implementation • 19 Dec 2018 • Simone Parisi, Voot Tangkaratt, Jan Peters, Mohammad Emtiyaz Khan
Actor-critic methods can achieve strong performance on difficult reinforcement learning problems, but they are also prone to instability.
no code implementations • 6 Dec 2018 • Si-An Chen, Voot Tangkaratt, Hsuan-Tien Lin, Masashi Sugiyama
In this work, we propose Active Reinforcement Learning with Demonstration (ARLD), a new framework that reduces the demonstration effort required for RL by allowing the agent to actively query for demonstrations during training.
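The sketch below illustrates the query-by-uncertainty idea in a self-contained way, using disagreement within an ensemble of Q-networks as the uncertainty signal; ARLD's actual uncertainty measures differ, and every name and threshold here is an illustrative assumption.

```python
# Illustrative sketch of active demonstration querying: the agent asks
# the expert only when an ensemble of Q-networks disagrees about the
# greedy action. Not the ARLD implementation; the networks, threshold,
# and disagreement measure are all assumptions.
import torch
import torch.nn as nn

state_dim, n_actions, ensemble_size = 8, 4, 5
q_ensemble = [nn.Linear(state_dim, n_actions) for _ in range(ensemble_size)]

def choose_action(state, expert_action, threshold=0.6):
    with torch.no_grad():
        greedy = torch.stack([q(state).argmax() for q in q_ensemble])
    votes = torch.bincount(greedy, minlength=n_actions).float()
    # fraction of ensemble members voting against the plurality action
    disagreement = 1.0 - votes.max() / ensemble_size
    if disagreement > threshold:
        return expert_action              # uncertain: query the demonstrator
    return int(greedy.mode().values)      # confident: act autonomously

state = torch.randn(state_dim)
print(choose_action(state, expert_action=0))
```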
no code implementations • 27 Sep 2018 • Voot Tangkaratt, Masashi Sugiyama
Imitation learning aims to learn an optimal policy from expert demonstrations and its recent combination with deep learning has shown impressive performance.
3 code implementations • ICML 2018 • Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
Uncertainty computation in deep learning is essential to design robust and reliable systems.
no code implementations • 4 Dec 2017 • Mohammad Emtiyaz Khan, Zuozhu Liu, Voot Tangkaratt, Yarin Gal
Overall, this paper presents Vprop as a principled, computationally-efficient, and easy-to-implement method for Bayesian deep learning.
no code implementations • 15 Nov 2017 • Mohammad Emtiyaz Khan, Wu Lin, Voot Tangkaratt, Zuozhu Liu, Didrik Nielsen
We present the Variational Adaptive Newton (VAN) method, a black-box optimization method that is especially suitable for explorative learning tasks such as active learning and reinforcement learning.
1 code implementation • ICLR 2018 • Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama
First, we show that GAC updates the guide actor by performing second-order optimization in the action space where the curvature matrix is based on the Hessians of the critic.
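Schematically, such a second-order step looks like a Newton-style ascent on the critic in action space; the formula below is a simplification that omits the constraints and covariance updates used in the paper:

```latex
% Schematic Newton-style step on the critic in action space, with the
% curvature taken from the critic's Hessian. A simplification of the
% GAC update, which additionally constrains the policy update.
a' = a - \alpha \, \bigl(\nabla_a^2 Q(s, a)\bigr)^{-1} \nabla_a Q(s, a)
```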
no code implementations • 10 Nov 2016 • Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama
A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored.
no code implementations • 5 Aug 2015 • Voot Tangkaratt, Hiroaki Sasaki, Masashi Sugiyama
On the other hand, quadratic MI (QMI) is a variant of MI based on the $L_2$ distance, which is more robust against outliers than the KL divergence; a computationally efficient method to estimate QMI from data, called least-squares QMI (LSQMI), has recently been proposed.
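For reference, QMI replaces the KL divergence in ordinary MI by the squared $L_2$ distance between the joint density and the product of the marginals:

```latex
% Quadratic mutual information: squared L2 distance between the joint
% density p(x, y) and the product of marginals p(x) p(y).
\mathrm{QMI}(X; Y) = \iint \bigl( p(x, y) - p(x)\, p(y) \bigr)^2 \, dx \, dy
```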
no code implementations • 28 Apr 2014 • Voot Tangkaratt, Ning Xie, Masashi Sugiyama
In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional space.
no code implementations • 19 Jul 2013 • Syogo Mori, Voot Tangkaratt, Tingting Zhao, Jun Morimoto, Masashi Sugiyama
The model-free RL approach directly learns the policy based on data samples.