no code implementations • 14 Mar 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard
The second type of entropy we study is the trajectory entropy.
no code implementations • 9 Feb 2023 • Pierre H. Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill
With simple linear algebra, we show that when using a linear predictor, the optimal predictor is close to an orthogonal projection, and we propose a general framework based on orthonormalization that helps interpret and build intuition for why BYOL works.
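A toy numerical sketch of that claim (illustrative only, not the paper's code): if the targets are an orthogonal transform of the online representations, the least-squares-optimal linear predictor comes out near-orthogonal, and orthonormalizing it via its polar factor barely changes it.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((1024, 32))                  # online representations
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))   # a random orthogonal map
Zt = Z @ Q + 0.01 * rng.standard_normal((1024, 32))  # noisy targets

# closed-form least-squares predictor: argmin_W ||Z W - Zt||_F^2
W = np.linalg.lstsq(Z, Zt, rcond=None)[0]

# orthonormalize W via its polar factor (from the SVD W = U S V^T)
U, _, Vt = np.linalg.svd(W)
print(np.linalg.norm(W - U @ Vt))  # small: the optimal predictor is near-orthogonal
```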
no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.
Distributional Reinforcement Learning
reinforcement-learning
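As a rough illustration of QTD (a simplified tabular sketch, not the paper's algorithm statement): each of m quantile estimates is nudged along the quantile-regression (pinball) gradient toward the distributional Bellman targets.

```python
import numpy as np

def qtd_update(theta, reward, theta_next, gamma=0.99, lr=0.01):
    """One tabular QTD update for a single transition; `theta` and
    `theta_next` hold m quantile estimates of the return distribution
    at the current and next state."""
    m = len(theta)
    taus = (np.arange(m) + 0.5) / m        # quantile midpoints
    targets = reward + gamma * theta_next  # distributional Bellman targets
    for i in range(m):
        # pinball-loss (quantile regression) gradient step
        theta[i] += lr * np.mean(taus[i] - (targets < theta[i]))
    return theta
```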
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
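A minimal sketch of the loss being analyzed (names are illustrative): the prediction target is detached so the update is a semi-gradient, and in practice the predictor would be optimized at a faster pace, e.g. with a larger step size.

```python
import torch

def self_predictive_loss(predictor, online_repr, next_repr):
    # semi-gradient: no gradient flows through the prediction target
    return ((predictor(online_repr) - next_repr.detach()) ** 2).mean()
```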
no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare
We study the multi-step off-policy learning approach to distributional RL.
Distributional Reinforcement Learning
reinforcement-learning
no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually complex environments.
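A hedged sketch of the intrinsic-reward idea (not the paper's exact recipe): a world model predicts a target network's embedding of the next observation, and the prediction error on normalized embeddings serves as the exploration bonus.

```python
import numpy as np

def curiosity_bonus(predicted_embedding, target_embedding):
    p = predicted_embedding / np.linalg.norm(predicted_embedding)
    t = target_embedding / np.linalg.norm(target_embedding)
    return float(np.sum((p - t) ** 2))  # large error = novel observation
```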
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • 16 May 2022 • Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard
We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.
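For intuition, here is a minimal sketch of the bandit ancestor, Bayes-UCB, that Bayes-UCBVI extends to MDPs (the quantile schedule is simplified and Beta(1, 1) priors are assumed):

```python
import numpy as np
from scipy.stats import beta

def bayes_ucb_arm(successes, failures, t):
    """Pick the arm whose posterior quantile of level 1 - 1/t is highest."""
    q = 1.0 - 1.0 / max(t, 1)
    ucb = beta.ppf(q, successes + 1, failures + 1)
    return int(np.argmax(ucb))
```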
no code implementations • 30 Mar 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.
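As a reference point, marginalized importance sampling, which the paper recovers as a special case, can be sketched as follows (inputs are assumed precomputed; names are illustrative):

```python
import numpy as np

def marginalized_is(state_ratios, action_ratios, rewards):
    # weight each transition by d_pi(s)/d_mu(s) * pi(a|s)/mu(a|s)
    # instead of a product of ratios over the whole trajectory
    return np.mean(state_ratios * action_ratios * rewards)
```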
no code implementations • 14 Dec 2021 • Yunhao Tang
Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly understood discrepancies between theory and practice.
1 code implementation • NeurIPS 2021 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko
Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions.
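A sketch of the standard score-function (likelihood-ratio) Hessian estimator that such methods build on, assuming per-trajectory gradients and Hessians of log-probabilities are precomputed:

```python
import numpy as np

def score_function_hessian(grad_logps, hess_logps, returns):
    """Estimate the value-function Hessian via
    E[(grad_logp grad_logp^T + hess_logp) * R];
    grad_logps: (n, d), hess_logps: (n, d, d), returns: (n,)."""
    d = grad_logps.shape[1]
    H = np.zeros((d, d))
    for g, h, R in zip(grad_logps, hess_logps, returns):
        H += (np.outer(g, g) + h) * R
    return H / len(returns)
```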
no code implementations • 11 Jun 2021 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically sound and practically effective algorithm.
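For reference, Peng's Q($\lambda$) targets can be computed by a simple backward recursion (a sketch; array names are illustrative):

```python
import numpy as np

def pengs_q_lambda_targets(rewards, q_next_max, gamma=0.99, lam=0.7):
    """G_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * G_{t+1}),
    bootstrapping from the final state's value; `q_next_max` holds
    max_a Q(s_{t+1}, a) for each step."""
    G = np.zeros(len(rewards))
    g = q_next_max[-1]
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1 - lam) * q_next_max[t] + lam * g)
        G[t] = g
    return G
```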
no code implementations • 8 Feb 2021 • Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller
There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments.
1 code implementation • 19 Jan 2021 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang
In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.
3 code implementations • ICML 2020 • Jean-bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence.
no code implementations • 13 Jun 2020 • Yunhao Tang, Alp Kucukelbir
We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective.
no code implementations • 13 Jun 2020 • Yunhao Tang, Krzysztof Choromanski
Off-policy learning algorithms are known to be sensitive to the choice of hyperparameters.
no code implementations • NeurIPS 2020 • Yunhao Tang
Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning.
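The lower-bound idea admits a one-line sketch: an observed discounted return R_t lower-bounds Q*(s_t, a_t), so the loss penalizes only underestimation (inputs assumed to be batched arrays):

```python
import numpy as np

def lower_bound_q_loss(q_values, observed_returns):
    # penalize only when Q underestimates the observed return
    return np.mean(np.maximum(observed_returns - q_values, 0.0) ** 2)
```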
no code implementations • ICML 2020 • Yunhao Tang, Michal Valko, Rémi Munos
In this work, we investigate the application of Taylor expansions in reinforcement learning.
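A heavily hedged sketch of the first-order term of such an expansion around a behavior policy mu (constants such as the 1/(1 - gamma) scaling are omitted; inputs are assumed sampled under mu):

```python
import numpy as np

def first_order_correction(rho, q_mu):
    # V(pi) ~ V(mu) + E_mu[(pi(a|s)/mu(a|s) - 1) * Q_mu(s, a)], up to scaling;
    # rho holds sampled ratios pi(a|s)/mu(a|s), q_mu the behavior Q-values
    return np.mean((rho - 1.0) * q_mu)
```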
1 code implementation • 10 Feb 2020 • Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient-based deep RL algorithms efficiently.
1 code implementation • ICLR 2020 • Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang
We introduce ES-MAML, a new framework for solving the model-agnostic meta-learning (MAML) problem based on Evolution Strategies (ES).
no code implementations • 10 Jul 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang
We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies by combining ENAS and ES in a highly scalable and intuitive way.
1 code implementation • ICML 2020 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.
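In the simplest one-dimensional case the idea reduces to comparing distributions of a behavioral statistic, as in this sketch (using episode returns as the behavioral samples, purely for illustration):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
returns_pi1 = rng.normal(10.0, 2.0, size=500)  # behavioral samples, policy 1
returns_pi2 = rng.normal(12.0, 2.5, size=500)  # behavioral samples, policy 2
print(wasserstein_distance(returns_pi1, returns_pi2))
```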
no code implementations • ICML 2020 • Yunhao Tang, Shipra Agrawal, Yuri Faenza
In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method.
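The classical building block of the method is the Gomory fractional cut, sketched below: from a simplex tableau row x_i + sum_j a_j x_j = b with fractional b, the inequality sum_j frac(a_j) x_j >= frac(b) is valid for all integer-feasible points yet cuts off the current LP optimum.

```python
import numpy as np

def gomory_cut(tableau_row, rhs):
    frac = lambda v: v - np.floor(v)
    return frac(np.asarray(tableau_row)), frac(rhs)  # cut coefficients, lower bound
```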
no code implementations • 29 May 2019 • Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir
Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL).
no code implementations • 29 May 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
We propose DPPMC, a new class of structured methods for Monte Carlo (MC) sampling designed for high-dimensional nonisotropic distributions, in which samples are correlated via determinantal point processes to reduce the variance of the estimator.
no code implementations • 13 Mar 2019 • Yunhao Tang, Mingzhang Yin, Mingyuan Zhou
Due to the high variance of policy gradients, on-policy optimization algorithms are plagued with low sample efficiency.
no code implementations • 9 Mar 2019 • Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller
Wasserstein distances are increasingly used in a wide variety of applications in machine learning.
no code implementations • 7 Mar 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani
Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in robotics.
1 code implementation • NeurIPS 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it, on-the-fly.
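The vanilla antithetic ES estimator that ASEBO refines by adapting the sensing directions can be sketched as:

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pairs=32, seed=0):
    """grad ~ (1 / (2 sigma n)) * sum_i (f(theta + sigma e_i)
                                         - f(theta - sigma e_i)) * e_i,
    with isotropic Gaussian sensing directions e_i."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        e = rng.standard_normal(theta.shape)
        grad += (f(theta + sigma * e) - f(theta - sigma * e)) * e
    return grad / (2 * sigma * n_pairs)
```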
2 code implementations • 29 Jan 2019 • Yunhao Tang, Shipra Agrawal
In this work, we show that discretizing the action space for continuous control is a simple yet powerful technique for on-policy optimization.
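A minimal sketch of the technique (bin count and bounds are illustrative): each action dimension is split into evenly spaced atoms, turning a Gaussian policy into a product of per-dimension categoricals.

```python
import numpy as np

def discretize(low, high, bins=11):
    return [np.linspace(l, h, bins) for l, h in zip(low, high)]

atoms = discretize(low=[-1.0, -1.0], high=[1.0, 1.0])
idx = [3, 7]                                           # one categorical sample per dim
action = np.array([a[i] for a, i in zip(atoms, idx)])  # map back to continuous
```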
1 code implementation • 27 Sep 2018 • Yunhao Tang, Shipra Agrawal
We propose to improve trust region policy search with a normalizing-flow policy.
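A toy state-conditioned flow policy, assuming a single affine flow layer (a real architecture would stack nonlinear flow layers to get genuinely non-Gaussian action distributions):

```python
import math
import torch
import torch.nn as nn

class FlowPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * action_dim),
        )

    def forward(self, state):
        mu, log_scale = self.net(state).chunk(2, dim=-1)
        z = torch.randn_like(mu)
        action = mu + log_scale.exp() * z  # affine flow step a = mu + sigma * z
        # change of variables: log pi(a|s) = log N(z; 0, I) - sum log sigma
        log_prob = (-0.5 * z.pow(2)
                    - 0.5 * math.log(2 * math.pi)
                    - log_scale).sum(-1)
        return action, log_prob
```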
no code implementations • 10 Jun 2018 • Yunhao Tang, Shipra Agrawal
We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy-regularized policy gradients.
no code implementations • 4 May 2018 • Yunhao Tang, Shipra Agrawal
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.
Distributional Reinforcement Learning
Efficient Exploration
1 code implementation • 30 Nov 2017 • Yunhao Tang, Alp Kucukelbir
We propose a framework that directly tackles the probability distribution over the value-function parameters of a Deep Q-Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters.
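One standard variational subroutine that fits this description is a mean-field Gaussian layer with reparameterized weight samples, sketched here in the Bayes-by-Backprop style (illustrative, not the paper's exact implementation):

```python
import torch
import torch.nn as nn

class VariationalLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over weights."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.log_sigma = nn.Parameter(torch.full((n_out, n_in), -3.0))

    def forward(self, x):
        # reparameterized sample: w = mu + sigma * eps, eps ~ N(0, I)
        w = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        return x @ w.t()
```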