Search Results for author: Yunhao Tang

Found 38 papers, 10 papers with code

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

no code implementations 9 Feb 2023 Pierre H. Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill

Using simple linear algebra, we show that with a linear predictor, the optimal predictor is close to an orthogonal projection, and we propose a general framework based on orthonormalization that helps interpret and build intuition for why BYOL works.

An Analysis of Quantile Temporal-Difference Learning

no code implementations 11 Jan 2023 Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.

Distributional Reinforcement Learning reinforcement-learning +1

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

no code implementations 16 May 2022 Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.

Multi-Armed Bandits

Marginalized Operators for Off-policy Reinforcement Learning

no code implementations 30 Mar 2022 Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.

Off-policy evaluation reinforcement-learning

Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

no code implementations 14 Dec 2021 Yunhao Tang

Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly understood discrepancies between theory and practice.

Meta Reinforcement Learning reinforcement-learning +1

Taylor Expansion of Discount Factors

no code implementations 11 Jun 2021 Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.

reinforcement-learning Reinforcement Learning (RL)

Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

no code implementations 27 Feb 2021 Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

Continuous Control reinforcement-learning +1

ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

1 code implementation 19 Jan 2021 Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.

Combinatorial Optimization Continuous Control +3

Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

no code implementations 13 Jun 2020 Yunhao Tang, Alp Kucukelbir

We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective.

reinforcement-learning Reinforcement Learning (RL)

Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

no code implementations 13 Jun 2020 Yunhao Tang, Krzysztof Choromanski

Off-policy learning algorithms have been known to be sensitive to the choice of hyper-parameters.

Continuous Control

Self-Imitation Learning via Generalized Lower Bound Q-learning

no code implementations NeurIPS 2020 Yunhao Tang

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning.

Continuous Control Imitation Learning +1

Discrete Action On-Policy Learning with Action-Value Critic

1 code implementation 10 Feb 2020 Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy, gradient-based deep RL algorithms efficiently.

OpenAI Gym Reinforcement Learning (RL)

Reinforcement Learning with Chromatic Networks

no code implementations 25 Sep 2019 Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.

Neural Architecture Search reinforcement-learning +1

Behavior-Guided Reinforcement Learning

no code implementations 25 Sep 2019 Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

reinforcement-learning Reinforcement Learning (RL)

ES-MAML: Simple Hessian-Free Meta Learning

1 code implementation ICLR 2020 Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang

We introduce ES-MAML, a new framework for solving the model agnostic meta learning (MAML) problem based on Evolution Strategies (ES).

Meta-Learning

Reinforcement Learning with Chromatic Networks for Compact Architecture Search

no code implementations 10 Jul 2019 Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.

Combinatorial Optimization Neural Architecture Search +2

Learning to Score Behaviors for Guided Policy Optimization

1 code implementation ICML 2020 Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

Efficient Exploration Imitation Learning +2

Variance Reduction for Evolution Strategies via Structured Control Variates

no code implementations 29 May 2019 Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that have recently become a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL).

Reinforcement Learning (RL)

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes

no code implementations 29 May 2019 Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.

Point Processes

Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy

no code implementations 13 Mar 2019 Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

Due to the high variance of policy gradients, on-policy optimization algorithms are plagued with low sample efficiency.

Provably Robust Blackbox Optimization for Reinforcement Learning

no code implementations 7 Mar 2019 Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the reinforcement learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in robotics.

reinforcement-learning Reinforcement Learning (RL) +1

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization

1 code implementation NeurIPS 2019 Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it, on-the-fly.

Multi-Armed Bandits

Discretizing Continuous Action Space for On-Policy Optimization

2 code implementations 29 Jan 2019 Yunhao Tang, Shipra Agrawal

In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization.

Continuous Control Inductive Bias

Boosting Trust Region Policy Optimization by Normalizing Flows Policy

1 code implementation 27 Sep 2018 Yunhao Tang, Shipra Agrawal

We propose to improve trust region policy search with a normalizing flows policy.

Implicit Policy for Reinforcement Learning

no code implementations 10 Jun 2018 Yunhao Tang, Shipra Agrawal

We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients.

reinforcement-learning Reinforcement Learning (RL)

Exploration by Distributional Reinforcement Learning

no code implementations 4 May 2018 Yunhao Tang, Shipra Agrawal

We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.

Distributional Reinforcement Learning Efficient Exploration +2

Variational Deep Q Network

1 code implementation 30 Nov 2017 Yunhao Tang, Alp Kucukelbir

We propose a framework that directly tackles the probability distribution of the value function parameters in Deep Q Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters.

Efficient Exploration Variational Inference
