Search Results for author: Wesley Chung

Found 8 papers, 1 paper with code

The Role of Baselines in Policy Gradient Optimization

no code implementations • 16 Jan 2023 • Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Instead, the analysis reveals that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.
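
For intuition, a minimal sketch of the quantity being analyzed: a softmax policy-gradient step on a toy bandit with a scalar baseline b subtracted from the reward. The bandit, the step sizes, and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Illustrative sketch (not the paper's code): one softmax policy-gradient
# step on a 3-armed bandit with a scalar baseline b subtracted from the
# reward. Subtracting b leaves the expected update unchanged but scales
# its magnitude, the "aggressiveness" the abstract refers to.
rng = np.random.default_rng(0)
theta = np.zeros(3)                          # policy logits
true_rewards = np.array([1.0, 0.5, 0.0])     # assumed toy problem

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def pg_step(theta, b, lr=0.5):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = true_rewards[a] + rng.normal(scale=0.1)
    grad_logp = -probs
    grad_logp[a] += 1.0                      # grad of log pi(a): e_a - probs
    return theta + lr * (r - b) * grad_logp

for _ in range(500):
    theta = pg_step(theta, b=true_rewards.mean())
print(softmax(theta))                        # mass concentrates on arm 0
```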

Offline-Online Reinforcement Learning: Extending Batch and Online RL

no code implementations • 29 Sep 2021 • Maryam Hashemzadeh, Wesley Chung, Martha White

To enable better performance, we investigate the offline-online setting: The agent has access to a batch of data to train on but is also allowed to learn during the evaluation phase in an online manner.

Reinforcement Learning (RL)
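
As a rough illustration of this protocol (not the paper's agent), here is a tabular Q-learning sketch that first trains on a fixed batch and then keeps updating online during evaluation; the toy MDP and hyperparameters are assumptions.

```python
import numpy as np

# Offline-online sketch (assumed: tabular Q-learning on a toy 2-state,
# 2-action MDP; not the paper's algorithm).
rng = np.random.default_rng(1)
P = {(0, 0): (0, 0.0), (0, 1): (1, 1.0),    # (s, a) -> (s', reward)
     (1, 0): (0, 0.0), (1, 1): (1, 1.0)}
Q = np.zeros((2, 2))

def q_update(s, a, r, s2, alpha=0.1, gamma=0.9):
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

# Phase 1: train on a fixed batch of logged transitions (offline).
batch = []
for _ in range(200):
    s, a = int(rng.integers(2)), int(rng.integers(2))
    s2, r = P[(s, a)]
    batch.append((s, a, r, s2))
for s, a, r, s2 in batch:
    q_update(s, a, r, s2)

# Phase 2: keep learning online during the evaluation phase.
s = 0
for _ in range(200):
    a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
    s2, r = P[(s, a)]
    q_update(s, a, r, s2)
    s = s2
print(Q)   # action 1 should dominate in both states
```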

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

no code implementations • 31 Aug 2020 • Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.

Reinforcement Learning (RL) • Stochastic Optimization

Incrementally Learning Functions of the Return

no code implementations • 5 Jul 2019 • Brendan Bennett, Wesley Chung, Muhammad Zaheer, Vincent Liu

Temporal difference methods enable efficient, incremental estimation of value functions in reinforcement learning, and are of broader interest because they correspond to learning as observed in biological systems.

Reinforcement Learning (RL)
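
A hedged sketch of the incremental TD estimation the excerpt refers to, using the classic random-walk example; the environment and step size are assumptions, and the paper itself targets functions of the return beyond this plain value estimate.

```python
import numpy as np

# Incremental TD(0) value estimation on the classic 5-state random walk
# (assumed toy setup; the paper goes beyond this to functions of the return).
rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 1.0, 0.1
V = np.zeros(n_states)

for _ in range(2000):
    s = n_states // 2                        # start in the middle
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s2 == n_states else 0.0   # +1 only at the right terminal
        done = s2 < 0 or s2 == n_states
        target = r if done else r + gamma * V[s2]
        V[s] += alpha * (target - V[s])      # one incremental TD step
        if done:
            break
        s = s2
print(V)   # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```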

Importance Resampling for Off-policy Prediction

2 code implementations • NeurIPS 2019 • Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning.
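
For reference, the standard per-step IS reweighting this sentence alludes to (textbook off-policy TD, not this paper's contribution); the policies and the single-state setup are illustrative.

```python
import numpy as np

# Textbook per-step importance-sampling correction for off-policy TD
# prediction (generic IS, not this paper's method). pi is the target
# policy, mu the behavior policy that generated the data.
def is_td_update(V, s, a, r, s2, pi, mu, alpha=0.1, gamma=0.99):
    rho = pi[s, a] / mu[s, a]        # importance ratio reweights the update
    V[s] += alpha * rho * (r + gamma * V[s2] - V[s])
    return V

# Toy usage: one state, two actions.
pi = np.array([[0.9, 0.1]])          # target policy (assumed)
mu = np.array([[0.5, 0.5]])          # behavior policy (assumed)
V = is_td_update(np.zeros(1), s=0, a=0, r=1.0, s2=0, pi=pi, mu=mu)
```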

Two-Timescale Networks for Nonlinear Value Function Approximation

no code implementations • ICLR 2019 • Wesley Chung, Somjit Nath, Ajin Joseph, Martha White

A key component for many reinforcement learning agents is to learn a value function, either for policy evaluation or control.

Q-Learning

Importance Resampling for Off-policy Policy Evaluation

no code implementations • 27 Sep 2018 • Matthew Schlegel, Wesley Chung, Daniel Graves, Martha White

We propose Importance Resampling (IR) for off-policy learning, which resamples experience from the replay buffer and applies a standard on-policy update.
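
A minimal sketch of the IR idea as stated above, under an assumed data layout: draw buffer indices with probability proportional to the importance ratio, then apply a plain, unweighted on-policy update.

```python
import numpy as np

# Sketch of Importance Resampling as described in the abstract (assumed
# buffer layout): sample transitions with probability proportional to
# rho = pi(a|s) / mu(a|s), then update without any IS weight.
rng = np.random.default_rng(0)

def ir_minibatch(buffer, rho, batch_size):
    p = rho / rho.sum()                      # resampling distribution
    idx = rng.choice(len(buffer), size=batch_size, p=p)
    return [buffer[i] for i in idx]

def on_policy_td(V, batch, alpha=0.1, gamma=0.99):
    for s, r, s2 in batch:
        V[s] += alpha * (r + gamma * V[s2] - V[s])   # standard on-policy update
    return V

# Toy usage: three logged transitions (s, r, s') with importance ratios.
buffer = [(0, 1.0, 1), (1, 0.0, 0), (0, 0.5, 1)]
rho = np.array([1.8, 0.2, 1.0])
V = on_policy_td(np.zeros(2), ir_minibatch(buffer, rho, batch_size=2))
```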

High-confidence error estimates for learned value functions

no code implementations • 28 Aug 2018 • Touqir Sajed, Wesley Chung, Martha White

We provide experiments investigating the number of samples this offline algorithm requires in simple benchmark reinforcement learning domains, and highlight that many open questions remain for this important problem.

Reinforcement Learning (RL) +1
