Search Results for author: Shangtong Zhang

Found 36 papers, 19 papers with code

A Survey of In-Context Reinforcement Learning

no code implementations11 Feb 2025 Amir Moeini, Jiuqi Wang, Jacob Beck, Ethan Blaser, Shimon Whiteson, Rohan Chandra, Shangtong Zhang

Reinforcement learning (RL) agents typically optimize their policies by performing expensive backward passes to update their network parameters.

reinforcement-learning Reinforcement Learning +2

Linear $Q$-Learning Does Not Diverge: Convergence Rates to a Bounded Set

no code implementations31 Jan 2025 Xinyu Liu, Zixuan Xie, Shangtong Zhang

As a side product, we also use this general result to establish the $L^2$ convergence rate of tabular $Q$-learning with an $\epsilon$-softmax behavior policy, for which we rely on a novel pseudo-contraction property of the weighted Bellman optimality operator.

Q-Learning
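The tabular Q-learning update with an $\epsilon$-softmax behavior policy mentioned in the abstract can be sketched as follows. This is a minimal illustration on a hypothetical three-state chain (state 2 terminal, reward 1 on reaching it); the toy MDP and all constants are assumptions, and the paper's exact form of the $\epsilon$-softmax policy may differ.

```python
import numpy as np

def eps_softmax(q_row, rng, eps=0.2, tau=1.0):
    # mix: uniform with probability eps, softmax of Q otherwise
    z = np.exp((q_row - q_row.max()) / tau)
    probs = eps / len(q_row) + (1 - eps) * z / z.sum()
    return rng.choice(len(q_row), p=probs)

rng = np.random.default_rng(0)
gamma, alpha = 0.9, 0.1
Q = np.zeros((3, 2))  # actions: 0 = left, 1 = right

for _ in range(3000):
    s = 0
    while s != 2:
        a = eps_softmax(Q[s], rng)
        s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 2 else 0.0
        bootstrap = 0.0 if s2 == 2 else gamma * Q[s2].max()
        Q[s, a] += alpha * (r + bootstrap - Q[s, a])  # Q-learning update
        s = s2

print(np.round(Q[:2], 2))  # greedy policy moves right; Q[1, 1] ≈ 1.0
```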

CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening

no code implementations26 Nov 2024 Amar Kulkarni, Shangtong Zhang, Madhur Behl

First, CRASH can control adversarial Non-Player Character (NPC) agents in an AV simulator to automatically induce collisions with the ego vehicle, falsifying its motion planner.

Autonomous Vehicles Deep Reinforcement Learning +1

Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise

no code implementations20 Nov 2024 Xiaochi Qian, Zixuan Xie, Xinyu Liu, Shangtong Zhang

As applications, we provide the first almost sure convergence rate for $Q$-learning with Markovian samples without count-based learning rates.

Q-Learning

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

no code implementations8 Oct 2024 Claire Chen, Shuze Liu, Shangtong Zhang

In reinforcement learning, classic on-policy evaluation methods often suffer from high variance and require massive online data to attain the desired accuracy.

reinforcement-learning Reinforcement Learning

Doubly Optimal Policy Evaluation for Reinforcement Learning

no code implementations3 Oct 2024 Shuze Liu, Claire Chen, Shangtong Zhang

Policy evaluation estimates the performance of a policy by (1) collecting data from the environment and (2) processing raw data into a meaningful estimate.

reinforcement-learning Reinforcement Learning
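The two-stage structure described in the abstract, collecting data and then processing it into an estimate, can be sketched with a plain on-policy Monte Carlo estimator. The toy environment and constants here are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, horizon = 0.9, 5

def rollout(rng):
    # hypothetical environment: per-step reward 1 plus small noise
    return [1.0 + 0.1 * rng.standard_normal() for _ in range(horizon)]

# stage (1): collect data from the environment
episodes = [rollout(rng) for _ in range(1000)]

# stage (2): process raw data into a meaningful estimate (mean discounted return)
returns = [sum(gamma**t * r for t, r in enumerate(ep)) for ep in episodes]
estimate = float(np.mean(returns))
true_value = sum(gamma**t for t in range(horizon))  # ≈ 4.10
print(round(estimate, 2))
```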

Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise

no code implementations29 Sep 2024 Ethan Blaser, Shangtong Zhang

Stochastic approximation is an important class of algorithms, and a large body of previous analysis focuses on stochastic approximations driven by contractive operators, which is not applicable in some important reinforcement learning settings.

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

no code implementations18 Sep 2024 Jiuqi Wang, Shangtong Zhang

This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features.

Efficient Multi-Policy Evaluation for Reinforcement Learning

no code implementations16 Aug 2024 Shuze Daniel Liu, Claire Chen, Shangtong Zhang

To unbiasedly evaluate multiple target policies, the dominant approach among RL practitioners is to run and evaluate each target policy separately.

reinforcement-learning Reinforcement Learning
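The alternative to running each target policy separately is to reuse one behavior dataset for all of them. A minimal sketch with ordinary importance sampling on a hypothetical three-armed bandit (the policies and constants are made up, and the paper's estimator is more sophisticated than this):

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([1.0, 2.0, 3.0])            # true mean reward of each arm
behavior = np.array([1 / 3, 1 / 3, 1 / 3])   # single shared behavior policy
targets = [np.array([0.8, 0.1, 0.1]),        # two target policies to evaluate
           np.array([0.1, 0.1, 0.8])]

# one shared dataset instead of a separate run per target policy
actions = rng.choice(3, size=20000, p=behavior)
rewards = means[actions] + 0.1 * rng.standard_normal(20000)

estimates = []
for pi in targets:
    weights = pi[actions] / behavior[actions]  # importance-sampling ratios
    estimates.append(float(np.mean(weights * rewards)))
print([round(e, 2) for e in estimates])  # true values are 1.3 and 2.7
```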

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

no code implementations15 Jan 2024 Shuze Liu, Shuhang Chen, Shangtong Zhang

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning.

reinforcement-learning Reinforcement Learning
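The iterative, incremental, stochastic update pattern described in the abstract is the classical Robbins-Monro scheme. A minimal sketch of its simplest instance, estimating a mean from noisy samples (the target value and step sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 2.5   # unknown quantity the iterates should converge to
x = 0.0
for n in range(1, 50001):
    sample = theta_star + rng.standard_normal()  # noisy observation
    x += (1.0 / n) * (sample - x)                # incremental stochastic update
print(round(x, 2))
```

Replacing the noisy observation with a stochastic gradient gives SGD; replacing it with a bootstrapped target gives temporal difference learning.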

Revisiting a Design Choice in Gradient Temporal Difference Learning

no code implementations2 Aug 2023 Xiaochi Qian, Shangtong Zhang

In this paper, we revisit this $A^\top$TD and prove that a variant of $A^\top$TD, called $A_t^\top$TD, is also an effective solution to the deadly triad.

reinforcement-learning Reinforcement Learning +1

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

1 code implementation31 Jan 2023 Shuze Liu, Shangtong Zhang

Most reinforcement learning practitioners evaluate their policies with online Monte Carlo estimators for either hyperparameter tuning or testing different algorithmic design choices, where the policy is repeatedly executed in the environment to get the average outcome.

Management

On the Convergence of SARSA with Linear Function Approximation

no code implementations14 Feb 2022 Shangtong Zhang, Remi Tachet, Romain Laroche

SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region.
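SARSA with linear function approximation updates a weight vector along the features of the current state-action pair. A minimal sketch on a hypothetical three-state chain with one-hot features, under which the update reduces to tabular SARSA (the MDP and constants are illustrative, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2          # state 2 terminal; reward 1 on reaching it
gamma, alpha, eps = 0.9, 0.1, 0.1

def phi(s, a):
    x = np.zeros(n_states * n_actions)  # one-hot state-action features
    x[s * n_actions + a] = 1.0
    return x

w = np.zeros(n_states * n_actions)
q = lambda s, a: float(w @ phi(s, a))

def eps_greedy(s):
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax([q(s, a) for a in range(n_actions)]))

for _ in range(3000):
    s, a = 0, eps_greedy(0)
    while s != 2:
        s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 2 else 0.0
        a2 = eps_greedy(s2)
        target = r + (0.0 if s2 == 2 else gamma * q(s2, a2))
        w += alpha * (target - q(s, a)) * phi(s, a)  # on-policy semi-gradient step
        s, a = s2, a2
```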

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

1 code implementation NeurIPS 2023 Shangtong Zhang, Remi Tachet, Romain Laroche

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy.

Policy Gradient Methods

Truncated Emphatic Temporal Difference Methods for Prediction and Control

1 code implementation11 Aug 2021 Shangtong Zhang, Shimon Whiteson

Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems.

Prediction Reinforcement Learning (RL)
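The emphatic weighting at the heart of these methods is driven by the followon trace of Sutton, Mahmood, and White. A minimal sketch of the recursion with made-up importance-sampling ratios; the truncated variant studied in the paper caps how far back this recursion reaches.

```python
gamma, lam = 0.9, 0.0        # discount and bootstrapping parameter
interest = 1.0               # interest i_t assigned to every state
rhos = [1.2, 0.8, 1.0, 1.5]  # hypothetical importance-sampling ratios
F, rho_prev, emphases = 0.0, 1.0, []
for rho in rhos:
    F = gamma * rho_prev * F + interest  # followon trace F_t
    M = lam * interest + (1 - lam) * F   # emphasis M_t weighting the TD update
    emphases.append(M)
    rho_prev = rho
print([round(m, 2) for m in emphases])  # → [1.0, 2.08, 2.5, 3.25]
```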

Learning Expected Emphatic Traces for Deep RL

no code implementations12 Jul 2021 Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt

We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.

Breaking the Deadly Triad with a Target Network

1 code implementation21 Jan 2021 Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.

Q-Learning
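A target network mitigates this instability by computing bootstrap targets with a slowly-synced copy of the parameters. A minimal sketch with linear features on a hypothetical four-state uncontrolled chain with uniformly random transitions (the MDP and constants are assumptions; the paper analyzes more refined target-network updates than this periodic hard sync):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)          # online parameters
w_target = w.copy()      # frozen copy used only inside bootstrap targets
alpha, gamma, sync_every = 0.05, 0.9, 100

for step in range(5000):
    s, s2 = rng.integers(4), rng.integers(4)  # uniformly random transition
    x, x2 = np.eye(4)[s], np.eye(4)[s2]
    r = float(s2 == 3)                        # reward 1 for entering state 3
    target = r + gamma * (w_target @ x2)      # bootstrap via the target network
    w += alpha * (target - w @ x) * x         # semi-gradient update
    if step % sync_every == 0:
        w_target = w.copy()                   # periodic hard sync
print(np.round(w, 1))  # true value is 2.5 in every state
```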

Average-Reward Off-Policy Policy Evaluation with Function Approximation

1 code implementation8 Jan 2021 Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

1 code implementation2 Oct 2020 Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

1 code implementation22 Apr 2020 Shangtong Zhang, Bo Liu, Shimon Whiteson

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable.

MuJoCo reinforcement-learning +2

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Shimon Whiteson

Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, so any primal-dual algorithm is not guaranteed to converge or find the desired solution.

Reinforcement Learning

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

Vocal Bursts Valence Prediction

Distributional Reinforcement Learning for Efficient Exploration

no code implementations13 May 2019 Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yao-Liang Yu

In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties.

Atari Games Distributional Reinforcement Learning +4

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

1 code implementation12 May 2019 Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Andrzej Wojcicki, Mai Xu

Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards.

Deep Residual Reinforcement Learning

1 code implementation3 May 2019 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We revisit residual algorithms in both model-free and model-based reinforcement learning settings.

Model-based Reinforcement Learning reinforcement-learning +2
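The contrast between a standard semi-gradient TD update and Baird's residual-gradient update, which residual algorithms interpolate between, can be sketched as follows. The two-state cycle and constants are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def td_update(w, x, x2, r, alpha=0.1, gamma=0.9):
    delta = r + gamma * (w @ x2) - w @ x
    return w + alpha * delta * x                 # semi-gradient: drops next-state term

def residual_update(w, x, x2, r, alpha=0.1, gamma=0.9):
    delta = r + gamma * (w @ x2) - w @ x
    return w + alpha * delta * (x - gamma * x2)  # gradient of squared Bellman error

# deterministic two-state cycle 0 -> 1 -> 0 with rewards 0 and 1
e0, e1 = np.eye(2)
v_true = np.array([0.9 / 0.19, 1.0 / 0.19])  # ≈ [4.74, 5.26]
w = np.zeros(2)
for _ in range(20000):
    w = residual_update(w, e0, e1, 0.0)
    w = residual_update(w, e1, e0, 1.0)
print(np.round(w, 2))  # → [4.74 5.26]
```

On this deterministic example both updates reach the true values; they differ on stochastic transitions, where the residual gradient is biased without double sampling.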

Generalized Off-Policy Actor-Critic

1 code implementation NeurIPS 2019 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.

counterfactual MuJoCo +3

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search

1 code implementation6 Nov 2018 Shangtong Zhang, Hao Chen, Hengshuai Yao

In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning.

continuous-control Continuous Control +4

QUOTA: The Quantile Option Architecture for Reinforcement Learning

3 code implementations5 Nov 2018 Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao

In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).

Decision Making Distributional Reinforcement Learning +3

mlpack 3: a fast, flexible machine learning library

1 code implementation Journal of Open Source Software 2018 Ryan R. Curtin, Marcus Edel, Mikhail Lozhnikov, Yannis Mentekidis, Sumedh Ghaisas, Shangtong Zhang

In the past several years, the field of machine learning has seen an explosion of interest and excitement, with hundreds or thousands of algorithms developed for different tasks every year.

Benchmarking BIG-bench Machine Learning +2

A Deeper Look at Experience Replay

4 code implementations4 Dec 2017 Shangtong Zhang, Richard S. Sutton

Experience replay has recently been widely used in various deep reinforcement learning (RL) algorithms; in this paper, we rethink its utility.

Atari Games Deep Reinforcement Learning +2
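The mechanism under study, whose size and usage the paper examines, can be sketched as a plain FIFO replay buffer with uniform sampling (capacity and batch size here are arbitrary):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform sampling decorrelates consecutive transitions in a batch
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(250):
    buf.push((t, 0, 0.0, t + 1))  # (s, a, r, s') placeholder transitions
batch = buf.sample(32)
print(len(buf), len(batch))  # → 100 32
```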

Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control

no code implementations30 Nov 2017 Shangtong Zhang, Osmar R. Zaiane

Reinforcement learning and evolutionary strategies are two major approaches to addressing complicated control problems.

continuous-control Continuous Control +3

Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

no code implementations9 Dec 2016 Vivek Veeriah, Shangtong Zhang, Richard S. Sutton

In this paper, we introduce a new incremental learning algorithm called crossprop, which learns incoming weights of hidden units based on the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes.

Incremental Learning
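The cited meta-gradient step-size idea can be sketched with Sutton's IDBD for a single linear unit; crossprop extends the idea to the incoming weights of hidden units. This is a sketch of IDBD under assumed constants, not the crossprop algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 5, 0.01               # input size and meta step-size
w = np.zeros(n)                  # learned weights
beta = np.full(n, np.log(0.05))  # log per-weight step-sizes
h = np.zeros(n)                  # memory trace driving the meta-gradient
w_true = rng.standard_normal(n)

for _ in range(5000):
    x = rng.standard_normal(n)
    y = float(w_true @ x)            # noiseless linear target
    delta = y - w @ x
    beta += theta * delta * x * h    # meta-gradient descent on log step-sizes
    alpha = np.exp(beta)
    w += alpha * delta * x           # base LMS update with per-weight step-sizes
    h = h * np.clip(1 - alpha * x * x, 0, None) + alpha * delta * x
```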
