no code implementations • 15 Apr 2025 • Yudong Luo, Yangchen Pan, Jiaqi Tan, Pascal Poupart
Risk-averse reinforcement learning (RARL) is critical for decision-making under uncertainty and is especially valuable in high-stakes applications.
no code implementations • 4 Feb 2025 • Avery Ma, Yangchen Pan, Amir-Massoud Farahmand
Many-shot jailbreaking circumvents the safety alignment of large language models by exploiting their ability to process long input sequences.
1 code implementation • 28 May 2024 • Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu
In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges.
1 code implementation • 28 May 2024 • Zhiyao Luo, Mingcheng Zhu, Fenglin Liu, Jiali Li, Yangchen Pan, Jiandong Zhou, Tingting Zhu
Our experiments reveal varying degrees of performance degradation among RL algorithms in the presence of noise and patient variability, with some algorithms failing to converge.
no code implementations • 23 Apr 2024 • Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.).
no code implementations • 17 Mar 2024 • Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart
Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their practical applications.
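A minimal numpy sketch (hypothetical names, not the paper's method) of why naive CVaR policy gradients are sample-inefficient: only trajectories whose returns fall in the worst alpha-tail receive nonzero gradient weight, so most collected samples are effectively discarded.

```python
import numpy as np

def cvar_pg_weights(returns, alpha=0.1):
    """Indicator weights used by a naive CVaR policy gradient:
    only the worst alpha-fraction of sampled returns contribute."""
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, alpha)          # Value-at-Risk threshold
    weights = (returns <= var).astype(float)   # 1 for tail samples, 0 otherwise
    return weights, var

rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=1.0, size=1000)
weights, var = cvar_pg_weights(returns, alpha=0.1)
print(f"VaR_0.1 = {var:.3f}, fraction of samples with nonzero gradient: {weights.mean():.2%}")
```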
1 code implementation • 30 Nov 2023 • Avery Ma, Amir-Massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu
During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss.
1 code implementation • 13 Aug 2023 • Avery Ma, Yangchen Pan, Amir-Massoud Farahmand
In the context of deep learning, our experiments show that SGD-trained neural networks have smaller Lipschitz constants, which explains their better robustness to input perturbations compared to networks trained with adaptive gradient methods.
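As a rough illustration (not the paper's experimental protocol), the Lipschitz constant of a feedforward ReLU network can be upper-bounded by the product of the spectral norms of its weight matrices; smaller per-layer norms give a smaller bound.

```python
import numpy as np

def lipschitz_upper_bound(weight_matrices):
    """Product of spectral norms: an upper bound on the Lipschitz
    constant of a feedforward ReLU network with these weights."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value of W
    return bound

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(64, 32)), rng.normal(scale=0.1, size=(32, 10))]
print("Lipschitz upper bound:", lipschitz_upper_bound(weights))
```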
1 code implementation • 16 Mar 2023 • Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar, Janarthanan Rajendran
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL).
4 code implementations • 28 Feb 2023 • Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White
We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.
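A minimal sketch of the idea (assumed names, not the authors' released code): compute a softmax over action values restricted to actions actually observed in the dataset for that state, rather than over the full action space.

```python
import numpy as np

def in_sample_softmax(q_values, in_dataset_mask, tau=1.0):
    """Softmax over Q-values restricted to actions present in the dataset.
    q_values: (num_actions,); in_dataset_mask: boolean (num_actions,)."""
    q = np.where(in_dataset_mask, q_values / tau, -np.inf)  # mask out-of-sample actions
    q = q - q.max()                                          # numerical stability
    exp_q = np.exp(q)
    return exp_q / exp_q.sum()

q = np.array([1.0, 3.0, 2.0, 5.0])
mask = np.array([True, True, True, False])   # last action never appears in the data
print(in_sample_softmax(q, mask))            # probability mass only on in-sample actions
```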
1 code implementation • 27 Nov 2022 • Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H. S. Torr, Yangchen Pan
Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix.
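A quick way to inspect the label alignment property on a dataset (an illustrative check, not the paper's regularizer): project the label vector onto the top singular vectors of the data matrix and measure how much of its norm is captured.

```python
import numpy as np

def label_alignment(X, y, k=10):
    """Fraction of the label vector's norm captured by the span of
    the top-k left singular vectors of the data matrix X (n x d)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    coeffs = U[:, :k].T @ y
    return np.linalg.norm(coeffs) / np.linalg.norm(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X @ rng.normal(size=50)          # labels generated linearly from X
print("alignment with top-10 singular vectors:", round(label_alignment(X, y, k=10), 3))
```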
1 code implementation • 22 May 2022 • Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood
The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later.
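A bare-bones uniform replay buffer, included only to make the mechanism concrete (this is the generic component, not the paper's memory-efficient variant).

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and samples them uniformly for later training."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.add(t, 0, 1.0, t + 1, False)
batch = buf.sample(8)
print(len(batch), "transitions sampled")
```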
no code implementations • 24 Jan 2022 • Liangliang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu
This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories.
1 code implementation • 22 Dec 2021 • Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood
Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions.
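A small numerical illustration (not from the paper) of the saturation problem: when a softmax policy places nearly all its mass on a sub-optimal action, the optimal action is almost never sampled and the policy-gradient signal on its logit is vanishingly small in expectation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Logits heavily favour action 0, but action 1 is actually optimal.
logits = np.array([10.0, 0.0])
pi = softmax(logits)

# Vanilla PG: the gradient of log pi(a) w.r.t. the logits is one_hot(a) - pi.
# Even though sampling the optimal action (a=1) yields a useful gradient,
# it is sampled with probability ~4.5e-5, so the expected update is tiny.
a_opt = 1
grad_log_pi = np.eye(2)[a_opt] - pi
print("pi:", pi)
print("grad log pi(a_opt):", grad_log_pi, "sampled with prob", pi[a_opt])
```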
1 code implementation • 28 Sep 2020 • Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao
The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.
2 code implementations • 19 Jul 2020 • Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo
Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and of its limitations.
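A minimal TD-error-prioritized sampling sketch (the generic prioritized ER scheme, not the specific strategy analysed in the paper): transitions with larger absolute TD error are replayed more often.

```python
import numpy as np

def prioritized_sample(td_errors, batch_size, alpha=0.6, rng=None):
    """Sample transition indices with probability proportional to |TD error|^alpha."""
    rng = rng or np.random.default_rng()
    priorities = np.abs(td_errors) ** alpha + 1e-6     # small constant avoids zero probability
    probs = priorities / priorities.sum()
    return rng.choice(len(td_errors), size=batch_size, p=probs)

td_errors = np.array([0.01, 0.5, 2.0, 0.1, 1.5])
idx = prioritized_sample(td_errors, batch_size=3, rng=np.random.default_rng(0))
print("sampled indices:", idx)
```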
1 code implementation • ICLR 2020 • Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
Q-learning suffers from overestimation bias because it approximates the maximum action value using the maximum estimated action value.
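A quick numerical check of this point (illustrative only): with noisy value estimates, the maximum estimated action value is biased upward relative to the true maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)                 # all actions have the same true value, 0
noise = rng.normal(scale=1.0, size=(10_000, 10))
estimates = true_q + noise            # unbiased but noisy per-action estimates

print("true max:", true_q.max())                               # 0.0
print("mean of max estimate:", estimates.max(axis=1).mean())   # well above 0: overestimation
```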
no code implementations • NeurIPS 2020 • Yangchen Pan, Ehsan Imani, Martha White, Amir-Massoud Farahmand
We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions.
no code implementations • ICLR 2020 • Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand
This suggests a search-control strategy: we should use states from high frequency regions of the value function to query the model to acquire more samples.
1 code implementation • ICLR 2021 • Yangchen Pan, Kirby Banman, Martha White
Recent work has shown that sparse representations -- where only a small percentage of units are active -- can significantly reduce interference.
no code implementations • 18 Jun 2019 • Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White
In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function.
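A gradient-ascent sketch of this search-control idea (hypothetical helper names, assuming a differentiable value estimate): starting from a visited state, hill-climb the current value estimate to collect high-value states that are then used to query the model.

```python
import numpy as np

def hill_climb_states(grad_value_fn, start_state, steps=20, lr=0.1):
    """Follow the gradient of the learned value estimate to collect
    a trajectory of (increasingly) high-value states for search-control."""
    s = np.array(start_state, dtype=float)
    trajectory = [s.copy()]
    for _ in range(steps):
        s = s + lr * grad_value_fn(s)      # ascend the value estimate
        trajectory.append(s.copy())
    return trajectory

# Toy example: quadratic value estimate peaked at s = (1, 1).
value_fn = lambda s: -np.sum((s - 1.0) ** 2)
grad_value_fn = lambda s: -2.0 * (s - 1.0)
states = hill_climb_states(grad_value_fn, start_state=[0.0, 0.0])
print("final state:", states[-1], "value:", value_fn(states[-1]))   # approaches (1, 1)
```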
1 code implementation • 22 Oct 2018 • Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White
We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
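For context, a single generic cross-entropy-method (CEM) update for one fixed state (illustrative only; the paper's conditional CEM amortizes this tracking across states with a learned actor): sample actions from a Gaussian proposal, keep the elites under the action-value function, and refit the proposal to them.

```python
import numpy as np

def cem_step(q_fn, mean, std, num_samples=64, elite_frac=0.25, rng=None):
    """One CEM update: sample actions, keep the elites under q_fn,
    and refit the Gaussian proposal to them."""
    rng = rng or np.random.default_rng()
    actions = rng.normal(mean, std, size=(num_samples, mean.shape[0]))
    num_elite = int(elite_frac * num_samples)
    elites = actions[np.argsort(q_fn(actions))[-num_elite:]]   # best actions
    return elites.mean(axis=0), elites.std(axis=0) + 1e-3

# Toy Q-function for a fixed state, peaked at action = (2, -1).
q_fn = lambda a: -np.sum((a - np.array([2.0, -1.0])) ** 2, axis=1)
rng = np.random.default_rng(0)
mean, std = np.zeros(2), np.ones(2)
for _ in range(20):
    mean, std = cem_step(q_fn, mean, std, rng=rng)
print("mean after CEM:", np.round(mean, 2))   # approaches (2, -1)
```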
no code implementations • ICML 2018 • Yangchen Pan, Amir-Massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski
Recent work has shown that reinforcement learning (RL) is a promising approach to controlling dynamical systems described by partial differential equations (PDEs).
no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White
We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.
no code implementations • 3 Aug 2017 • Yangchen Pan, Erfan Sadeqi Azer, Martha White
As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, which can still enable significant computational gains and the use of these matrix-based learning algorithms that are less sensitive to parameters.
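A least-squares illustration of a one-sided (left) sketch, assuming a simple Gaussian sketching matrix (a generic sketch-and-solve example, not the paper's algorithm): compress the rows of A and b once, then solve the much smaller system.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 50, 500            # n samples, d features, sketch size m << n
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m)     # left-sided Gaussian sketch
x_sketched, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

print("sketched vs exact error:",
      np.linalg.norm(x_sketched - x_true), np.linalg.norm(x_exact - x_true))
```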
no code implementations • ICML 2017 • Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White
In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion.
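To make the greedy part concrete, here is the standard greedy loop for monotone submodular maximization under a cardinality budget (a textbook sketch with a toy coverage objective, not the paper's online criterion).

```python
def greedy_submodular_max(ground_set, f, budget):
    """Standard greedy: repeatedly add the element with the largest
    marginal gain f(S + {e}) - f(S), up to the budget."""
    selected = []
    for _ in range(budget):
        gains = [(f(selected + [e]) - f(selected), e)
                 for e in ground_set if e not in selected]
        best_gain, best_e = max(gains)
        selected.append(best_e)
    return selected

# Toy coverage objective: number of distinct items covered by the chosen sets.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(greedy_submodular_max(list(sets), f, budget=2))   # two sets covering the most items
```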
no code implementations • 28 Nov 2016 • Yangchen Pan, Adam White, Martha White
The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least-squares methods.
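For reference, a linear TD(λ) update with accumulating eligibility traces, the computationally frugal end of that spectrum (the standard textbook form, not tied to this paper's contribution).

```python
import numpy as np

def td_lambda_update(w, z, phi_s, phi_s_next, reward, alpha=0.1, gamma=0.99, lam=0.9):
    """One linear TD(lambda) step with accumulating eligibility traces.
    w: weights, z: eligibility trace, phi_*: feature vectors."""
    delta = reward + gamma * phi_s_next @ w - phi_s @ w   # TD error
    z = gamma * lam * z + phi_s                           # decay and accumulate trace
    w = w + alpha * delta * z
    return w, z

d = 8
w, z = np.zeros(d), np.zeros(d)
rng = np.random.default_rng(0)
for _ in range(100):
    phi_s, phi_s_next = rng.normal(size=d), rng.normal(size=d)
    w, z = td_lambda_update(w, z, phi_s, phi_s_next, reward=1.0)
print("learned weights:", np.round(w, 3))
```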
no code implementations • 26 Nov 2015 • Clement Gehring, Yangchen Pan, Martha White
Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning.