Search Results for author: Wenhao Zhan

Found 9 papers, 1 paper with code

Dataset Reset Policy Optimization for RLHF

2 code implementations • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.

Reinforcement Learning (RL)
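The dataset-reset idea is simple to sketch. Below is a minimal, hypothetical Python illustration: `env.reset_to`, `env.reset`, `env.step`, and the rollout helper are assumed names for a simulator interface, not the paper's API, and the full DR-PO algorithm additionally trains a reward model from the preference data within the RLHF pipeline.

```python
import random

def rollout(env, policy, state, horizon=100):
    """Roll the policy forward from `state` and return the visited states."""
    traj = [state]
    for _ in range(horizon):
        state, _, done = env.step(policy(state))
        traj.append(state)
        if done:
            break
    return traj

def collect_with_dataset_reset(env, policy, offline_states, n_rollouts,
                               reset_prob=0.5):
    """Core of the dataset-reset idea: with some probability, start a rollout
    from an informative state in the offline preference data instead of the
    initial state distribution. All names are illustrative."""
    trajectories = []
    for _ in range(n_rollouts):
        if offline_states and random.random() < reset_prob:
            start = env.reset_to(random.choice(offline_states))  # assumed API
        else:
            start = env.reset()  # usual initial state distribution
        trajectories.append(rollout(env, policy, start))
    return trajectories
```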

Optimal Multi-Distribution Learning

no code implementations • 8 Dec 2023 • Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee

Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound.

Fairness
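Stated in standard multi-distribution-learning notation ($k$ data distributions and a hypothesis class of VC dimension $d$), the abstract's claim reads as follows; this is a paraphrase in generic notation, not a quote from the paper.

```latex
% With n = \widetilde{O}\bigl((d + k)/\varepsilon^2\bigr) samples in total,
% the learner outputs a randomized hypothesis p (a distribution over the
% hypothesis class) whose worst-case error is near-optimal:
\[
\max_{i \in [k]} \, \mathbb{E}_{h \sim p}\bigl[\operatorname{err}_{\mathcal{D}_i}(h)\bigr]
\;\le\;
\min_{p'} \, \max_{i \in [k]} \, \mathbb{E}_{h \sim p'}\bigl[\operatorname{err}_{\mathcal{D}_i}(h)\bigr]
\;+\; \varepsilon.
\]
```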

Provable Reward-Agnostic Preference-Based Reinforcement Learning

no code implementations • 29 May 2023 • Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals.

reinforcement-learning
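Pairwise feedback of this kind is usually modeled with a Bradley-Terry-style link between trajectory returns and preference probabilities. The sketch below illustrates that standard model; the logistic link and function names are illustrative assumptions, not code from the paper.

```python
import math
import random

def preference_prob(return_a, return_b):
    """Bradley-Terry model: P(trajectory A preferred over B) is a logistic
    function of the difference in their (unobserved) returns."""
    return 1.0 / (1.0 + math.exp(-(return_a - return_b)))

def sample_preference(return_a, return_b):
    """Simulate a labeler's pairwise feedback over two trajectories."""
    return "A" if random.random() < preference_prob(return_a, return_b) else "B"
```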

Provable Offline Preference-Based Reinforcement Learning

no code implementations • 24 May 2023 • Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE.

reinforcement-learning
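Step (1) amounts to minimizing the negative log-likelihood of the observed pairwise labels over a reward class. A minimal sketch, assuming a Bradley-Terry preference model and treating `reward_fn` as one candidate from that class (names illustrative; the paper works with general function approximation):

```python
import math

def neg_log_likelihood(reward_fn, dataset):
    """MLE objective for the implicit reward. `dataset` holds tuples
    (traj_a, traj_b, label) with label == 1 iff traj_a was preferred;
    `reward_fn` maps a trajectory to a scalar return."""
    nll = 0.0
    for traj_a, traj_b, label in dataset:
        p_a = 1.0 / (1.0 + math.exp(-(reward_fn(traj_a) - reward_fn(traj_b))))
        nll -= math.log(p_a if label == 1 else 1.0 - p_a)
    return nll
```

Step (2) then plans pessimistically: rather than optimizing against the single MLE reward, it optimizes the worst-case value over all rewards whose likelihood is within a confidence radius of the maximum.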

PAC Reinforcement Learning for Predictive State Representations

no code implementations • 12 Jul 2022 • Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

We show that given a realizable model class, the sample complexity of learning a near-optimal policy scales only polynomially with the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.

reinforcement-learning, Reinforcement Learning (RL)

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games

no code implementations • 3 Jun 2022 • Wenhao Zhan, Jason D. Lee, Zhuoran Yang

We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.

Decision Making

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

no code implementations • 9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).

Offline RL, reinforcement-learning +1

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

no code implementations • 24 May 2021 • Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi

Practical considerations such as encouraging exploration and enforcing structural properties of the learned policy can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer.

Reinforcement Learning (RL)
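As a concrete instance, taking negative entropy as the regularizer recovers entropy-regularized RL. In generic notation (not necessarily the paper's), the regularized value function being maximized has the form:

```latex
% Discounted value augmented with a structure-promoting convex regularizer
% h_s (e.g., negative entropy) of strength tau >= 0:
\[
V_\tau^{\pi}(s) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}
\Bigl( r(s_t, a_t) - \tau \, h_{s_t}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr)
\,\Bigm|\, s_0 = s \right].
\]
```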
