Search Results for author: Jincheng Mei

Found 19 papers, 3 papers with code

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

no code implementations31 May 2024 Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data.


Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

no code implementations29 May 2024 Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected.

reinforcement-learning Reinforcement Learning (RL) +1

Stochastic Gradient Succeeds for Bandits

no code implementations27 Feb 2024 Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

We show that the \emph{stochastic gradient} bandit algorithm converges to a \emph{globally optimal} policy at an $O(1/t)$ rate, even with a \emph{constant} step size.

Beyond Expectations: Learning with Stochastic Dominance Made Practical

no code implementations5 Feb 2024 Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations.

Decision Making Portfolio Optimization

The Role of Baselines in Policy Gradient Optimization

no code implementations16 Jan 2023 Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Instead, the analysis reveals that the primary effect of the value baseline is to \textbf{reduce the aggressiveness of the updates} rather than their variance.

Understanding the Effect of Stochasticity in Policy Optimization

no code implementations NeurIPS 2021 Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.

Understanding and Leveraging Overparameterization in Recursive Value Estimation

no code implementations ICLR 2022 Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

To better understand the utility of deep models in RL we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.

Reinforcement Learning (RL) Value prediction

Leveraging Non-uniformity in First-order Non-convex Optimization

no code implementations13 May 2021 Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Classical global convergence results for first-order methods rely on uniform smoothness and the \L{}ojasiewicz inequality.

BIG-bench Machine Learning

On the Optimality of Batch Policy Optimization Algorithms

no code implementations6 Apr 2021 Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Value prediction

Escaping the Gravitational Pull of Softmax

no code implementations NeurIPS 2020 Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities.

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

1 code implementation28 Sep 2020 Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao

The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategy and why they help.

Understanding and Mitigating the Limitations of Prioritized Experience Replay

2 code implementations19 Jul 2020 Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations.

Autonomous Driving Continuous Control +1

On the Global Convergence Rates of Softmax Policy Gradient Methods

no code implementations ICML 2020 Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization.

Open-Ended Question Answering Policy Gradient Methods

Frequency-based Search-control in Dyna

no code implementations ICLR 2020 Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand

This suggests a search-control strategy: we should use states from high frequency regions of the value function to query the model to acquire more samples.

Model-based Reinforcement Learning

Maximum Entropy Monte-Carlo Planning

no code implementations NeurIPS 2019 Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller

We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).

Atari Games Decision Making

Identifying and Tracking Sentiments and Topics from Social Media Texts during Natural Disasters

no code implementations EMNLP 2017 Min Yang, Jincheng Mei, Heng Ji, Wei Zhao, Zhou Zhao, Xiaojun Chen

We study the problem of identifying the topics and sentiments and tracking their shifts from social media texts in different geographical regions during emergencies and disasters.

Topic Models

On the Reducibility of Submodular Functions

no code implementations4 Jan 2016 Jincheng Mei, Hao Zhang, Bao-liang Lu

The scalability of submodular optimization methods is critical for their usability in practice.

Cannot find the paper you are looking for? You can Submit a new open access paper.