Search Results for author: Jincheng Mei

Found 17 papers, 3 papers with code

Stochastic Gradient Succeeds for Bandits

no code implementations27 Feb 2024 Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size.
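
As a rough illustration of the algorithm this result concerns, the sketch below runs a softmax-parameterized stochastic gradient bandit update with a constant step size on a hypothetical three-armed bandit; the reward means, noise level, step size, and horizon are illustrative and not taken from the paper.

```python
import numpy as np

# Hypothetical 3-armed bandit; rewards, step size, and horizon are illustrative.
rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])
theta = np.zeros(3)          # softmax logits (the policy parameters)
eta = 0.1                    # constant step size, as in the result above

for t in range(10_000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    a = rng.choice(3, p=pi)                              # sample an arm
    r = true_means[a] + 0.1 * rng.standard_normal()      # noisy reward
    theta += eta * r * (np.eye(3)[a] - pi)               # stochastic gradient step

print(np.round(pi, 3))       # probability mass should concentrate on the best arm
```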

Beyond Expectations: Learning with Stochastic Dominance Made Practical

no code implementations5 Feb 2024 Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to expectations.
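
For readers unfamiliar with the ordering the abstract refers to, here is a minimal sketch of checking first-order stochastic dominance between two empirical outcome samples, using the textbook criterion that x dominates y when F_x(t) <= F_y(t) at every threshold t; the data and helper name are illustrative, not the paper's method.

```python
import numpy as np

# Illustrative check of first-order stochastic dominance between two
# empirical outcome samples: x dominates y iff F_x(t) <= F_y(t) for all t.
def first_order_dominates(x, y):
    grid = np.union1d(x, y)                               # all observed thresholds
    cdf = lambda s, t: np.mean(np.asarray(s)[:, None] <= t, axis=0)
    return bool(np.all(cdf(x, grid) <= cdf(y, grid)))

print(first_order_dominates([1.0, 2.0, 3.0], [0.5, 1.5, 2.5]))   # True
```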

Decision Making, Portfolio Optimization

The Role of Baselines in Policy Gradient Optimization

no code implementations16 Jan 2023 Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Instead, the analysis reveals that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.
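
A minimal sketch of the object under study: a softmax policy gradient update with a value baseline on a toy two-armed bandit. Subtracting the baseline rescales each sampled update by (r - baseline), which is the sense in which it can temper how aggressively probability mass moves per sample; the rewards, step size, and baseline tracking rule are illustrative.

```python
import numpy as np

# Toy two-armed bandit; the baseline tracks the running mean reward.
rng = np.random.default_rng(1)
true_means = np.array([0.1, 0.9])
theta = np.zeros(2)
baseline, eta = 0.0, 0.1

for t in range(5_000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    a = rng.choice(2, p=pi)
    r = true_means[a] + 0.1 * rng.standard_normal()
    # The baseline rescales the sampled update by (r - baseline), which
    # tempers how much probability mass any single sample can move.
    theta += eta * (r - baseline) * (np.eye(2)[a] - pi)
    baseline += 0.05 * (r - baseline)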

Understanding the Effect of Stochasticity in Policy Optimization

no code implementations NeurIPS 2021 Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.

Understanding and Leveraging Overparameterization in Recursive Value Estimation

no code implementations ICLR 2022 Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

To better understand the utility of deep models in RL, we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.
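
A hedged sketch of the setting (not the paper's analysis): value estimation with a linear representation that has more features than states, solved through the TD fixed-point equations. Overparameterization makes the fixed point non-unique in the weights, yet on this toy example any fixed point still recovers the exact value function; all constants are illustrative.

```python
import numpy as np

# Toy Markov reward process with more features than states.
rng = np.random.default_rng(3)
n_states, n_features, gamma = 4, 16, 0.9
Phi = rng.standard_normal((n_states, n_features))           # overparameterized features
P = np.full((n_states, n_states), 1.0 / n_states)           # transition matrix
r = rng.uniform(size=n_states)
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)   # exact value function

# TD fixed-point equations (uniform state weighting); the system is
# underdetermined, so take the minimum-norm solution among many fixed points.
A = Phi.T @ (np.eye(n_states) - gamma * P) @ Phi
b = Phi.T @ r
w = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(Phi @ w, v_true))                          # True: values are recovered
```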

Reinforcement Learning (RL), Value prediction

Leveraging Non-uniformity in First-order Non-convex Optimization

no code implementations13 May 2021 Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality.
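
For reference, a hedged sketch of the two conditions being contrasted, written in their standard form; the paper's exact exponents and coefficients may differ.

```latex
% Classical (uniform) Lojasiewicz inequality: there exist c > 0 and
% \xi \in (0, 1] such that, for all \theta,
\[
  \| \nabla f(\theta) \| \;\ge\; c \, \bigl| f(\theta) - f^{*} \bigr|^{1-\xi} .
\]
% The non-uniform relaxation lets the coefficient vary with the iterate:
\[
  \| \nabla f(\theta) \| \;\ge\; C(\theta) \, \bigl| f(\theta) - f^{*} \bigr|^{1-\xi} .
\]
```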

BIG-bench Machine Learning

On the Optimality of Batch Policy Optimization Algorithms

no code implementations6 Apr 2021 Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.
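
A hedged sketch of what such an index can look like: an empirical mean plus an adjustable multiple of a confidence width, so that a positive weight recovers an optimistic (UCB-style) rule and a negative weight a pessimistic (LCB-style) one. The Hoeffding-style width, the weight, and the function name below are illustrative choices, not the paper's definitions.

```python
import numpy as np

# Confidence-adjusted index over logged (offline) bandit data.  alpha > 0
# gives an optimistic, UCB-style index; alpha < 0 a pessimistic, LCB-style one.
def confidence_adjusted_index(rewards_per_arm, alpha, delta=0.05):
    indices = []
    for rewards in rewards_per_arm:
        n = max(len(rewards), 1)
        mean = float(np.mean(rewards)) if rewards else 0.0
        width = np.sqrt(np.log(1.0 / delta) / (2 * n))       # Hoeffding-style width
        indices.append(mean + alpha * width)
    return int(np.argmax(indices))

logged = [[0.9, 0.8], [1.0]]                                 # hypothetical logged rewards
print(confidence_adjusted_index(logged, alpha=-1.0))         # pessimistic choice
```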

Value prediction

Escaping the Gravitational Pull of Softmax

no code implementations NeurIPS 2020 Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities.
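
A small numerical illustration of the "gravitational pull" the title alludes to (not the paper's escort construction): once a softmax policy has committed to a suboptimal action, the exact policy gradient on the optimal action's logit is vanishingly small, so escaping the plateau is slow.

```python
import numpy as np

# Two-action bandit where the policy has committed to the wrong action.
theta = np.array([5.0, -5.0])          # logits heavily favor action 0 ...
r = np.array([0.0, 1.0])               # ... but action 1 is optimal
pi = np.exp(theta - theta.max()); pi /= pi.sum()
grad = pi * (r - pi @ r)               # exact softmax policy gradient
print(pi, grad)                        # gradient on the optimal logit is ~4.5e-5
```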

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

1 code implementation28 Sep 2020 Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao

The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.

Understanding and Mitigating the Limitations of Prioritized Experience Replay

2 code implementations19 Jul 2020 Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and what its limitations are.
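
For context, a minimal sketch of the prioritized sampling scheme being analyzed: sample transitions with probability proportional to a power of their absolute TD error and correct with importance weights. The exponents and constants follow the common recipe and are illustrative rather than taken from this paper.

```python
import numpy as np

# Sample indices with probability proportional to |TD error|^alpha and
# return importance weights that correct for the non-uniform sampling.
def sample_prioritized(td_errors, batch_size, alpha=0.6, eps=1e-3):
    rng = np.random.default_rng()
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** -1.0
    return idx, weights / weights.max()

idx, w = sample_prioritized(np.array([0.1, 2.0, 0.5, 0.05]), batch_size=2)
```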

Autonomous Driving, Continuous Control +1

On the Global Convergence Rates of Softmax Policy Gradient Methods

no code implementations ICML 2020 Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization.
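
A quick numerical check of the flavor of this result, assuming a toy three-armed bandit: softmax policy gradient ascent with the exact gradient and a constant step size, printing the sub-optimality gap as it shrinks; the rewards and step size are illustrative.

```python
import numpy as np

r = np.array([1.0, 0.8, 0.1])                     # toy 3-armed bandit
theta = np.zeros(3)
eta = 0.4                                         # constant step size
for t in range(1, 10_001):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    theta += eta * pi * (r - pi @ r)              # exact gradient of pi . r
    if t in (10, 100, 1_000, 10_000):
        pi = np.exp(theta - theta.max()); pi /= pi.sum()
        print(t, r.max() - pi @ r)                # sub-optimality gap keeps shrinking
```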

Open-Ended Question Answering, Policy Gradient Methods

Frequency-based Search-control in Dyna

no code implementations ICLR 2020 Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand

This suggests a search-control strategy: we should use states from high-frequency regions of the value function to query the model to acquire more samples.
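
A hedged sketch of one way to act on that strategy (the paper's actual search-control procedure may differ): score candidate states by how rapidly the value estimate changes around them, here via a finite-difference gradient norm, and query the model at the highest-scoring states first. The value function and state set are illustrative.

```python
import numpy as np

# Score states by a finite-difference gradient norm of the value estimate,
# then query the model at the sharpest (highest-scoring) states first.
def frequency_scores(value_fn, states, h=1e-3):
    scores = []
    for s in states:
        grad = np.array([(value_fn(s + h * e) - value_fn(s - h * e)) / (2 * h)
                         for e in np.eye(len(s))])
        scores.append(np.linalg.norm(grad))
    return np.array(scores)

rng = np.random.default_rng(2)
states = [rng.uniform(-1, 1, size=2) for _ in range(50)]          # candidate states
scores = frequency_scores(lambda s: np.sin(5 * s[0]) * s[1], states)
query_order = np.argsort(-scores)                 # sharpest regions queried first
```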

Model-based Reinforcement Learning

Maximum Entropy Monte-Carlo Planning

no code implementations NeurIPS 2019 Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller

We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).
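
A hedged sketch of the soft backup at the heart of maximum-entropy tree search: node values aggregate child action values with a log-sum-exp (softmax) rather than a hard maximum, and action selection mixes the induced softmax policy with uniform exploration. The temperature and mixing constant below are illustrative.

```python
import numpy as np

# Soft (log-sum-exp) value backup and the softmax-with-exploration policy
# used for action selection at a tree node; tau and eps are illustrative.
def soft_value(q_values, tau=0.1):
    q = np.asarray(q_values) / tau
    return tau * (q.max() + np.log(np.exp(q - q.max()).sum()))   # tau * logsumexp(Q/tau)

def soft_policy(q_values, tau=0.1, eps=0.05):
    q = np.asarray(q_values) / tau
    pi = np.exp(q - q.max()); pi /= pi.sum()
    return (1 - eps) * pi + eps / len(pi)          # mix softmax with uniform exploration

print(soft_value([1.0, 0.5, 0.2]), soft_policy([1.0, 0.5, 0.2]))
```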

Atari Games, Decision Making

Identifying and Tracking Sentiments and Topics from Social Media Texts during Natural Disasters

no code implementations EMNLP 2017 Min Yang, Jincheng Mei, Heng Ji, Wei Zhao, Zhou Zhao, Xiaojun Chen

We study the problem of identifying the topics and sentiments and tracking their shifts from social media texts in different geographical regions during emergencies and disasters.

Topic Models

On the Reducibility of Submodular Functions

no code implementations4 Jan 2016 Jincheng Mei, Hao Zhang, Bao-liang Lu

The scalability of submodular optimization methods is critical for their usability in practice.
