Search Results for author: Runlong Zhou

Found 6 papers, 2 papers with code

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

No code implementations · 20 Feb 2024 · Runlong Zhou, Simon S. Du, Beibin Li

We propose Reflect-RL, a two-player system to fine-tune an LM using online RL, where a frozen reflection model assists the policy model.

Decision Making · Reinforcement Learning (RL)

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

1 code implementation · 30 Oct 2023 · Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du

Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important in sequential decision-making problems.

Decision Making · Offline RL · (+1 more)

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

No code implementations · 31 Jan 2023 · Runlong Zhou, Zihan Zhang, Simon S. Du

We further initiate the study of model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule.

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

No code implementations · 20 Oct 2022 · Runlong Zhou, Ruosong Wang, Simon S. Du

We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows that our upper bound is minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs.

reinforcement-learning · Reinforcement Learning (RL)

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

1 code implementation · 11 Feb 2022 · Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du

Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem.

Combinatorial Optimization · Reinforcement Learning (RL)

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

No code implementations · NeurIPS 2021 · Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.
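To make the SSP objective concrete, here is a minimal sketch of value iteration on a small, hypothetical SSP instance. It is not the method of the paper, which studies online learning when the dynamics are unknown. The sketch assumes known transitions, positive costs, and a goal that is reachable from every state. It computes the optimal expected cost-to-goal $V^\star(s) = \min_a \big[c(s,a) + \sum_{s'} P(s' \mid s,a)\, V^\star(s')\big]$ with $V^\star(\text{goal}) = 0$. The state names, action names, and numbers are illustrative only.

```python
# Minimal value-iteration sketch for a toy stochastic shortest path (SSP) problem.
# Assumption: the hypothetical MDP below has KNOWN dynamics and positive costs;
# the paper itself addresses learning when the dynamics are unknown.

GOAL = "g"

# transitions[state][action] = (cost, {next_state: probability})
transitions = {
    "s0": {
        "safe": (2.0, {"s1": 1.0}),                # deterministic detour via s1
        "risky": (1.0, {GOAL: 0.5, "s0": 0.5}),    # cheap, but may stay in s0
    },
    "s1": {
        "go": (1.0, {GOAL: 1.0}),
    },
}


def value_iteration(transitions, goal, tol=1e-10, max_iters=10_000):
    """Return the optimal expected cost-to-goal V*(s) for every state."""
    values = {s: 0.0 for s in transitions}
    values[goal] = 0.0  # reaching the goal ends the episode with zero further cost
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman backup: cheapest action, counting its immediate cost
            # plus the expected cost-to-goal of the next state.
            best = min(
                cost + sum(p * values[nxt] for nxt, p in dist.items())
                for cost, dist in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return values


V = value_iteration(transitions, GOAL)
print(V)
```

From `s0`, the "risky" action is optimal. Its expected cost solves $V = 1 + 0.5\,V$, giving $V = 2$, while "safe" costs $2 + 1 = 3$. This shows that the SSP objective is the total expected cost accumulated until the goal is reached, not the cost of a single step.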
