Search Results for author: Yazhe Niu

Found 6 papers, 5 papers with code

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

1 code implementation · 25 Apr 2024 · Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains.

Board Games, Decision Making

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

1 code implementation · 12 Dec 2023 · Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.

Offline RL

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

1 code implementation · NeurIPS 2023 · Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu

Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.

Board Games, Decision Making

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations · 24 Jul 2023 · Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control, Model-based Reinforcement Learning, +1

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

1 code implementation · 29 Nov 2022 · Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Decision Making, Q-Learning, +2
