Search Results for author: Sirui Zheng

Found 4 papers, 3 papers with code

BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

1 code implementation1 Oct 2024 Xuwu Wang, Qiwen Cui, Yunzhe Tao, Yiran Wang, Ziwei Chai, Xiaotian Han, Boyi Liu, Jianbo Yuan, Jing Su, Guoyin Wang, Tingkai Liu, Liyu Chen, Tianyi Liu, Tao Sun, Yufeng Zhang, Sirui Zheng, Quanzeng You, Yang Yang, Hongxia Yang

BabelBench incorporates a dataset comprising 247 meticulously curated problems that challenge the models with tasks in perception, commonsense reasoning, logical reasoning, and so on.

How Can LLM Guide RL? A Value-Based Approach

1 code implementation25 Feb 2024 Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang

Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning, particularly when the difference between the ideal policy and the LLM-informed policy is small, which suggests that the initial policy is close to optimal, reducing the need for further exploration.

Decision Making Reinforcement Learning (RL)

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

1 code implementation NeurIPS 2023 Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

no code implementations3 Nov 2022 Han Zhong, Wei Xiong, Sirui Zheng, LiWei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a loglikelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning.

Decision Making Reinforcement Learning (RL)

Cannot find the paper you are looking for? You can Submit a new open access paper.