Search Results for author: Shenao Zhang

Found 9 papers, 5 papers with code

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

1 code implementation • 29 May 2024 • Shenao Zhang, Donghan Yu, Hiteshi Sharma, ZiYi Yang, Shuohang Wang, Hany Hassan, Zhaoran Wang

Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) to adhere to human intentions.

Instruction Following

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

no code implementations • 26 May 2024 • Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang

To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model, one that simultaneously minimizes the maximum likelihood estimation loss and a reward penalty term.

How Can LLM Guide RL? A Value-Based Approach

1 code implementation • 25 Feb 2024 • Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang

Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning. The reduction is most pronounced when the difference between the ideal policy and the LLM-informed policy is small: the initial policy is then close to optimal, which reduces the need for further exploration.

Decision Making, Reinforcement Learning (RL)

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

1 code implementation • 29 Sep 2023 • Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang

Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future").

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

1 code implementation • NeurIPS 2023 • Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.

Asking Before Acting: Gather Information in Embodied Decision Making with Language Models

no code implementations • 25 May 2023 • Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen

With strong capabilities of reasoning and a broad understanding of the world, Large Language Models (LLMs) have demonstrated immense potential in building versatile embodied decision-making agents capable of executing a wide array of tasks.

Imitation Learning

Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

no code implementations • 16 Sep 2022 • Shenao Zhang

In this work, we propose Conservative Dual Policy Optimization (CDPO) that involves a Referential Update and a Conservative Update.

Model-based Reinforcement Learning, reinforcement-learning

Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning

no code implementations • 30 Aug 2021 • Shenao Zhang, Lei Han, Li Shen

In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number.

Multi-agent Reinforcement Learning, reinforcement-learning

Structure-Regularized Attention for Deformable Object Representation

1 code implementation • 12 Jun 2021 • Shenao Zhang, Li Shen, Zhifeng Li, Wei Liu

Capturing contextual dependencies has proven useful to improve the representational power of deep neural networks.

Object
