Search Results for author: Hongyi Guo

Found 12 papers, 5 papers with code

Toward Optimal LLM Alignments Using Two-Player Games

1 code implementation • 16 Jun 2024 • Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu

We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.

Reinforcement Learning

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

no code implementations • 26 May 2024 • Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang

To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model, i.e., one that simultaneously minimizes the maximum-likelihood loss and a reward penalty term.
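The regularized objective described above can be illustrated with a toy sketch (not the paper's algorithm): maximize a proxy reward while subtracting a weighted SFT-loss penalty for drifting from the SFT policy. The 1-D "policy" parameterization, the proxy reward, and all constants below are assumptions made purely for illustration.

```python
# Toy sketch of reward maximization with an SFT-loss regularizer.
# All functions and constants here are illustrative, not the paper's.

def reward(theta):
    # Proxy reward model: pushing theta past 1.0 stops paying off.
    return theta - 0.5 * theta**2

def sft_nll(theta, theta_sft=0.2):
    # Stand-in SFT loss: penalty for drifting from the SFT policy.
    return 0.5 * (theta - theta_sft)**2

def objective(theta, beta):
    # Maximize reward minus beta times the SFT-loss regularizer.
    return reward(theta) - beta * sft_nll(theta)

def optimize(beta, lr=0.1, steps=200):
    theta = 0.0
    for _ in range(steps):
        # Central-difference gradient ascent on the regularized objective.
        eps = 1e-5
        g = (objective(theta + eps, beta) - objective(theta - eps, beta)) / (2 * eps)
        theta += lr * g
    return theta

# With beta > 0 the learned policy stays closer to the SFT policy,
# which is the overoptimization-mitigation effect in miniature.
print(optimize(beta=0.0), optimize(beta=2.0))
```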

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

no code implementations • 9 Apr 2024 • Xudong Yu, Chenjia Bai, Hongyi Guo, Changhong Wang, Zhen Wang

Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions.

Diversity • Reinforcement Learning (RL) +1
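One common way randomized value functions yield pessimism, sketched below, is to keep an ensemble of Q estimates and act on a lower-confidence bound, so OOD actions with high ensemble disagreement are penalized. The ensemble values and the mean-minus-std statistic are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Illustrative ensemble-based pessimism for offline RL.

def pessimistic_q(q_ensemble, k=1.0):
    # Lower-confidence bound: ensemble mean minus k standard deviations.
    # High disagreement across members => heavy penalty.
    return q_ensemble.mean(axis=0) - k * q_ensemble.std(axis=0)

# Rows: ensemble members; columns: candidate actions.
# Action 1 has the higher mean but much higher disagreement (OOD-like).
q = np.array([[1.0, 3.0],
              [1.1, 0.0],
              [0.9, 3.5]])
print(pessimistic_q(q))          # pessimistic value per action
print(pessimistic_q(q).argmax()) # selects the reliable action 0
```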

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

no code implementations • 12 Mar 2024 • Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.

Reinforcement Learning

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

no code implementations • 8 Mar 2024 • Hongyi Guo, Zhihan Liu, Yufeng Zhang, Zhaoran Wang

Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.

Decision Making • Hallucination

Human-Instruction-Free LLM Self-Alignment with Limited Samples

no code implementations • 6 Jan 2024 • Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples.

In-Context Learning • Instruction Following
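The retrieve-then-generate idea in the abstract can be sketched roughly as follows: score seed samples against the target domain, pick the closest ones, and pack them into an ICL prompt for generating more samples. The `similarity` function, the seed pool, and the prompt format are all hypothetical stand-ins (a real system would use an LLM and embedding-based retrieval).

```python
# Hypothetical sketch of retrieval-based in-context sample generation.

def similarity(a, b):
    # Crude word-overlap (Jaccard) score; stand-in for embedding retrieval.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def build_icl_prompt(pool, target_domain, k=2):
    # Retrieve the k pool samples closest to the target domain...
    best = sorted(pool, key=lambda s: similarity(s, target_domain), reverse=True)[:k]
    # ...and use them as few-shot examples for generating new samples.
    shots = "\n".join(f"Example: {s}" for s in best)
    return f"{shots}\nWrite a new example about {target_domain}:"

pool = [
    "Summarize this news article about finance",
    "Translate the sentence into French",
    "Summarize the finance report in one line",
]
prompt = build_icl_prompt(pool, "finance summarization")
print(prompt)
```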

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

1 code implementation • 29 Sep 2023 • Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang

Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future").
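The "reason for future, act for now" loop described above can be caricatured in a few lines: plan a multi-step trajectory, execute only its first action, record the outcome in the memory buffer, and replan. The trivial planner below is a made-up stand-in for the LLM reasoner; only the loop structure reflects the abstract.

```python
# Toy sketch of the plan-act-record loop; the planner is a stand-in.

def plan(memory, state, horizon=3):
    # Stand-in planner: walk a 1-D state toward the goal at 0
    # ("reason for future" over the horizon).
    traj, s = [], state
    for _ in range(horizon):
        a = -1 if s > 0 else 1
        traj.append(a)
        s += a
    return traj

def run(state=4, steps=6):
    memory = []
    for _ in range(steps):
        traj = plan(memory, state)   # reason for future
        a = traj[0]                  # act for now: first action only
        state += a
        memory.append((state, a))    # grow the memory buffer
        if state == 0:
            break
    return state, memory

print(run())
```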

Behavior Contrastive Learning for Unsupervised Skill Discovery

1 code implementation • 8 May 2023 • Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li

Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill, which serves as an upper bound of the previous MI objective.

Continuous Control • Contrastive Learning
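A contrastive MI objective of the kind the abstract mentions is often realized as an InfoNCE-style loss: pull together behavior embeddings generated under the same skill and push apart those from different skills. The embeddings, the temperature, and the sampling below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# InfoNCE-style sketch of a behavior-contrastive objective.

def info_nce(anchor, positive, negatives, temp=0.1):
    # Similarities between the anchor behavior and all candidates;
    # index 0 is the positive (same-skill) behavior.
    cands = np.vstack([positive[None, :], negatives])
    sims = cands @ anchor / temp
    # Negative log softmax probability of the same-skill pair.
    return -(sims[0] - np.log(np.sum(np.exp(sims))))

rng = np.random.default_rng(0)
skill_a = rng.normal(size=4)
skill_a /= np.linalg.norm(skill_a)
anchor = skill_a + 0.05 * rng.normal(size=4)     # behavior from skill A
positive = skill_a + 0.05 * rng.normal(size=4)   # another skill-A behavior
negatives = rng.normal(size=(8, 4))              # other skills' behaviors
print(info_nce(anchor, positive, negatives))
```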

Policy Learning Using Weak Supervision

1 code implementation • NeurIPS 2021 • Jingkang Wang, Hongyi Guo, Zhaowei Zhu, Yang Liu

Most existing policy learning solutions require the learning agents to receive high-quality supervision signals such as well-designed rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC).

Reinforcement Learning (RL)

Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates

2 code implementations • ICML 2020 • Yang Liu, Hongyi Guo

In this work, we introduce a new family of loss functions that we name as peer loss functions, which enables learning from noisy labels and does not require a priori specification of the noise rates.

Learning with noisy labels
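The peer-loss idea in the abstract admits a short sketch: the loss on each (prediction, label) pair minus the loss on an independently re-paired prediction and label, with no noise-rate input. Binary cross-entropy and the toy data below are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

# Minimal sketch of a peer loss: standard loss minus the loss on
# randomly re-paired (prediction, label) "peer" samples.

def bce(p, y):
    # Elementwise binary cross-entropy.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def peer_loss(p, y, rng):
    # Draw peer predictions and peer labels independently, so the
    # subtracted term penalizes fitting label noise.
    i = rng.permutation(len(p))   # peer predictions
    j = rng.permutation(len(p))   # peer labels, drawn independently
    return np.mean(bce(p, y) - bce(p[i], y[j]))

rng = np.random.default_rng(0)
p = np.array([0.9, 0.8, 0.2, 0.1])   # predicted P(y = 1)
y = np.array([1.0, 1.0, 0.0, 0.0])   # possibly noisy labels
print(peer_loss(p, y, rng))
```

Because the subtracted peer term is always positive, the peer loss is strictly smaller than the plain cross-entropy on the same data.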

Signal Instructed Coordination in Cooperative Multi-agent Reinforcement Learning

no code implementations • 10 Sep 2019 • Liheng Chen, Hongyi Guo, Yali Du, Fei Fang, Haifeng Zhang, Yaoming Zhu, Ming Zhou, Wei-Nan Zhang, Qing Wang, Yong Yu

Although existing works formulate this problem into a centralized learning with decentralized execution framework, which avoids the non-stationary problem in training, their decentralized execution paradigm limits the agents' capability to coordinate.

Multi-agent Reinforcement Learning • Reinforcement Learning +2
