Search Results for author: Wanqiao Xu

Found 6 papers, 2 papers with code

Pearl: A Production-ready Reinforcement Learning Agent

1 code implementation • 6 Dec 2023 • Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals.

reinforcement-learning Reinforcement Learning (RL)

2,367


RLHF and IIA: Perverse Incentives

no code implementations • 2 Dec 2023 • Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy

Existing algorithms for reinforcement learning from human feedback (RLHF) can incentivize responses at odds with preferences because they are based on models that assume independence of irrelevant alternatives (IIA).

reinforcement-learning


Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

no code implementations • 19 May 2023 • Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy

In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.

Efficient Exploration, Language Modelling (+2 more)


Posterior Sampling for Continuing Environments

no code implementations • 29 Nov 2022 • Wanqiao Xu, Shi Dong, Benjamin Van Roy

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments.


Uniformly Conservative Exploration in Reinforcement Learning

1 code implementation • 25 Oct 2021 • Wanqiao Xu, Jason Yecheng Ma, Kan Xu, Hamsa Bastani, Osbert Bastani

A key challenge to deploying reinforcement learning in practice is avoiding excessive (harmful) exploration in individual episodes.

reinforcement-learning Reinforcement Learning (RL)


Distribution of Eigenvalues of Matrix Ensembles arising from Wigner and Palindromic Toeplitz Blocks

no code implementations • 11 Feb 2021 • Keller Blackwell, Neelima Borade, Arup Bose, Charles Devlin VI, Noah Luntzlara, Renyuan Ma, Steven J. Miller, Soumendu Sundar Mukherjee, Mengxi Wang, Wanqiao Xu

For definiteness we concentrate on the ensemble of palindromic real symmetric Toeplitz (PST) matrices and the ensemble of real symmetric matrices, whose limiting spectral measures are the Gaussian and semi-circular distributions, respectively; these were chosen as they are the two extreme cases in terms of moment calculations.

Probability 15A52 (primary), 60F99, 62H10 (secondary)

