Search Results for author: Wanqiao Xu

Found 6 papers, 2 papers with code

RLHF and IIA: Perverse Incentives

No code implementations · 2 Dec 2023 · Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy

Existing algorithms for reinforcement learning from human feedback (RLHF) can incentivize responses at odds with preferences because they are based on models that assume independence of irrelevant alternatives (IIA).
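RLHF reward models are commonly fit with a Bradley-Terry or Plackett-Luce (Luce choice) model, where the probability of picking a response is proportional to exp(reward). The sketch below assumes that softmax choice model and is a generic illustration of IIA, not the paper's construction. The reward values and the function name `luce_choice_probs` are hypothetical. It checks that the odds between two responses do not change when a third is offered. It also shows the classic duplicate-alternative effect: adding an identical copy of one response lowers the probability of the other response as well.

```python
import math

def luce_choice_probs(rewards):
    """Luce / softmax choice model: P(i) is proportional to exp(rewards[i])."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp(r - m) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical rewards for responses a, b and an extra response c.
r_a, r_b, r_c = 1.0, 0.5, 2.0

pair = luce_choice_probs([r_a, r_b])
triple = luce_choice_probs([r_a, r_b, r_c])

# IIA: the odds of a over b equal exp(r_a - r_b) whatever else is offered.
odds_pair = pair[0] / pair[1]
odds_triple = triple[0] / triple[1]
print(f"odds a:b with two options: {odds_pair:.6f}, with three: {odds_triple:.6f}")

# Duplicate alternative: offering b twice (b and an identical copy)
# lowers P(a) even though no genuinely new option was added.
with_copy = luce_choice_probs([r_a, r_b, r_b])
print(f"P(a) without copy: {pair[0]:.4f}, with copy of b: {with_copy[0]:.4f}")
```

Because the model only compares exponentiated rewards, any preference data that violates IIA (for example, judgments that depend on which other responses were shown) cannot be represented by it.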


Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

No code implementations · 19 May 2023 · Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy

In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.

Efficient Exploration · Language Modelling

Posterior Sampling for Continuing Environments

No code implementations · 29 Nov 2022 · Wanqiao Xu, Shi Dong, Benjamin Van Roy

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments.
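The following is a minimal sketch of posterior sampling in a continuing (non-episodic) tabular environment. It makes three assumptions: rewards are known, each state-action pair has a Dirichlet posterior over next states, and the sampled model is redrawn with probability 1 − γ at every step. The random redraw stands in for the episode boundary that a continuing interface lacks. The helper names (`greedy_policy`, `continuing_psrl`) and the two-state environment are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def greedy_policy(P, R, gamma, iters=300):
    """Discounted value iteration on a sampled model.
    P has shape (S, A, S); R has shape (S, A)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def continuing_psrl(true_P, R, gamma=0.9, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    counts = np.ones((S, A, S))  # Dirichlet(1, ..., 1) prior per (s, a)

    def sample_policy():
        # Draw one plausible transition model from the posterior, then plan in it.
        P_hat = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                          for s in range(S)])
        return greedy_policy(P_hat, R, gamma)

    policy = sample_policy()
    s, total = 0, 0.0
    for _ in range(steps):
        # No episode boundaries: redraw the model with probability 1 - gamma.
        if rng.random() < 1 - gamma:
            policy = sample_policy()
        a = policy[s]
        total += R[s, a]
        s_next = rng.choice(S, p=true_P[s, a])
        counts[s, a, s_next] += 1  # conjugate Dirichlet update
        s = s_next
    return total / steps, counts

# Hypothetical two-state environment: reward 1 in state 1; action 1 tends
# to move to state 1, action 0 tends to move to state 0.
true_P = np.array([[[0.9, 0.1], [0.1, 0.9]],
                   [[0.9, 0.1], [0.1, 0.9]]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
avg_reward, counts = continuing_psrl(true_P, R)
print(f"average reward: {avg_reward:.3f}")
```

With γ = 0.9, this sketch redraws the model about every 10 steps on average. That ties how long the agent commits to one sampled hypothesis to its planning horizon of roughly 1/(1 − γ).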

Uniformly Conservative Exploration in Reinforcement Learning

1 code implementation · 25 Oct 2021 · Wanqiao Xu, Jason Yecheng Ma, Kan Xu, Hamsa Bastani, Osbert Bastani

A key challenge to deploying reinforcement learning in practice is avoiding excessive (harmful) exploration in individual episodes.

reinforcement-learning · Reinforcement Learning (RL)

Distribution of Eigenvalues of Matrix Ensembles arising from Wigner and Palindromic Toeplitz Blocks

No code implementations · 11 Feb 2021 · Keller Blackwell, Neelima Borade, Arup Bose, Charles Devlin VI, Noah Luntzlara, Renyuan Ma, Steven J. Miller, Soumendu Sundar Mukherjee, Mengxi Wang, Wanqiao Xu

For definiteness, we concentrate on the ensemble of palindromic real symmetric Toeplitz (PST) matrices and the ensemble of real symmetric matrices, whose limiting spectral measures are the Gaussian and semi-circular distributions, respectively; these were chosen as they are the two extreme cases in terms of moment calculations.

Probability · 15A52 (primary), 60F99, 62H10 (secondary)
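The two extremes can be checked numerically. After scaling eigenvalues by √N and normalizing to unit variance, the semicircle law has fourth moment 2 and the standard Gaussian has fourth moment 3. This sketch is an illustration only, with arbitrary sizes, trial counts and seed, not the paper's code. It covers only the two pure ensembles named above, not the block matrices the paper studies. Both ensembles use standard normal entries. The PST matrix is built from a first row b with b[k] = b[N−1−k].

```python
import numpy as np

def wigner(n, rng):
    # Real symmetric matrix with independent N(0, 1) off-diagonal entries.
    g = rng.standard_normal((n, n))
    return (g + g.T) / np.sqrt(2)

def palindromic_toeplitz(n, rng):
    # Symmetric Toeplitz A[i, j] = b[|i - j|] with a palindromic first row,
    # b[k] == b[n - 1 - k], so only about n/2 independent N(0, 1) values.
    half = rng.standard_normal((n + 1) // 2)
    b = np.concatenate([half, half[: n // 2][::-1]])
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return b[idx]

def spectral_kurtosis(make, n=300, trials=60, seed=0):
    # Pool eigenvalues (scaled by sqrt(n)) over several samples and return
    # m4 / m2**2, which is 2 for the semicircle law and 3 for the Gaussian.
    rng = np.random.default_rng(seed)
    eigs = np.concatenate([np.linalg.eigvalsh(make(n, rng)) / np.sqrt(n)
                           for _ in range(trials)])
    m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
    return m4 / m2**2

k_wigner = spectral_kurtosis(wigner, trials=5)
k_pst = spectral_kurtosis(palindromic_toeplitz)
print(f"Wigner: {k_wigner:.2f}   PST: {k_pst:.2f}")
```

Wigner matrices concentrate quickly, so a few samples suffice. A single PST matrix has only about N/2 independent entries, so its spectrum fluctuates more and the sketch pools many samples.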
