1 code implementation • 6 Dec 2023 • Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals.
no code implementations • 2 Dec 2023 • Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy
Existing algorithms for reinforcement learning from human feedback (RLHF) can incentivize responses at odds with preferences because they are based on models that assume independence of irrelevant alternatives (IIA).
no code implementations • 19 May 2023 • Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy
In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.
no code implementations • 29 Nov 2022 • Wanqiao Xu, Shi Dong, Benjamin Van Roy
We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments.
1 code implementation • 25 Oct 2021 • Wanqiao Xu, Jason Yecheng Ma, Kan Xu, Hamsa Bastani, Osbert Bastani
A key challenge to deploying reinforcement learning in practice is avoiding excessive (harmful) exploration in individual episodes.
no code implementations • 11 Feb 2021 • Keller Blackwell, Neelima Borade, Arup Bose, Charles Devlin VI, Noah Luntzlara, Renyuan Ma, Steven J. Miller, Soumendu Sundar Mukherjee, Mengxi Wang, Wanqiao Xu
For definiteness we concentrate on the ensemble of palindromic real symmetric Toeplitz (PST) matrices and the ensemble of real symmetric matrices, whose limiting spectral measures are the Gaussian and semi-circular distributions, respectively; these were chosen as they are the two extreme cases in terms of moment calculations.
Probability • MSC classes: 15A52 (primary), 60F99, 62H10 (secondary)