Search Results for author: Jiahao Qiu

Found 3 papers, 0 papers with code

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

no code implementations · 14 Feb 2024 · Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesha Manocha, Amrit Singh Bedi, Mengdi Wang

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a single reward model derived from preference data.

Fairness · Reinforcement Learning
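The max-min idea is simple to state: score a candidate response under each preference group's reward model and optimize the worst of those scores rather than their average. Below is a minimal sketch of that aggregation step, assuming toy linear per-group reward models over fixed response embeddings; the linear form and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def group_rewards(embedding, weight_vectors):
    """Score one response embedding under each group's toy linear reward model."""
    return np.array([w @ embedding for w in weight_vectors])

def maxmin_objective(embedding, weight_vectors):
    """Egalitarian training signal: the reward of the worst-served group."""
    return group_rewards(embedding, weight_vectors).min()

rng = np.random.default_rng(0)
groups = [rng.normal(size=8) for _ in range(3)]   # 3 hypothetical preference groups
candidates = rng.normal(size=(5, 8))              # 5 candidate response embeddings

# Policy-improvement step (sketch): keep the candidate that maximizes
# the minimum group reward instead of the average reward.
best = max(candidates, key=lambda e: maxmin_objective(e, groups))
print(maxmin_objective(best, groups))
```

Replacing the `min` with a mean recovers the standard single-reward RLHF objective that the paper argues can underserve minority preference groups.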

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

no code implementations · 8 Jan 2024 · Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

To enhance the efficiency of the sequence optimization process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence under the guidance of a bandit machine learning model.
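A minimal sketch of such a tree-expansion loop with an upper-confidence-bound (UCB) selection rule, using a toy stand-in fitness function in place of the learned bandit model; the motif-counting objective, mutation scheme, and all parameter names are illustrative assumptions, not the paper's method.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def fitness(seq):
    """Toy stand-in for the learned bandit model's sequence score."""
    return seq.count("A") + 0.5 * seq.count("K")

def mutate(seq, rng):
    """A child node is a single-site mutation of its parent sequence."""
    i = rng.randrange(len(seq))
    new_aa = rng.choice([a for a in ALPHABET if a != seq[i]])
    return seq[:i] + new_aa + seq[i + 1:]

def tree_search_bandit(root, rounds=200, children_per_expand=4, c=1.0, seed=0):
    """Grow a tree from `root`: each round, expand the node whose score
    plus a UCB exploration bonus is highest, then add its mutated children."""
    rng = random.Random(seed)
    nodes = [{"seq": root, "score": fitness(root), "visits": 1}]
    for t in range(1, rounds + 1):
        node = max(
            nodes,
            key=lambda n: n["score"] + c * math.sqrt(math.log(t + 1) / n["visits"]),
        )
        node["visits"] += 1
        for _ in range(children_per_expand):
            child = mutate(node["seq"], rng)
            nodes.append({"seq": child, "score": fitness(child), "visits": 1})
    return max(nodes, key=lambda n: n["score"])

best = tree_search_bandit("MKTAYIAKQR")  # hypothetical starting sequence
print(best["seq"], best["score"])
```

In a real pipeline the `fitness` call would be the bandit model's predicted reward, and the exploration bonus is what keeps the search from collapsing onto the greedy mutation path.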
