no code implementations • 15 Feb 2024 • Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu
In this paper, we introduce the Distributional Preference Reward Model (DPRM), a simple yet effective framework to align large language models with a diverse set of human preferences.
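The core idea of a distributional reward model can be illustrated with a minimal sketch: instead of predicting a single scalar, the model outputs a categorical distribution over discrete preference levels, which is then collapsed into a scalar reward for alignment training. The level values and probabilities below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def expected_reward(class_probs, class_values):
    """Collapse a categorical preference distribution into a scalar reward.

    class_probs: predicted probabilities over discrete preference levels
    class_values: scalar value assigned to each level (an assumed mapping)
    """
    class_probs = np.asarray(class_probs, dtype=float)
    class_values = np.asarray(class_values, dtype=float)
    assert np.isclose(class_probs.sum(), 1.0), "probabilities must sum to 1"
    return float(class_probs @ class_values)

# A response rated mostly "good" (levels 0..4, higher = more preferred):
r = expected_reward([0.05, 0.05, 0.1, 0.3, 0.5], [0, 1, 2, 3, 4])  # -> 3.15
```

Keeping the full distribution (rather than only its mean) is what lets such a model represent disagreement across a diverse set of annotators.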
no code implementations • 30 Sep 2023 • Dexun Li, Pradeep Varakantham
Unsupervised Environment Design (UED) is a paradigm for automatically generating a curriculum of training environments, enabling agents trained in these environments to develop general capabilities, i.e., achieving good zero-shot transfer performance.
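A common curriculum rule in UED is to train next on the environment with the largest estimated regret (the gap between attainable and current return). The sketch below illustrates only that selection step; the level names and the two return functions are stand-in assumptions for rollout-based estimates, not the paper's method.

```python
def pick_next_level(levels, agent_return, optimal_return):
    """Select the training environment with the largest estimated regret.

    regret(level) = optimal_return(level) - agent_return(level)
    Both return functions stand in for Monte Carlo rollout estimates.
    """
    return max(levels, key=lambda lvl: optimal_return(lvl) - agent_return(lvl))

# Illustrative values: the agent already solves the easy maze, and the
# hard maze is barely solvable even by a strong policy.
levels = ["maze_easy", "maze_medium", "maze_hard"]
agent = {"maze_easy": 0.9, "maze_medium": 0.5, "maze_hard": 0.1}.__getitem__
opt = {"maze_easy": 1.0, "maze_medium": 1.0, "maze_hard": 0.3}.__getitem__
chosen = pick_next_level(levels, agent, opt)  # -> "maze_medium"
```

Regret-based selection naturally favors levels at the frontier of the agent's ability: solved levels and unsolvable levels both score low.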
no code implementations • 4 Feb 2023 • Dexun Li, Wenjun Li, Pradeep Varakantham
In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework.
no code implementations • 19 Jan 2023 • Wenjun Li, Pradeep Varakantham, Dexun Li
Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing Chess on an 8x8 board).
no code implementations • 18 Dec 2022 • Dexun Li
Using historical data to predict future events has many real-world applications, such as stock price prediction and robot localization.
no code implementations • 27 Jul 2022 • Dexun Li, Pradeep Varakantham
To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce the soft fairness constraint in RMABs.
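One common way to soften a hard top-k intervention rule, so that no arm is permanently starved, is to sample arms with probability increasing in their priority score rather than always picking the top k. The softmax form and the names below are illustrative assumptions, not the paper's exact fairness mechanism.

```python
import math
import random

def soft_select(indices, k, temperature=1.0, rng=None):
    """Sample k arms without replacement, with probability increasing in
    each arm's priority score (softmax). Every arm keeps a nonzero chance
    of selection, so no arm is starved of interventions forever.

    indices: maps arm id -> priority score (e.g., a Whittle index).
    """
    rng = rng or random.Random()
    remaining = dict(indices)
    chosen = []
    for _ in range(min(k, len(remaining))):
        arms = list(remaining)
        weights = [math.exp(remaining[a] / temperature) for a in arms]
        pick = rng.choices(arms, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen

# Two of three arms are intervened on; the low-index arm "c" is unlikely
# but never impossible to be picked.
picked = soft_select({"a": 2.0, "b": 1.0, "c": 0.1}, k=2,
                     rng=random.Random(0))
```

The temperature controls the fairness/value trade-off: high temperature approaches uniform selection, while temperature near zero recovers the greedy top-k rule.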
no code implementations • 8 Jun 2022 • Dexun Li, Pradeep Varakantham
In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value.
no code implementations • 8 Jul 2021 • Dexun Li, Meghna Lowalekar, Pradeep Varakantham
Influence maximization is the problem of finding a small subset of nodes in a network that can maximize the diffusion of information.
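The standard baseline for this problem is the greedy algorithm under the independent cascade model: repeatedly add the node with the largest marginal gain in expected spread, estimated by Monte Carlo simulation. This is a sketch of that classic approach, not the method proposed in the paper; the graph and parameters are illustrative.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run: each newly activated node gets one
    chance to activate each out-neighbour with probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(graph, k, p=0.1, runs=200, seed=0):
    """Greedy influence maximization: add the node with the largest
    Monte Carlo estimate of marginal spread, k times."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in sorted(nodes - set(chosen)):
            spread = sum(simulate_ic(graph, chosen + [v], p, rng)
                         for _ in range(runs)) / runs
            if spread > best_spread:
                best, best_spread = v, spread
        chosen.append(best)
    return chosen

# On a star graph, the hub is the clear first pick.
seeds = greedy_im({"hub": ["a", "b", "c"]}, k=1, p=0.9, runs=100)
```

Because the spread function is monotone and submodular under independent cascade, this greedy procedure carries the well-known (1 - 1/e) approximation guarantee.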