Search Results for author: Dexun Li

Found 8 papers, 0 papers with code

Aligning Crowd Feedback via Distributional Preference Reward Modeling

no code implementations • 15 Feb 2024 • Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu

In this paper, we introduce the Distributional Preference Reward Model (DPRM), a simple yet effective framework to align large language models with a diverse set of human preferences.

Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling

no code implementations • 30 Sep 2023 • Dexun Li, Pradeep Varakantham

Unsupervised Environment Design (UED) is a paradigm for automatically generating a curriculum of training environments, enabling agents trained in these environments to develop general capabilities, i.e., to achieve good zero-shot transfer performance.

Trajectory Modeling

Diversity Induced Environment Design via Self-Play

no code implementations • 4 Feb 2023 • Dexun Li, Wenjun Li, Pradeep Varakantham

In this paper, we aim to introduce diversity in the Unsupervised Environment Design (UED) framework.

Generalization through Diversity: Improving Unsupervised Environment Design

no code implementations • 19 Jan 2023 • Wenjun Li, Pradeep Varakantham, Dexun Li

Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing Chess on an 8x8 board).

Decision Making, Reinforcement Learning (RL)

Hidden State Approximation in Recurrent Neural Networks Using Continuous Particle Filtering

no code implementations • 18 Dec 2022 • Dexun Li

Using historical data to predict future events has many real-world applications, such as stock price prediction and robot localization.

Stock Price Prediction

Towards Soft Fairness in Restless Multi-Armed Bandits

no code implementations • 27 Jul 2022 • Dexun Li, Pradeep Varakantham

To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce the soft fairness constraint in RMABs.

Fairness, Multi-Armed Bandits

Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits

no code implementations • 8 Jun 2022 • Dexun Li, Pradeep Varakantham

In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value.

Decision Making, Fairness

CLAIM: Curriculum Learning Policy for Influence Maximization in Unknown Social Networks

no code implementations • 8 Jul 2021 • Dexun Li, Meghna Lowalekar, Pradeep Varakantham

Influence maximization is the problem of finding a small subset of nodes in a network that can maximize the diffusion of information.

Reinforcement Learning (RL)
