no code implementations • 30 Apr 2024 • Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar
We propose two approaches, based on reward aggregation and preference aggregation, respectively: the former uses both utilitarian and leximin approaches to aggregate individual reward models, with sample-complexity guarantees; the latter directly aggregates human feedback given in the form of probabilistic opinions.
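To make the two aggregation rules concrete, here is a minimal sketch (not the paper's algorithm) assuming we already have per-individual reward models that map a candidate response to a scalar reward: utilitarian aggregation averages the individual rewards, while leximin compares candidates by their sorted reward vectors so the worst-off individual is prioritized first. The `RewardModel` type and `pick_best` helper are hypothetical names introduced only for illustration.

```python
# Illustrative sketch (not the paper's algorithm): aggregating individual
# reward models r_1, ..., r_n into a single ranking over candidate responses.
from typing import Callable, List, Sequence

RewardModel = Callable[[str], float]  # hypothetical: maps a response to a scalar reward

def utilitarian_score(rewards: Sequence[float]) -> float:
    """Utilitarian aggregation: the average reward across individuals."""
    return sum(rewards) / len(rewards)

def leximin_key(rewards: Sequence[float]) -> List[float]:
    """Leximin aggregation: compare candidates by their sorted reward vectors,
    worst individual first, breaking ties by the next-worst, and so on."""
    return sorted(rewards)

def pick_best(candidates: List[str], models: List[RewardModel], rule: str = "leximin") -> str:
    scores = {c: [m(c) for m in models] for c in candidates}
    if rule == "utilitarian":
        return max(candidates, key=lambda c: utilitarian_score(scores[c]))
    # Python compares lists lexicographically, which realizes the leximin order
    # on the ascending-sorted reward vectors.
    return max(candidates, key=lambda c: leximin_key(scores[c]))
```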
no code implementations • 18 Apr 2023 • Dingwen Kong, Lin F. Yang
We provide an active-learning-based RL algorithm that first explores the environment without a specified reward function and then queries a human teacher for the task's rewards at only a few state-action pairs.
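A schematic sketch of this query pattern follows; it is an assumption-laden illustration rather than the paper's method. It supposes the reward-free exploration phase has produced visit counts over state-action pairs, proxies uncertainty by low visit counts, and uses hypothetical helpers `select_queries` and `collect_human_rewards` introduced only for this example.

```python
# Schematic sketch (assumptions, not the paper's algorithm): after reward-free
# exploration, ask the human teacher for rewards only at a small query budget
# of under-explored state-action pairs, then learn/plan with those rewards.
from collections import Counter
from typing import Callable, Dict, List, Tuple

StateAction = Tuple[int, int]

def select_queries(visit_counts: Counter, budget: int) -> List[StateAction]:
    """Pick the `budget` state-action pairs we are least certain about;
    uncertainty is proxied here by low visit counts (an assumption)."""
    return sorted(visit_counts, key=lambda sa: visit_counts[sa])[:budget]

def collect_human_rewards(queries: List[StateAction],
                          ask_human: Callable[[StateAction], float]) -> Dict[StateAction, float]:
    """Query the human teacher for the reward at each selected pair only."""
    return {sa: ask_human(sa) for sa in queries}

# Usage: rewards = collect_human_rewards(select_queries(counts, budget=10), ask_human)
# A policy can then be learned or planned using `rewards` as the task's reward signal.
```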
no code implementations • 20 Oct 2022 • Yuanhao Wang, Dingwen Kong, Yu Bai, Chi Jin
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the number of players.
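As background, rationalizable equilibria place probability only on actions that survive iterated elimination of never-best-response actions. The sketch below illustrates a simplified version of that elimination for a two-player normal-form game with known payoff matrices, removing only actions strictly dominated by another pure action; full rationalizability also requires checking domination by mixed strategies, and this is not the sample-efficient learning algorithm developed in the paper, which works from bandit feedback.

```python
# Illustrative sketch (not the paper's learning algorithm): iterated elimination
# of actions strictly dominated by another pure action in a two-player game.
# This computes a superset of the rationalizable action sets.
import numpy as np

def eliminate_dominated(payoff_1: np.ndarray, payoff_2: np.ndarray):
    """payoff_1[i, j], payoff_2[i, j]: payoffs when player 1 plays i, player 2 plays j."""
    rows = list(range(payoff_1.shape[0]))
    cols = list(range(payoff_1.shape[1]))
    changed = True
    while changed:
        changed = False
        # Remove a row strictly dominated by another surviving row against all surviving columns.
        for i in rows:
            if any(all(payoff_1[k, j] > payoff_1[i, j] for j in cols) for k in rows if k != i):
                rows.remove(i); changed = True; break
        # Remove a column strictly dominated by another surviving column against all surviving rows.
        for j in cols:
            if any(all(payoff_2[i, k] > payoff_2[i, j] for i in rows) for k in cols if k != j):
                cols.remove(j); changed = True; break
    return rows, cols  # surviving (not strictly dominated) actions for each player
```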
no code implementations • 14 Jun 2021 • Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang
For a value-based method with a complexity-bounded function class, we show that the policy needs to be updated only $\propto \operatorname{poly}\log(K)$ times over $K$ episodes of running the RL algorithm, while still achieving a near-optimal regret bound.
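One generic way to keep the number of policy switches logarithmic in the amount of data, shown below as a hedged illustration rather than the paper's specific scheme, is a doubling schedule: re-fit the value function and switch policies only when the collected dataset has doubled in size. The `rollout` and `fit_policy` callables are hypothetical placeholders for episode collection and value-function fitting.

```python
# Generic illustration (not the paper's algorithm): a doubling schedule that
# re-fits the policy only when the dataset size has doubled, so the number of
# policy switches grows logarithmically in the total amount of data.
from typing import Any, Callable, List

def run_low_switching(K: int,
                      rollout: Callable[[Any], List[Any]],
                      fit_policy: Callable[[List[Any]], Any]):
    """`rollout(policy)` collects one episode of transitions; `fit_policy(data)`
    returns a new policy from all data so far. Both are hypothetical placeholders."""
    data: List[Any] = []
    policy = fit_policy(data)       # initial policy from empty data
    last_update_size = 1            # dataset size at the last policy switch
    num_switches = 0
    for _ in range(K):
        data.extend(rollout(policy))
        if len(data) >= 2 * last_update_size:   # dataset doubled since last switch
            policy = fit_policy(data)
            last_update_size = len(data)
            num_switches += 1                    # grows like log of the data size, not K
    return policy, num_switches
```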