no code implementations • 18 Apr 2023 • Dingwen Kong, Lin F. Yang
We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then queries a human teacher only a few times about the task's rewards at some state-action pairs.
no code implementations • 20 Oct 2022 • Yuanhao Wang, Dingwen Kong, Yu Bai, Chi Jin
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the number of players.
no code implementations • 14 Jun 2021 • Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang
For a value-based method with a complexity-bounded function class, we show that the policy needs to be updated only $\propto\operatorname{poly}\log(K)$ times when running the RL algorithm for $K$ episodes, while still achieving a near-optimal regret bound.