no code implementations • 25 Jun 2023 • Yuanhao Wang, Qinghua Liu, Chi Jin
This paper theoretically proves that, for a wide range of preference models, we can solve preference-based RL directly using existing algorithms and techniques for reward-based RL, with small or no extra costs.
no code implementations • NeurIPS 2023 • Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing analyses are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.
no code implementations • 13 Feb 2023 • Yuanhao Wang, Qinghua Liu, Yu Bai, Chi Jin
A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse of multiagency, where both the description length of the game and the complexity of many existing learning algorithms scale exponentially with the number of agents.
no code implementations • 29 Sep 2022 • Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin
We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples.
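As a schematic illustration of the OMLE idea (the function names and the finite model class are our own simplifications, not the paper's setup): maintain the set of candidate models whose log-likelihood on the data so far is within a slack $\beta$ of the maximum, then act with the most optimistic model in that set.

```python
import numpy as np

def omle_select(log_liks, values, beta):
    """Schematic Optimistic MLE (OMLE) step over a finite model class.

    log_liks[i]: log-likelihood of model i on the data collected so far
    values[i]:   estimated optimal value if model i were the true model
    beta:        confidence slack

    Keep every model within beta of the max-likelihood model, then
    return the index of the most optimistic surviving model.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    values = np.asarray(values, dtype=float)
    candidates = np.where(log_liks >= log_liks.max() - beta)[0]
    return candidates[np.argmax(values[candidates])]
```

Note the two-step structure: likelihood filters the model class, optimism breaks ties toward exploration.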
1 code implementation • 25 Jul 2022 • Qinghua Liu, Yuxiang Jiang
We summarize the existing training methodologies into three main categories: training parallelism, memory-saving technologies, and model sparsity design.
2 code implementations • 18 Jul 2022 • Zihan Ding, DiJia Su, Qinghua Liu, Chi Jin
This paper proposes new, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games.
no code implementations • 6 Jun 2022 • Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, Yu Bai
Next, we show that this framework, instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates), can find an $\widetilde{\mathcal{O}}(T^{-5/6})$ approximate NE in $T$ iterations, and that a similar algorithm with a slightly modified value update rule achieves a faster $\widetilde{\mathcal{O}}(T^{-1})$ convergence rate.
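For intuition, OFTRL with an entropy regularizer reduces to optimistic multiplicative weights: the next distribution counts the most recent loss vector twice, as a guess for the upcoming loss. The sketch below is a generic single-state illustration of that update, not the paper's per-state instantiation.

```python
import numpy as np

def oftrl_step(cum_loss, last_loss, eta):
    """One Optimistic FTRL step with entropy regularization
    (optimistic multiplicative weights). The 'optimism' is that
    last_loss is added on top of the cumulative loss as a prediction
    of the next loss vector."""
    logits = -eta * (cum_loss + last_loss)  # last loss counted twice
    logits -= logits.max()                  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# toy two-action run against random losses
rng = np.random.default_rng(0)
cum = np.zeros(2)
last = np.zeros(2)
for t in range(100):
    p = oftrl_step(cum, last, eta=0.5)
    loss = rng.random(2)
    cum += loss
    last = loss
```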
no code implementations • 2 Jun 2022 • Qinghua Liu, Csaba Szepesvári, Chi Jin
This paper considers the challenging task of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions, which reveal incomplete information about the underlying state of the system.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning, +1
no code implementations • 19 Apr 2022 • Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
Applications of Reinforcement Learning (RL) in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, acting under partial observability of the states, are ubiquitous.
Tasks: Partially Observable Reinforcement Learning, reinforcement-learning, +1
no code implementations • 14 Mar 2022 • Qinghua Liu, Yuanhao Wang, Chi Jin
When the policies of the opponents are not revealed, we prove a statistical hardness result even in the most favorable scenario when both above conditions are true.
1 code implementation • 7 Nov 2021 • Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu
Multi-modal cues, including spatial information, facial expression and voiceprint, are introduced to the speech separation and speaker extraction tasks to serve as complementary information to achieve better performance.
no code implementations • 27 Oct 2021 • Chi Jin, Qinghua Liu, Yuanhao Wang, Tiancheng Yu
We design a new class of fully decentralized algorithms -- V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\rm th}$ player.
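The key structural point is that each agent's state scales with her own action count only. The sketch below is a heavily simplified single-agent view with our own choices (constant learning rates, clipped targets, an EXP3-style bandit subroutine), not the paper's exact algorithm.

```python
import numpy as np

class VLearningAgentSketch:
    """Schematic single-agent view of a V-learning-style update: the agent
    keeps a value estimate V(s) and an adversarial-bandit (EXP3-style)
    distribution over her OWN actions only, so per-state memory is
    O(A_i) rather than O(prod_j A_j) over all players' joint actions."""

    def __init__(self, n_states, n_actions, eta=0.1, lr=0.1):
        self.V = np.zeros(n_states)                  # value estimates
        self.logw = np.zeros((n_states, n_actions))  # bandit log-weights
        self.eta, self.lr = eta, lr

    def policy(self, s):
        w = np.exp(self.logw[s] - self.logw[s].max())
        return w / w.sum()

    def update(self, s, a, r, s_next):
        # incremental value update toward the one-step bootstrapped target
        target = float(np.clip(r + self.V[s_next], 0.0, 1.0))
        self.V[s] += self.lr * (target - self.V[s])
        # EXP3-style importance-weighted loss for the action actually played
        p = self.policy(s)
        loss_hat = (1.0 - target) / max(p[a], 1e-8)
        self.logw[s, a] -= self.eta * loss_hat
```

Because every player runs this locally, no one ever needs to observe or enumerate the opponents' action sets, which is what makes the algorithm fully decentralized.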
no code implementations • 7 Jun 2021 • Chi Jin, Qinghua Liu, Tiancheng Yu
Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy.
no code implementations • NeurIPS 2021 • Chi Jin, Qinghua Liu, Sobhan Miryoosefi
Finding the minimal structural assumptions that empower sample-efficient learning is one of the most important research directions in Reinforcement Learning (RL).
no code implementations • ICLR 2021 • Dipendra Misra, Qinghua Liu, Chi Jin, John Langford
We propose a novel setting for reinforcement learning that combines two common real-world difficulties: presence of observations (such as camera images) and factored states (such as location of objects).
no code implementations • 24 Dec 2020 • Qinghua Liu, Zhou Lu
In this paper, we fill the gap by proving a tight generalization lower bound of order $\Omega(\gamma+\frac{L}{\sqrt{n}})$, which matches the best known upper bound up to logarithmic factors.
no code implementations • 4 Oct 2020 • Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches.
Tasks: Model-based Reinforcement Learning, Multi-agent Reinforcement Learning, +2
1 code implementation • NeurIPS 2020 • Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, H. Vincent Poor
In federated optimization, heterogeneity in the clients' local datasets and computation speeds results in large variations in the number of local updates performed by each client in each communication round.
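One way to handle such variation, in the spirit of this line of work, is to normalize each client's cumulative update by its own number of local steps before averaging, so that clients that ran more local updates do not dominate the aggregate. The sketch below illustrates that normalized-averaging idea; the exact weighting coefficients in the paper may differ.

```python
import numpy as np

def normalized_aggregate(global_w, client_deltas, local_steps, weights=None):
    """Aggregate heterogeneous client updates (a sketch of normalized
    averaging, with our own simplifications).

    client_deltas[i]: cumulative parameter change from client i's local run
    local_steps[i]:   number of local SGD steps client i performed

    Each delta is first normalized by its own step count, the normalized
    directions are averaged, then rescaled by the average step count."""
    n = len(client_deltas)
    if weights is None:
        weights = np.ones(n) / n
    norm_dirs = [d / tau for d, tau in zip(client_deltas, local_steps)]
    avg_tau = sum(w * tau for w, tau in zip(weights, local_steps))
    update = avg_tau * sum(w * d for w, d in zip(weights, norm_dirs))
    return global_w + update
```

When all clients take the same number of local steps, this reduces exactly to plain averaging of the deltas, i.e. standard FedAvg aggregation.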
no code implementations • NeurIPS 2020 • Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu
Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration.
no code implementations • 30 Jan 2018 • Gen Li, Qinghua Liu, Yuantao Gu
As an analogy to the JL Lemma and the RIP for sparse vectors, this work allows the use of random projections to reduce the ambient dimension, with the theoretical guarantee that the distance between subspaces is well preserved after compression.