no code implementations • ICML 2020 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu
We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.
no code implementations • 25 May 2023 • Ahmed Khaled, Konstantin Mishchenko, Chi Jin
It is also the first parameter-free AdaGrad-style algorithm that adapts to smooth optimization.
no code implementations • 18 May 2023 • Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.
no code implementations • 10 Apr 2023 • Zihan Ding, Yuanpei Chen, Allen Z. Ren, Shixiang Shane Gu, Hao Dong, Chi Jin
Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands.
no code implementations • 2 Mar 2023 • Jiawei Ge, Shange Tang, Jianqing Fan, Chi Jin
Unsupervised pretraining, which learns a useful representation using a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems.
no code implementations • 13 Feb 2023 • Yuanhao Wang, Qinghua Liu, Yu Bai, Chi Jin
A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse of multiagency, where the description length of the game as well as the complexity of many existing learning algorithms scale exponentially with the number of agents.
no code implementations • 9 Feb 2023 • Hadi Daneshmand, Jason D. Lee, Chi Jin
Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures.
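As a rough illustration of this setup (a minimal sketch under our own assumptions, not the paper's construction), the following toy example represents a measure by particles and descends each particle in parallel on an illustrative energy:

```python
import numpy as np

def particle_gradient_descent(particles, grad_energy, lr=0.1, steps=200):
    """Represent a probability measure by an empirical distribution over
    particles and run gradient descent on all particles in parallel."""
    for _ in range(steps):
        particles = particles - lr * grad_energy(particles)
    return particles

# Toy energy: E[(x - 3)^2 / 2] under the particle measure; each particle
# independently descends toward 3, so the measure concentrates there.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))                       # 100 particles in R^1
x = particle_gradient_descent(x, grad_energy=lambda p: p - 3.0)
print(x.mean())                                     # close to 3
```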
no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang
To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.
no code implementations • 27 Oct 2022 • Jiachen Hu, Han Zhong, Chi Jin, LiWei Wang
Sim-to-real transfer trains RL agents in simulated environments and then deploys them in the real world.
no code implementations • 20 Oct 2022 • Yuanhao Wang, Dingwen Kong, Yu Bai, Chi Jin
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the number of players.
no code implementations • 29 Sep 2022 • Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin
We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples.
no code implementations • 6 Sep 2022 • Ahmed Khaled, Chi Jin
Federated learning (FL) is a subfield of machine learning where multiple clients try to collaboratively learn a model over a network under communication constraints.
2 code implementations • 18 Jul 2022 • Zihan Ding, DiJia Su, Qinghua Liu, Chi Jin
This paper proposes new, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games.
no code implementations • 2 Jun 2022 • Qinghua Liu, Csaba Szepesvári, Chi Jin
This paper considers the challenging task of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent sees only her own individual observations and actions, which reveal incomplete information about the underlying state of the system.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).
no code implementations • 19 Apr 2022 • Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
Applications of Reinforcement Learning (RL) in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system (that is, they act under partial observability of the states) are ubiquitous.
Partially Observable Reinforcement Learning
reinforcement-learning
no code implementations • 14 Mar 2022 • Qinghua Liu, Yuanhao Wang, Chi Jin
When the policies of the opponents are not revealed, we prove a statistical hardness result even in the most favorable scenario when both above conditions are true.
no code implementations • 8 Feb 2022 • Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi
Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions.
no code implementations • 3 Feb 2022 • Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors.
no code implementations • 23 Dec 2021 • Bowen Yi, Chi Jin, Ian R. Manchester
The design of a globally convergent position observer for feature points from visual information is a challenging problem, especially in the case with only inertial measurements and without uniform-observability assumptions, and it remained open for a long time.
no code implementations • 27 Oct 2021 • Chi Jin, Qinghua Liu, Yuanhao Wang, Tiancheng Yu
We design a new class of fully decentralized algorithms -- V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\rm th}$ player.
no code implementations • ICLR 2022 • Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, LiWei Wang
Reinforcement learning encounters many challenges when applied directly in the real world.
no code implementations • 12 Jul 2021 • Sobhan Miryoosefi, Chi Jin
In constrained reinforcement learning (RL), a learning agent seeks not only to optimize the overall reward but also to satisfy additional safety, diversity, or budget constraints.
no code implementations • 7 Jun 2021 • Chi Jin, Qinghua Liu, Tiancheng Yu
Modern reinforcement learning (RL) commonly addresses practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy.
1 code implementation • ICLR 2022 • Tanner Fiez, Chi Jin, Praneeth Netrapalli, Lillian J. Ratliff
This paper considers minimax optimization $\min_x \max_y f(x, y)$ in the challenging setting where $f$ can be both nonconvex in $x$ and nonconcave in $y$.
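For orientation, here is a minimal two-timescale gradient descent-ascent sketch on a toy nonconvex-nonconcave objective; the objective, step sizes, and finite-difference gradients are our illustrative assumptions, not the algorithm analyzed in the paper:

```python
import numpy as np

def f(x, y):
    # Toy objective, nonconvex in x and nonconcave in y (illustrative only).
    return np.sin(x) * np.cos(y) + 0.1 * x * y

def grad_f(x, y, eps=1e-6):
    # Central finite differences, to keep the sketch self-contained.
    gx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    gy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return gx, gy

def two_timescale_gda(x, y, lr_x=1e-3, lr_y=1e-1, steps=10_000):
    """Descent on x, ascent on y; y uses a larger step size so it tracks
    an approximate best response to the slowly moving x."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x -= lr_x * gx   # min player
        y += lr_y * gy   # max player
    return x, y

print(two_timescale_gda(1.0, -1.0))
```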
no code implementations • 25 Mar 2021 • Yaqi Duan, Chi Jin, Zhiyuan Li
Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the following: (1) in the double-sampling regime, the excess risk of the Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class.
no code implementations • NeurIPS 2021 • Yu Bai, Chi Jin, Huan Wang, Caiming Xiong
Real world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) The agents are inherently asymmetric and partitioned into leaders and followers; (2) The agents have different reward functions, thus the game is general-sum.
no code implementations • 8 Feb 2021 • Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, LiWei Wang
This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation.
no code implementations • 4 Feb 2021 • Mo Zhou, Rong Ge, Chi Jin
We show that as long as the loss is already lower than a threshold (polynomial in the relevant parameters), every student neuron in an over-parameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0.
no code implementations • NeurIPS 2021 • Chi Jin, Qinghua Liu, Sobhan Miryoosefi
Finding the minimal structural assumptions that empower sample-efficient learning is one of the most important research directions in Reinforcement Learning (RL).
no code implementations • ICLR 2021 • Dipendra Misra, Qinghua Liu, Chi Jin, John Langford
We propose a novel setting for reinforcement learning that combines two common real-world difficulties: presence of observations (such as camera images) and factored states (such as location of objects).
no code implementations • NeurIPS 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.
no code implementations • 9 Nov 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.
no code implementations • 4 Oct 2020 • Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches.
Model-based Reinforcement Learning
Multi-agent Reinforcement Learning
no code implementations • NeurIPS 2020 • Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu
Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration.
no code implementations • NeurIPS 2020 • Yu Bai, Chi Jin, Tiancheng Yu
This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games.
no code implementations • NeurIPS 2020 • Nilesh Tripuraneni, Michael I. Jordan, Chi Jin
Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$, where each $f_j$ is a task-specific function in $\mathcal{F}$ and $h$ is the shared representation in $\mathcal{H}$.
1 code implementation • 26 Feb 2020 • Nilesh Tripuraneni, Chi Jin, Michael I. Jordan
In this paper, we focus on the problem of multi-task linear regression -- in which multiple linear regression models share a common, low-dimensional linear representation.
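A minimal sketch of the shared-representation idea, using our own simple SVD-of-per-task-least-squares heuristic rather than necessarily the estimator analyzed in the paper:

```python
import numpy as np

def shared_rep_regression(Xs, ys, r):
    """Stage 1: estimate a shared r-dimensional representation B from the
    top-r left singular vectors of stacked per-task least-squares fits.
    Stage 2: refit each task's r coefficients within span(B)."""
    W = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0]
                         for X, y in zip(Xs, ys)])
    B = np.linalg.svd(W, full_matrices=False)[0][:, :r]
    coefs = [np.linalg.lstsq(X @ B, y, rcond=None)[0] for X, y in zip(Xs, ys)]
    return B, coefs

# Synthetic check: 20 tasks in R^30 sharing a rank-2 representation.
rng = np.random.default_rng(0)
d, r, n_tasks, n = 30, 2, 20, 100
B_true = np.linalg.qr(rng.normal(size=(d, r)))[0]
Xs, ys = [], []
for _ in range(n_tasks):
    X = rng.normal(size=(n, d))
    w = B_true @ rng.normal(size=r)   # task weights live in span(B_true)
    Xs.append(X)
    ys.append(X @ w + 0.1 * rng.normal(size=n))
B_hat, _ = shared_rep_regression(Xs, ys, r)
```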
no code implementations • ICML 2020 • Yu Bai, Chi Jin
We introduce a self-play algorithm---Value Iteration with Upper/Lower Confidence Bound (VI-ULCB)---and show that it achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game, where the regret is measured by the agent's performance against a \emph{fully adversarial} opponent who can exploit the agent's strategy at \emph{any} step.
no code implementations • ICML 2020 • Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions.
no code implementations • 5 Feb 2020 • Tianyi Lin, Chi Jin, Michael I. Jordan
This paper presents the first algorithm with $\tilde{O}(\sqrt{\kappa_{\mathbf x}\kappa_{\mathbf y}})$ gradient complexity, matching the lower bound up to logarithmic factors.
no code implementations • ICML 2020 • Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang
While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL.
no code implementations • 3 Dec 2019 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu
We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.
2 code implementations • 11 Jul 2019 • Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan
Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy.
no code implementations • ICML 2020 • Tianyi Lin, Chi Jin, Michael I. Jordan
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set.
no code implementations • 13 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial.
no code implementations • 11 Feb 2019 • Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.
1 code implementation • ICML 2020 • Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Minimax optimization has found extensive applications in modern machine learning, in settings such as generative adversarial networks (GANs), adversarial training and multi-agent reinforcement learning.
BIG-bench Machine Learning
Multi-agent Reinforcement Learning
no code implementations • 20 Nov 2018 • Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan
Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years.
no code implementations • NeurIPS 2018 • Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret $\tilde{O}(\sqrt{H^3 SAT})$, where $S$ and $A$ are the numbers of states and actions, $H$ is the number of steps per episode, and $T$ is the total number of steps.
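A minimal sketch of this algorithm (Q-learning with a UCB-Hoeffding-style bonus), using the learning rate $\alpha_t = (H+1)/(H+t)$ and a bonus proportional to $\sqrt{H^3/t}$ as in the analysis; the constant `c` and the toy environment interface are our assumptions:

```python
import numpy as np

def q_learning_ucb(env, S, A, H, K, c=1.0):
    """Episodic tabular Q-learning with optimistic initialization and an
    exploration bonus b_t = c * sqrt(H^3 / t)."""
    Q = np.full((H, S, A), float(H))          # optimistic initialization
    N = np.zeros((H, S, A), dtype=int)        # visit counts
    for _ in range(K):                        # K episodes
        s = env.reset()                       # assumed: returns a state index
        for h in range(H):
            a = int(np.argmax(Q[h, s]))       # greedy w.r.t. optimistic Q
            s_next, reward, _ = env.step(a)   # assumed: (state, reward, done)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)
            bonus = c * np.sqrt(H**3 / t)
            v_next = min(H, Q[h + 1, s_next].max()) if h + 1 < H else 0.0
            Q[h, s, a] += alpha * (reward + v_next + bonus - Q[h, s, a])
            s = s_next
    return Q
```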
no code implementations • 4 Apr 2018 • Yuansi Chen, Chi Jin, Bin Yu
Applying existing stability upper bounds for gradient methods in our trade-off framework, we obtain lower bounds matching the well-established convergence upper bounds up to constants for these algorithms, and we conjecture similar lower bounds for NAG and HB.
no code implementations • NeurIPS 2018 • Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan
Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima---arising because of the tolerance $\nu$---which exist only in $f$.
no code implementations • 28 Nov 2017 • Chi Jin, Praneeth Netrapalli, Michael I. Jordan
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting.
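For reference, a minimal sketch of AGD in its convex, $L$-smooth form with the standard $(t-1)/(t+2)$ momentum coefficient; the quadratic test problem is an illustrative assumption:

```python
import numpy as np

def agd(grad, x0, L, steps=500):
    """Nesterov's AGD: take a gradient step from an extrapolated point y,
    then update y with a momentum term."""
    x_prev, y = x0.copy(), x0.copy()
    for t in range(1, steps + 1):
        x = y - grad(y) / L                        # gradient step at y
        y = x + (t - 1) / (t + 2) * (x - x_prev)   # momentum extrapolation
        x_prev = x
    return x_prev

# Ill-conditioned quadratic f(v) = 0.5 * v' diag(d) v with kappa = 100:
# AGD needs roughly sqrt(kappa) iterations per digit, versus kappa for GD.
d = np.linspace(1.0, 100.0, 50)
x = agd(lambda v: d * v, x0=np.ones(50), L=d.max())
print(np.linalg.norm(x))   # near 0, the unique minimizer
```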
no code implementations • NeurIPS 2018 • Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael I. Jordan
This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak 2006].
no code implementations • NeurIPS 2017 • Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.
no code implementations • ICML 2017 • Rong Ge, Chi Jin, Yi Zheng
In this paper we develop a new framework that captures the common landscape underlying non-convex low-rank matrix problems, including matrix sensing, matrix completion, and robust PCA.
no code implementations • ICML 2017 • Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free").
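A minimal sketch of the perturbed-gradient-descent idea: run plain GD, and when the gradient is small (a possible saddle), inject noise from a small ball. All constants here are illustrative assumptions, not the ones from the paper's analysis:

```python
import numpy as np

def perturbed_gd(grad, x, lr=0.01, g_thresh=1e-3, radius=1e-2,
                 t_gap=50, steps=10_000, seed=0):
    """Gradient descent plus occasional random perturbations, so strict
    saddle points (where the gradient vanishes) can be escaped."""
    rng = np.random.default_rng(seed)
    last_perturb = -t_gap
    for t in range(steps):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh and t - last_perturb > t_gap:
            noise = rng.normal(size=x.shape)
            x = x + radius * noise / np.linalg.norm(noise)
            last_perturb = t
        else:
            x = x - lr * g
    return x

# f(x) = x0^2 - x1^2 + x1^4 has a strict saddle at the origin; GD started
# on the x0-axis stalls there, and the perturbation lets it escape.
x = perturbed_gd(lambda v: np.array([2 * v[0], -2 * v[1] + 4 * v[1] ** 3]),
                 x=np.array([1.0, 0.0]))
print(x)   # near (0, ±1/sqrt(2)), a local minimum
```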
no code implementations • NeurIPS 2016 • Chi Jin, Yuchen Zhang, Sivaraman Balakrishnan, Martin J. Wainwright, Michael Jordan
Our first main result shows that the population likelihood function has bad local maxima even in the special case of equally-weighted mixtures of well-separated and spherical Gaussians.
no code implementations • 26 May 2016 • Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford
We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$, i.e., computing a unit vector $x$ such that $x^\top \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$. Offline eigenvector estimation: given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^\top A$, we show how to compute an $\epsilon$-approximate top eigenvector in time $\tilde O\left(\left[\mathrm{nnz}(A) + \frac{d \cdot \mathrm{sr}(A)}{\mathrm{gap}^2}\right] \cdot \log(1/\epsilon)\right)$ and $\tilde O\left(\left[\frac{\mathrm{nnz}(A)^{3/4}\,(d \cdot \mathrm{sr}(A))^{1/4}}{\sqrt{\mathrm{gap}}}\right] \cdot \log(1/\epsilon)\right)$.
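For contrast with these rates, a simple power-iteration baseline (not the paper's shift-and-invert approach) computes an approximate top eigenvector of $\Sigma = A^\top A$ with two passes over the nonzeros of $A$ per iteration:

```python
import numpy as np

def top_eigvec_power(A, iters=500, seed=0):
    """Baseline power iteration on Sigma = A^T A without forming Sigma;
    its iteration count degrades with 1/gap, which is what the paper's
    shift-and-invert methods improve on."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = A.T @ (A @ x)        # one multiplication by Sigma
        x /= np.linalg.norm(x)
    return x

A = np.random.default_rng(1).normal(size=(200, 50))
x = top_eigvec_power(A)
rayleigh = x @ (A.T @ (A @ x))   # approximates lambda_1(Sigma)
```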
no code implementations • NeurIPS 2016 • Chi Jin, Sham M. Kakade, Praneeth Netrapalli
While existing algorithms are efficient for the offline setting, they could be highly inefficient for the online setting.
no code implementations • 13 Apr 2016 • Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford
Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor.
no code implementations • 22 Feb 2016 • Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford
This work provides improved guarantees for streaming principal component analysis (PCA).
no code implementations • 29 Oct 2015 • Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford
Combining our algorithm with previous work to initialize $x_0$, we obtain a number of improved sample complexity and runtime results.
1 code implementation • 6 Mar 2015 • Rong Ge, Furong Huang, Chi Jin, Yang Yuan
To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.
no code implementations • 6 Jan 2014 • Chi Jin, Ziteng Wang, Junliang Huang, Yiqiao Zhong, Li-Wei Wang
We develop an $\epsilon$-differentially private mechanism for the class of $K$-smooth queries.
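As background on the privacy notion only (the paper's mechanism for $K$-smooth queries is more specialized), the classical Laplace mechanism achieves $\epsilon$-differential privacy for a single query:

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, seed=0):
    """Release q(D) + Lap(sensitivity / epsilon): adding Laplace noise
    calibrated to the query's sensitivity gives epsilon-DP."""
    rng = np.random.default_rng(seed)
    return query_value + rng.laplace(scale=sensitivity / epsilon)

# A normalized counting query over n records has sensitivity 1/n.
n, epsilon = 1000, 0.5
noisy = laplace_mechanism(query_value=0.37, sensitivity=1.0 / n,
                          epsilon=epsilon)
```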
no code implementations • NeurIPS 2012 • Chi Jin, Li-Wei Wang
We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity.