Search Results for author: Simon S. Du

Found 115 papers, 18 papers with code

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

no code implementations 1 Sep 2024 Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du

We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games, a problem marked by the challenge of sparse feedback signals.

Imitation Learning Multi-agent Reinforcement Learning

Understanding the Gains from Repeated Self-Distillation

no code implementations 5 Jul 2024 Divyansh Pareek, Simon S. Du, Sewoong Oh

Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model.

Knowledge Distillation regression

Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

no code implementations 29 Jun 2024 Weihang Xu, Maryam Fazel, Simon S. Du

We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMM) in the over-parameterized setting, where a general GMM with $n>1$ components learns from data that are generated by a single ground truth Gaussian distribution.
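The gradient EM update mentioned in this abstract can be illustrated on a toy one-dimensional problem. The snippet below is only a minimal sketch, not the paper's setting or analysis: it assumes equal mixing weights, unit variances, and a fixed step size `lr`, and the function names are hypothetical. Each step moves every mean along the gradient of the EM surrogate, which is the responsibility-weighted average of `x - mean`.

```python
import math
import random


def responsibilities(x, means):
    """Posterior probability of each component for sample x
    (assumes equal weights and unit variances)."""
    logits = [-0.5 * (x - m) ** 2 for m in means]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def gradient_em_step(data, means, lr):
    """One gradient EM step: means += lr * mean_x[ r_i(x) * (x - mean_i) ]."""
    grads = [0.0] * len(means)
    for x in data:
        resp = responsibilities(x, means)
        for i, m in enumerate(means):
            grads[i] += resp[i] * (x - m) / len(data)
    return [m + lr * g for m, g in zip(means, grads)]


def run_gradient_em(data, means, lr=0.5, steps=300):
    """Repeat the gradient EM step a fixed number of times."""
    for _ in range(steps):
        means = gradient_em_step(data, means, lr)
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    # Over-parameterized toy case: two components fit data from one Gaussian.
    samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
    print(run_gradient_em(samples, [-2.0, 2.0]))
```

In this toy run both means drift toward the single ground-truth mean, which is the qualitative behavior the over-parameterized setting is about.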

Rethinking Transformers in Solving POMDPs

1 code implementation 27 May 2024 Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu

Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability.

Decision Making Reinforcement Learning (RL)

Horizon-Free Regret for Linear Markov Decision Processes

no code implementations 15 Mar 2024 Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du

A recent line of work showed that regret bounds in reinforcement learning (RL) can be (nearly) independent of the planning horizon, a.k.a. horizon-free bounds.

LEMMA Reinforcement Learning (RL)

Transferable Reinforcement Learning via Generalized Occupancy Models

no code implementations 10 Mar 2024 Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta

This work proposes a novel class of models, i.e., generalized occupancy models (GOMs), that learn a distribution of successor features from a stationary dataset, along with a policy that acts to realize different successor features.

reinforcement-learning Reinforcement Learning (RL)

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

1 code implementation 20 Feb 2024 Runlong Zhou, Simon S. Du, Beibin Li

We propose Reflect-RL, a two-player system to fine-tune an LM using SFT and online RL, where a frozen reflection model (player) assists the policy model (player).

Decision Making Reinforcement Learning (RL)

Learning Optimal Tax Design in Nonatomic Congestion Games

no code implementations 12 Feb 2024 Qiwen Cui, Maryam Fazel, Simon S. Du

We study how to learn the optimal tax design to maximize the efficiency in nonatomic congestion games.

Optimal Multi-Distribution Learning

no code implementations 8 Dec 2023 Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee

Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound.

Fairness

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

1 code implementation 30 Nov 2023 Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu

Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect training accuracy but near-random test accuracy, and after training for sufficiently longer, it suddenly transitions to perfect test accuracy.

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

no code implementations 3 Oct 2023 Nuoya Xiong, Lijun Ding, Simon S. Du

This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence.

Settling the Sample Complexity of Online Reinforcement Learning

no code implementations 25 Jul 2023 Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du

While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a "large-sample" regime, imposing an enormous burn-in cost in order for their algorithms to operate optimally.

reinforcement-learning Reinforcement Learning (RL)

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

no code implementations 12 Jun 2023 Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.

Multi-agent Reinforcement Learning reinforcement-learning

Improved Active Multi-Task Representation Learning via Lasso

no code implementations 5 Jun 2023 Yiping Wang, Yifang Chen, Kevin Jamieson, Simon S. Du

In addition to our sample complexity results, we also characterize the potential of our $\nu^1$-based strategy in sample-cost-sensitive settings.

Representation Learning

Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron

no code implementations 20 Feb 2023 Weihang Xu, Simon S. Du

This is the first global convergence result for this problem beyond the exact-parameterization setting ($n=1$) in which the gradient descent enjoys an $\exp(-\Omega(T))$ rate.

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

no code implementations 7 Feb 2023 Qiwen Cui, Kaiqing Zhang, Simon S. Du

In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the joint action space when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents.

Multi-agent Reinforcement Learning

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

no code implementations NeurIPS 2023 Yunchang Yang, Han Zhong, Tianhao Wu, Bin Liu, LiWei Wang, Simon S. Du

We study stochastic delayed feedback in general multi-agent sequential decision making, which includes bandits, single-agent Markov decision processes (MDPs), and Markov games (MGs).

Decision Making

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

no code implementations 31 Jan 2023 Runlong Zhou, Zihan Zhang, Simon S. Du

We further initiate the study on model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule.

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

no code implementations 27 Jan 2023 Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee

It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models.

Incremental Learning

Offline congestion games: How feedback type affects data coverage requirement

no code implementations 24 Oct 2022 Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Starting from the facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE.

Vocal Bursts Type Prediction

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

no code implementations 20 Oct 2022 Runlong Zhou, Ruosong Wang, Simon S. Du

We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows that our upper bound is minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs.

reinforcement-learning Reinforcement Learning (RL)

On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness

no code implementations 19 Oct 2022 Haotian Ye, Xiaoyu Chen, LiWei Wang, Simon S. Du

Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment.

Reinforcement Learning (RL)

Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

no code implementations 4 Oct 2022 Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.

Policy Gradient Methods

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

no code implementations 3 Oct 2022 Shicong Cen, Yuejie Chi, Simon S. Du, Lin Xiao

Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates across a wide range of critical applications.

Multi-agent Reinforcement Learning

Blessing of Class Diversity in Pre-training

no code implementations 7 Sep 2022 Yulai Zhao, Jianshu Chen, Simon S. Du

Here, $n$ is the number of pre-training data and $m$ is the number of data in the downstream task, and typically $n \gg m$.

Diversity Language Modelling +1

Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization

no code implementations 17 Jun 2022 Simon S. Du, Gauthier Gidel, Michael I. Jordan, Chris Junchi Li

We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}~F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to stochastic first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
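To make the problem structure concrete, here is a minimal sketch of the classical deterministic extragradient step on a scalar instance of this objective. It is not the paper's stochastic accelerated method: the choices $F(x) = \frac{\mu}{2}x^2$, $H(x, y) = bxy$, $G(y) = \frac{\mu}{2}y^2$, the step size, and all function names are assumptions made for illustration.

```python
def grad_x(x, y, mu, b):
    """Partial derivative in x of mu/2 * x^2 + b*x*y - mu/2 * y^2."""
    return mu * x + b * y


def grad_y(x, y, mu, b):
    """Partial derivative in y of mu/2 * x^2 + b*x*y - mu/2 * y^2."""
    return b * x - mu * y


def extragradient(x, y, mu=0.0, b=1.0, lr=0.5, steps=200):
    """Extragradient: take a look-ahead step, then update the original
    point using the gradients evaluated at the look-ahead point."""
    for _ in range(steps):
        x_half = x - lr * grad_x(x, y, mu, b)  # descent in x
        y_half = y + lr * grad_y(x, y, mu, b)  # ascent in y
        x, y = (x - lr * grad_x(x_half, y_half, mu, b),
                y + lr * grad_y(x_half, y_half, mu, b))
    return x, y


def gradient_descent_ascent(x, y, mu=0.0, b=1.0, lr=0.5, steps=200):
    """Plain simultaneous descent-ascent, shown only for comparison."""
    for _ in range(steps):
        x, y = (x - lr * grad_x(x, y, mu, b),
                y + lr * grad_y(x, y, mu, b))
    return x, y


if __name__ == "__main__":
    print("extragradient:", extragradient(1.0, 1.0))
    print("descent-ascent:", gradient_descent_ascent(1.0, 1.0))
```

With `mu = 0` the problem is purely bilinear: the extragradient iterates contract toward the saddle point at the origin, while plain descent-ascent spirals outward, which is the usual motivation for the look-ahead step.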

Learning in Congestion Games with Bandit Feedback

no code implementations 4 Jun 2022 Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.

Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

no code implementations 1 Jun 2022 Qiwen Cui, Simon S. Du

Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity scales only with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players.

Multi-agent Reinforcement Learning reinforcement-learning +1

On Gap-dependent Bounds for Offline Reinforcement Learning

no code implementations 1 Jun 2022 Xinqi Wang, Qiwen Cui, Simon S. Du

This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Provable General Function Class Representation Learning in Multitask Bandits and MDPs

no code implementations 31 May 2022 Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang

While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost the sample efficiency, the theoretical understanding of why and how it works is still limited.

Multi-Armed Bandits Reinforcement Learning (RL) +1

Variance-Aware Sparse Linear Bandits

no code implementations 26 May 2022 Yan Dai, Ruosong Wang, Simon S. Du

On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$.

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

no code implementations 29 Mar 2022 Jiaqi Yang, Qi Lei, Jason D. Lee, Simon S. Du

We give novel algorithms for multi-task and lifelong linear bandits with shared representation.

Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies

no code implementations 24 Mar 2022 Zihan Zhang, Xiangyang Ji, Simon S. Du

This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) that enjoys a regret bound independent of the planning horizon.

reinforcement-learning Reinforcement Learning (RL)

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

1 code implementation 11 Feb 2022 Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du

Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem.

Combinatorial Optimization Reinforcement Learning (RL)

TransFollower: Long-Sequence Car-Following Trajectory Prediction through Transformer

no code implementations 4 Feb 2022 Meixin Zhu, Simon S. Du, Xuesong Wang, Hao Yang, Ziyuan Pu, Yinhai Wang

Through cross-attention between encoder and decoder, the decoder learns to build a connection between historical driving and future LV speed, based on which a prediction of future FV speed can be obtained.

Decoder Trajectory Prediction

Active Multi-Task Representation Learning

no code implementations 2 Feb 2022 Yifang Chen, Simon S. Du, Kevin Jamieson

To leverage the power of big data from source tasks and overcome the scarcity of the target task samples, representation learning based on multi-task pretraining has become a standard approach in many applications.

Active Learning Multi-Task Learning +1

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

no code implementations 26 Jan 2022 Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$.

Reinforcement Learning (RL)

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

no code implementations 21 Dec 2021 Tianhao Wu, Yunchang Yang, Han Zhong, LiWei Wang, Simon S. Du, Jiantao Jiao

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms.

4k Reinforcement Learning (RL)

First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

no code implementations 7 Dec 2021 Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making.

Decision Making reinforcement-learning +1

Towards Demystifying Representation Learning with Non-contrastive Self-supervision

2 code implementations 11 Oct 2021 Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian

Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image.

Representation Learning Self-Supervised Learning

Gap-Dependent Bounds for Two-Player Markov Games

no code implementations 1 Jul 2021 Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du

As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention.

Q-Learning Vocal Bursts Valence Prediction

Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

no code implementations NeurIPS 2021 Tian Ye, Simon S. Du

We study the asymmetric low-rank factorization problem: \[\min_{\mathbf{U} \in \mathbb{R}^{m \times d}, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2}\|\mathbf{U}\mathbf{V}^\top -\mathbf{\Sigma}\|_F^2\] where $\mathbf{\Sigma}$ is a given matrix of size $m \times n$ and rank $d$.

Matrix Completion
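As an illustration of the objective in this entry, the following sketch runs plain gradient descent on $\frac{1}{2}\|\mathbf{U}\mathbf{V}^\top - \mathbf{\Sigma}\|_F^2$ in the rank-one case ($d = 1$), where $\mathbf{U}$ and $\mathbf{V}$ reduce to vectors `u` and `v`. The constant small initialization, the step size, and the function names are assumptions for this toy example, not the initialization scheme or step-size rule analyzed in the paper.

```python
def loss(u, v, sigma):
    """0.5 * ||u v^T - sigma||_F^2 for vectors u (length m) and v (length n)."""
    return 0.5 * sum((u[i] * v[j] - sigma[i][j]) ** 2
                     for i in range(len(u)) for j in range(len(v)))


def gd_step(u, v, sigma, lr):
    """One simultaneous gradient step on both factors."""
    m, n = len(u), len(v)
    resid = [[u[i] * v[j] - sigma[i][j] for j in range(n)] for i in range(m)]
    grad_u = [sum(resid[i][j] * v[j] for j in range(n)) for i in range(m)]
    grad_v = [sum(resid[i][j] * u[i] for i in range(m)) for j in range(n)]
    new_u = [u[i] - lr * grad_u[i] for i in range(m)]
    new_v = [v[j] - lr * grad_v[j] for j in range(n)]
    return new_u, new_v


def factorize(sigma, lr=0.05, steps=2000, init=0.1):
    """Gradient descent from a small constant initialization (an assumption)."""
    u = [init] * len(sigma)
    v = [init] * len(sigma[0])
    for _ in range(steps):
        u, v = gd_step(u, v, sigma, lr)
    return u, v


if __name__ == "__main__":
    # Rank-one target: outer product of [1, 2] and [1, 0, 1].
    target = [[1.0, 0.0, 1.0], [2.0, 0.0, 2.0]]
    u, v = factorize(target)
    print(loss(u, v, target))
```

The factors are not unique (scaling `u` up and `v` down by the same factor leaves the product unchanged), so the sketch checks the reconstruction loss rather than the factors themselves.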

Corruption Robust Active Learning

no code implementations NeurIPS 2021 Yifang Chen, Simon S. Du, Kevin Jamieson

We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions.

Active Learning Binary Classification

On the Power of Multitask Representation Learning in Linear MDP

no code implementations 15 Jun 2021 Rui Lu, Gao Huang, Simon S. Du

We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT}+\frac{\kappa d}{n}})$ sub-optimal.

Reinforcement Learning (RL) Representation Learning

Provable Adaptation across Multiway Domains via Representation Learning

no code implementations ICLR 2022 Zhili Feng, Shaobo Han, Simon S. Du

This paper studies zero-shot domain adaptation where each domain is indexed on a multi-dimensional array, and we only have data from a small subset of domains.

Domain Adaptation Representation Learning

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

no code implementations NeurIPS 2021 Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.

Nearly Horizon-Free Offline Reinforcement Learning

no code implementations NeurIPS 2021 Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

To the best of our knowledge, this is the first set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points.

reinforcement-learning Reinforcement Learning (RL)

Bilinear Classes: A Structural Framework for Provable Generalization in RL

no code implementations 19 Mar 2021 Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

no code implementations 19 Feb 2021 Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

no code implementations 17 Feb 2021 Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces.

Policy Gradient Methods Vocal Bursts Valence Prediction

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

no code implementations 13 Feb 2021 Yifang Chen, Simon S. Du, Kevin Jamieson

We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.

reinforcement-learning Reinforcement Learning (RL)

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

no code implementations 9 Feb 2021 Haike Xu, Tengyu Ma, Simon S. Du

We further show that for general MDPs, AMB suffers an additional $\frac{|Z_{mul}|}{\Delta_{min}}$ regret, where $Z_{mul}$ is the set of state-action pairs $(s, a)$ such that $a$ is a non-unique optimal action for $s$.

Multi-Armed Bandits

Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP

no code implementations NeurIPS 2021 Zihan Zhang, Jiaqi Yang, Xiangyang Ji, Simon S. Du

With the new confidence sets, we obtain the following regret bounds: for linear bandits, we obtain an $\tilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{k=1}^{K}\sigma_k^2})$ data-dependent regret bound, where $d$ is the feature dimension, $K$ is the number of rounds, and $\sigma_k^2$ is the unknown variance of the reward at the $k$-th round.

LEMMA

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

no code implementations 2 Jan 2021 Minbo Gao, Tianle Xie, Simon S. Du, Lin F. Yang

This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al. 2019, Jin et al. 2020], where linear function approximation is used for generalization on the large state space.

4k Recommendation Systems

Planning with General Objective Functions: Going Beyond Total Rewards

no code implementations NeurIPS 2020 Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$, where $H$ is the planning horizon.

Decision Making

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

no code implementations NeurIPS 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.

Q-Learning

Is Long Horizon RL More Difficult Than Short Horizon RL?

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin Yang, Sham Kakade

In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon, a conjecture which is consistent with all known sample complexity upper bounds.

reinforcement-learning Reinforcement Learning (RL)

Impact of Representation Learning in Linear Bandits

no code implementations ICLR 2021 Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du

For the finite-action setting, we present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.

Representation Learning

Nearly Minimax Optimal Reward-free Reinforcement Learning

no code implementations 12 Oct 2020 Zihan Zhang, Simon S. Du, Xiangyang Ji

In the planning phase, the agent needs to return a near-optimal policy for arbitrary reward functions.

reinforcement-learning Reinforcement Learning (RL)

Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon

no code implementations 28 Sep 2020 Zihan Zhang, Xiangyang Ji, Simon S. Du

Episodic reinforcement learning generalizes contextual bandits and is often perceived to be more difficult due to the long planning horizon and unknown state-dependent transitions.

Decision Making Multi-Armed Bandits +2

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

3 code implementations ICLR 2021 Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka

Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features.

On Reward-Free Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.

reinforcement-learning Reinforcement Learning (RL)

$Q$-learning with Logarithmic Regret

no code implementations 16 Jun 2020 Kunhe Yang, Lin F. Yang, Simon S. Du

This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve a logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap in the optimal $Q$-function.

Q-Learning

When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems?

no code implementations 10 Jun 2020 Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.

Decision Making

Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

no code implementations 1 May 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade

Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.

reinforcement-learning Reinforcement Learning (RL)

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

1 code implementation NeurIPS 2020 Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems (Tang et al., 2017; Bellemare et al., 2016), we investigate when this paradigm is provably efficient.

Efficient Exploration reinforcement-learning +1

Provable Representation Learning for Imitation Learning via Bi-level Optimization

no code implementations ICML 2020 Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

Imitation Learning Representation Learning

Few-Shot Learning via Learning the Representation, Provably

no code implementations ICLR 2021 Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei

First, we study the setting where this common representation is low-dimensional and provide a fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} + \frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class, $\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the dimension of the representation.

Few-Shot Learning Representation Learning

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

no code implementations 17 Feb 2020 Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.

Q-Learning

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

no code implementations NeurIPS 2020 Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with the ReLU activation.

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations NeurIPS 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning +1

Enhanced Convolutional Neural Tangent Kernels

no code implementations 3 Nov 2019 Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (best figure being around 78%), which is interesting performance for a fixed kernel.

Data Augmentation regression

Continuous Control with Contexts, Provably

no code implementations 30 Oct 2019 Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control Decoder

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

no code implementations ICLR 2020 Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

With regard to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit sample efficient reinforcement learning, with little understanding of what the necessary conditions for efficient reinforcement learning are.

Imitation Learning reinforcement-learning +1

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

4 code implementations ICLR 2020 Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.

Few-Shot Image Classification General Classification +3

DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

1 code implementation 28 Sep 2019 Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum

A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.

Continuous Control

Provable Benefits of Deep Hierarchical RL

no code implementations 25 Sep 2019 Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

Modern complex sequential decision-making problems often require both low-level policy and high-level planning.

Decision Making Hierarchical Reinforcement Learning

Towards Understanding the Importance of Shortcut Connections in Residual Networks

no code implementations NeurIPS 2019 Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball.

Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations 14 Jun 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning +1

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

1 code implementation NeurIPS 2019 Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu

While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs.

Graph Classification

On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics

no code implementations 30 Apr 2019 Xi Chen, Simon S. Du, Xin T. Tong

In this paper, using intuitions from stochastic differential equations, we provide a direct analysis for the hitting times of SGLD to the first and second order stationary points.

Stochastic Optimization

On Exact Computation with an Infinitely Wide Neural Net

2 code implementations NeurIPS 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.

Gaussian Processes

Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network

no code implementations 19 Feb 2019 Xiaoxia Wu, Simon S. Du, Rachel Ward

Adaptive gradient methods like AdaGrad are widely used in optimizing neural networks.

Acceleration via Symplectic Discretization of High-Resolution Differential Equations

no code implementations NeurIPS 2019 Bin Shi, Simon S. Du, Weijie J. Su, Michael I. Jordan

We study first-order optimization methods obtained by discretizing ordinary differential equations (ODEs) corresponding to Nesterov's accelerated gradient methods (NAGs) and Polyak's heavy-ball method.

Vocal Bursts Intensity Prediction

Provably efficient RL with Rich Observations via Latent State Decoding

1 code implementation 25 Jan 2019 Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.

Clustering Q-Learning +1

Width Provably Matters in Optimization for Deep Linear Neural Networks

no code implementations 24 Jan 2019 Simon S. Du, Wei Hu

We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\tilde\Omega (L \cdot r \cdot d_{\mathrm{out}} \cdot \kappa^3 )$, where $r$ and $\kappa$ are the rank and the condition number of the input data, and $d_{\mathrm{out}}$ is the output dimension, then gradient descent with Gaussian random initialization converges to a global minimum at a linear rate.

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

no code implementations 24 Jan 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].

How Many Samples are Needed to Estimate a Convolutional Neural Network?

no code implementations NeurIPS 2018 Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan R. Salakhutdinov, Aarti Singh

We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples.

LEMMA

Gradient Descent Finds Global Minima of Deep Neural Networks

no code implementations 9 Nov 2018 Simon S. Du, Jason D. Lee, Haochuan Li, Li-Wei Wang, Xiyu Zhai

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex.

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

no code implementations 21 Oct 2018 Bin Shi, Simon S. Du, Michael I. Jordan, Weijie J. Su

We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they allow the identification of a term that we refer to as "gradient correction" that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods.

Vocal Bursts Intensity Prediction

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

no code implementations ICLR 2019 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth.

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

no code implementations NeurIPS 2018 Simon S. Du, Wei Hu, Jason D. Lee

Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization.

Robust Nonparametric Regression under Huber's $\epsilon$-contamination Model

no code implementations 26 May 2018 Simon S. Du, Yining Wang, Sivaraman Balakrishnan, Pradeep Ravikumar, Aarti Singh

We first show that a simple local binning median step can effectively remove the adversary noise and this median estimator is minimax optimal up to absolute constants over the Hölder function class with smoothness parameters smaller than or equal to 1.

regression

How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network?

no code implementations NeurIPS 2018 Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh

It is widely believed that the practical success of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) owes to the fact that CNNs and RNNs use a more compact parametric representation than their Fully-Connected Neural Network (FNN) counterparts, and consequently require fewer training examples to accurately estimate their parameters.

LEMMA

Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps

no code implementations ICLR 2019 Simon S. Du, Surbhi Goel

We propose a new algorithm to learn a one-hidden-layer convolutional neural network where both the convolutional weights and the output weights are parameters to be learned.

regression

On the Power of Over-parametrization in Neural Networks with Quadratic Activation

1 code implementation ICML 2018 Simon S. Du, Jason D. Lee

We provide new theoretical insights on why over-parametrization is effective in learning neural networks.

Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow

1 code implementation ICML 2018 Xiao Zhang, Simon S. Du, Quanquan Gu

We revisit the inductive matrix completion problem that aims to recover a rank-$r$ matrix with ambient dimension $d$ given $n$ features as the side prior information.

Matrix Completion

Near-Linear Time Local Polynomial Nonparametric Estimation with Box Kernels

no code implementations 26 Feb 2018 Yining Wang, Yi Wu, Simon S. Du

Local polynomial regression (Fan and Gijbels 1996) is an important class of methods for nonparametric density estimation and regression problems.

Density Estimation regression

Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity

no code implementations 5 Feb 2018 Simon S. Du, Wei Hu

We consider the convex-concave saddle point problem $\min_{x}\max_{y} f(x)+y^\top A x-g(y)$ where $f$ is smooth and convex and $g$ is smooth and strongly convex.

Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima

no code implementations ICML 2018 Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, Aarti Singh

We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned.

When is a Convolutional Filter Easy To Learn?

no code implementations ICLR 2018 Simon S. Du, Jason D. Lee, Yuandong Tian

We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time and the convergence rate depends on the smoothness of the input distribution and the closeness of patches.

Gradient Descent Can Take Exponential Time to Escape Saddle Points

no code implementations NeurIPS 2017 Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.

Stochastic Variance Reduction Methods for Policy Evaluation

no code implementations ICML 2017 Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy.

Reinforcement Learning (RL)

Computationally Efficient Robust Estimation of Sparse Functionals

no code implementations 24 Feb 2017 Simon S. Du, Sivaraman Balakrishnan, Aarti Singh

Many conventional statistical procedures are extremely sensitive to seemingly minor deviations from modeling assumptions.

regression

On the Power of Truncated SVD for General High-rank Matrix Estimation Problems

no code implementations NeurIPS 2017 Simon S. Du, Yining Wang, Aarti Singh

This observation leads to many interesting results on general high-rank matrix estimation problems, which we briefly summarize below ($A$ is an $n\times n$ high-rank PSD matrix and $A_k$ is the best rank-$k$ approximation of $A$): (1) High-rank matrix completion: By observing $\Omega(\frac{n\max\{\epsilon^{-4}, k^2\}\mu_0^2\|A\|_F^2\log n}{\sigma_{k+1}(A)^2})$ elements of $A$ where $\sigma_{k+1}\left(A\right)$ is the $\left(k+1\right)$-th singular value of $A$ and $\mu_0$ is the incoherence, the truncated SVD on a zero-filled matrix satisfies $\|\widehat{A}_k-A\|_F \leq (1+O(\epsilon))\|A-A_k\|_F$ with high probability.

Matrix Completion

Efficient Nonparametric Smoothness Estimation

1 code implementation NeurIPS 2016 Shashank Singh, Simon S. Du, Barnabás Póczos

Sobolev quantities (norms, inner products, and distances) of probability density functions are important in the theory of nonparametric statistics, but have rarely been used in practice, partly due to a lack of practical estimators.

Two-sample testing

An Improved Gap-Dependency Analysis of the Noisy Power Method

no code implementations 23 Feb 2016 Maria Florina Balcan, Simon S. Du, Yining Wang, Adams Wei Yu

We consider the noisy power method algorithm, which has wide applications in machine learning and statistics, especially those related to principal component analysis (PCA) under resource (communication, memory or privacy) constraints.
