no code implementations • 1 Sep 2024 • Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games, a problem marked by the challenge of sparse feedback signals.
no code implementations • 5 Jul 2024 • Divyansh Pareek, Simon S. Du, Sewoong Oh
Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model.
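Since the snippet only states the definition, here is a minimal sketch of one self-distillation round using ridge regression as both teacher and student (same "architecture"); the data, the ridge penalty, and the single distillation step are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)

def ridge_fit(X, targets, lam=1.0):
    # Closed-form ridge regression; teacher and student share this "architecture".
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ targets)

teacher = ridge_fit(X, y)            # round 0: fit the labels
student = ridge_fit(X, X @ teacher)  # round 1: fit the teacher's own predictions
```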
no code implementations • 29 Jun 2024 • Weihang Xu, Maryam Fazel, Simon S. Du
We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMM) in the over-parameterized setting, where a general GMM with $n>1$ components learns from data that are generated by a single ground truth Gaussian distribution.
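For concreteness, a minimal sketch of a gradient-EM iteration in the over-parameterized regime described above: two learned components with equal weights and unit covariances, fit to data drawn from a single ground-truth Gaussian. The dimensions, step size, and iteration count are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))   # data from a single ground-truth Gaussian N(0, I)
mu = rng.normal(size=(2, 2))    # over-parameterized: two learned component means
eta = 0.5

for _ in range(100):
    # E-step: posterior responsibilities under equal weights and unit covariances.
    logits = -0.5 * ((X[:, None, :] - mu[None]) ** 2).sum(-1)      # shape (n, 2)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Gradient M-step: one gradient ascent step on the expected log-likelihood.
    grad = (w[:, :, None] * (X[:, None, :] - mu[None])).mean(axis=0)
    mu = mu + eta * grad
```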
1 code implementation • 27 May 2024 • Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu
Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability.
no code implementations • 15 Mar 2024 • Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du
A recent line of works showed that regret bounds in reinforcement learning (RL) can be (nearly) independent of the planning horizon, a.k.a. horizon-free bounds.
no code implementations • 10 Mar 2024 • Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta
This work proposes a novel class of models, i.e., generalized occupancy models (GOMs), that learn a distribution of successor features from a stationary dataset, along with a policy that acts to realize different successor features.
1 code implementation • 20 Feb 2024 • Runlong Zhou, Simon S. Du, Beibin Li
We propose Reflect-RL, a two-player system to fine-tune an LM using SFT and online RL, where a frozen reflection model (player) assists the policy model (player).
no code implementations • 12 Feb 2024 • Qiwen Cui, Maryam Fazel, Simon S. Du
We study how to learn the optimal tax design to maximize the efficiency in nonatomic congestion games.
no code implementations • 11 Feb 2024 • Yan Dai, Qiwen Cui, Simon S. Du
Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL).
no code implementations • 12 Jan 2024 • Gantavya Bhatt, Yifang Chen, Arnav M. Das, Jifan Zhang, Sang T. Truong, Stephen Mussmann, Yinglun Zhu, Jeffrey Bilmes, Simon S. Du, Kevin Jamieson, Jordan T. Ash, Robert D. Nowak
To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design.
no code implementations • 8 Dec 2023 • Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee
Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound.
1 code implementation • 30 Nov 2023 • Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu
Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect training accuracy but near-random test accuracy, and after training for sufficiently longer, it suddenly transitions to perfect test accuracy.
1 code implementation • 31 Oct 2023 • Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets.
no code implementations • 3 Oct 2023 • Nuoya Xiong, Lijun Ding, Simon S. Du
This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence.
no code implementations • 25 Jul 2023 • Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du
While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a "large-sample" regime, imposing enormous burn-in cost in order for their algorithms to operate optimally.
no code implementations • 12 Jun 2023 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.
no code implementations • 5 Jun 2023 • Yiping Wang, Yifang Chen, Kevin Jamieson, Simon S. Du
In addition to our sample complexity results, we also characterize the potential of our $\nu^1$-based strategy in sample-cost-sensitive settings.
no code implementations • 20 Feb 2023 • Weihang Xu, Simon S. Du
This is the first global convergence result for this problem beyond the exact-parameterization setting ($n=1$) in which the gradient descent enjoys an $\exp(-\Omega(T))$ rate.
no code implementations • 7 Feb 2023 • Qiwen Cui, Kaiqing Zhang, Simon S. Du
In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents.
no code implementations • NeurIPS 2023 • Yunchang Yang, Han Zhong, Tianhao Wu, Bin Liu, LiWei Wang, Simon S. Du
We study stochastic delayed feedback in general multi-agent sequential decision making, which includes bandits, single-agent Markov decision processes (MDPs), and Markov games (MGs).
no code implementations • 31 Jan 2023 • Runlong Zhou, Zihan Zhang, Simon S. Du
We further initiate the study on model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule.
no code implementations • 27 Jan 2023 • Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models.
no code implementations • 24 Oct 2022 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
Starting from the facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE.
no code implementations • 20 Oct 2022 • Runlong Zhou, Ruosong Wang, Simon S. Du
We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows our upper bound is minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs.
no code implementations • 19 Oct 2022 • Haotian Ye, Xiaoyu Chen, LiWei Wang, Simon S. Du
Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment.
no code implementations • 4 Oct 2022 • Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
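As a reference point for the setting above, the log-linear policy class and the standard NPG update can be written as follows (a textbook form shown only to fix notation; the paper's Q-NPG variant and its analysis differ in how the update direction is estimated):
\[
\pi_\theta(a\mid s) = \frac{\exp\!\big(\theta^\top \phi(s,a)\big)}{\sum_{a'}\exp\!\big(\theta^\top \phi(s,a')\big)},
\qquad
\theta_{t+1} = \theta_t + \eta\, F_\rho(\theta_t)^{\dagger}\, \nabla_\theta V^{\pi_{\theta_t}}(\rho),
\]
where $\phi(s,a)$ is the feature map, $F_\rho(\theta)$ is the Fisher information matrix under the state-action visitation distribution induced by $\pi_\theta$ from the initial distribution $\rho$, and $\dagger$ denotes the pseudoinverse.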
no code implementations • 3 Oct 2022 • Shicong Cen, Yuejie Chi, Simon S. Du, Lin Xiao
Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates across a wide range of critical applications.
no code implementations • 7 Sep 2022 • Yulai Zhao, Jianshu Chen, Simon S. Du
Here, $n$ is the number of pre-training data and $m$ is the number of data in the downstream task, and typically $n \gg m$.
1 code implementation • 30 Jun 2022 • Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence.
no code implementations • 17 Jun 2022 • Simon S. Du, Gauthier Gidel, Michael I. Jordan, Chris Junchi Li
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}~F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to stochastic first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
no code implementations • 4 Jun 2022 • Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.
no code implementations • 1 Jun 2022 • Qiwen Cui, Simon S. Du
Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players.
no code implementations • 1 Jun 2022 • Xinqi Wang, Qiwen Cui, Simon S. Du
This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning.
no code implementations • 31 May 2022 • Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang
While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost the sample efficiency, the theoretical understanding of why and how it works is still limited.
no code implementations • 26 May 2022 • Yan Dai, Ruosong Wang, Simon S. Du
On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$.
no code implementations • 29 Mar 2022 • Jiaqi Yang, Qi Lei, Jason D. Lee, Simon S. Du
We give novel algorithms for multi-task and lifelong linear bandits with shared representation.
no code implementations • 24 Mar 2022 • Zihan Zhang, Xiangyang Ji, Simon S. Du
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) that enjoys a regret bound \emph{independent of the planning horizon}.
1 code implementation • 11 Feb 2022 • Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon S. Du
Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem.
no code implementations • 4 Feb 2022 • Meixin Zhu, Simon S. Du, Xuesong Wang, Hao Yang, Ziyuan Pu, Yinhai Wang
Through cross-attention between encoder and decoder, the decoder learns to build a connection between historical driving and future LV speed, based on which a prediction of future FV speed can be obtained.
no code implementations • 2 Feb 2022 • Yifang Chen, Simon S. Du, Kevin Jamieson
To leverage the power of big data from source tasks and overcome the scarcity of the target task samples, representation learning based on multi-task pretraining has become a standard approach in many applications.
no code implementations • 26 Jan 2022 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson
We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$.
no code implementations • 10 Jan 2022 • Qiwen Cui, Simon S. Du
We study what dataset assumption permits solving offline two-player zero-sum Markov games.
no code implementations • 21 Dec 2021 • Tianhao Wu, Yunchang Yang, Han Zhong, LiWei Wang, Simon S. Du, Jiantao Jiao
Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms.
no code implementations • 7 Dec 2021 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson
Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making.
2 code implementations • 11 Oct 2021 • Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian
Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image.
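To make the stop-gradient mechanism behind these non-contrastive methods concrete, here is a minimal SimSiam-style loss (assuming PyTorch is available; `p1, p2` are predictor outputs and `z1, z2` projector outputs for the two views — a sketch of the standard formulation, not the paper's code):

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    # Symmetric negative cosine similarity; .detach() is the stop-gradient on the target branch.
    l1 = -F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
    l2 = -F.cosine_similarity(p2, z1.detach(), dim=-1).mean()
    return 0.5 * (l1 + l2)

# Dummy usage with random features of batch size 8 and embedding dimension 128.
p1, p2, z1, z2 = (torch.randn(8, 128) for _ in range(4))
print(simsiam_loss(p1, p2, z1, z2))
```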
no code implementations • 1 Jul 2021 • Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du
As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention.
no code implementations • NeurIPS 2021 • Tian Ye, Simon S. Du
We study the asymmetric low-rank factorization problem: \[\min_{\mathbf{U} \in \mathbb{R}^{m \times d}, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2}\|\mathbf{U}\mathbf{V}^\top -\mathbf{\Sigma}\|_F^2\] where $\mathbf{\Sigma}$ is a given matrix of size $m \times n$ and rank $d$.
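A minimal gradient descent sketch for this objective, with illustrative dimensions and step size (the paper analyzes the dynamics of this iteration; the code only shows the iteration itself):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 30, 20, 3
Sigma = rng.normal(size=(m, d)) @ rng.normal(size=(d, n))   # a rank-d target matrix
U, V = rng.normal(size=(m, d)), rng.normal(size=(n, d))
eta = 0.01

for _ in range(2000):
    R = U @ V.T - Sigma                                      # residual
    # Gradients of (1/2)||U V^T - Sigma||_F^2 w.r.t. U and V, updated simultaneously.
    U, V = U - eta * R @ V, V - eta * R.T @ U
```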
no code implementations • ICLR 2022 • Yunchang Yang, Tianhao Wu, Han Zhong, Evrard Garcelon, Matteo Pirotta, Alessandro Lazaric, LiWei Wang, Simon S. Du
We also obtain a new upper bound for conservative low-rank MDP.
no code implementations • NeurIPS 2021 • Yifang Chen, Simon S. Du, Kevin Jamieson
We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions.
no code implementations • 15 Jun 2021 • Rui Lu, Gao Huang, Simon S. Du
We first discover a \emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-square algorithm learns a policy which is $\tilde{O}(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT}+\frac{\kappa d}{n}})$ sub-optimal.
no code implementations • ICLR 2022 • Zhili Feng, Shaobo Han, Simon S. Du
This paper studies zero-shot domain adaptation where each domain is indexed on a multi-dimensional array, and we only have data from a small subset of domains.
no code implementations • NeurIPS 2021 • Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.
no code implementations • NeurIPS 2021 • Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi
To the best of our knowledge, these are the \emph{first} set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points.
no code implementations • 19 Mar 2021 • Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du
To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.
no code implementations • 17 Feb 2021 • Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du
Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces.
no code implementations • 13 Feb 2021 • Yifang Chen, Simon S. Du, Kevin Jamieson
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
no code implementations • 9 Feb 2021 • Haike Xu, Tengyu Ma, Simon S. Du
We further show that for general MDPs, AMB suffers an additional $\frac{|Z_{mul}|}{\Delta_{min}}$ regret, where $Z_{mul}$ is the set of state-action pairs $(s, a)$'s satisfying $a$ is a non-unique optimal action for $s$.
no code implementations • NeurIPS 2021 • Zihan Zhang, Jiaqi Yang, Xiangyang Ji, Simon S. Du
With the new confidence sets, we obtain the following regret bounds: For linear bandits, we obtain an $\tilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{k=1}^{K}\sigma_k^2})$ data-dependent regret bound, where $d$ is the feature dimension, $K$ is the number of rounds, and $\sigma_k^2$ is the \emph{unknown} variance of the reward at the $k$-th round.
no code implementations • 2 Jan 2021 • Minbo Gao, Tianle Xie, Simon S. Du, Lin F. Yang
This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, Jin et al 2020] where the linear function approximation is used for generalization on the large state space.
no code implementations • NeurIPS 2020 • Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang
Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with an unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$ where $H$ is the planning horizon.
no code implementations • NeurIPS 2020 • Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang
The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.
no code implementations • NeurIPS 2020 • Ruosong Wang, Simon S. Du, Lin Yang, Sham Kakade
In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon --- a conjecture which is consistent with all known sample complexity upper bounds.
no code implementations • ICLR 2021 • Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du
For the finite-action setting, we present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
no code implementations • 12 Oct 2020 • Zihan Zhang, Simon S. Du, Xiangyang Ji
In the planning phase, the agent needs to return a near-optimal policy for arbitrary reward functions.
no code implementations • 28 Sep 2020 • Zihan Zhang, Xiangyang Ji, Simon S. Du
Episodic reinforcement learning generalizes contextual bandits and is often perceived to be more difficult due to long planning horizon and unknown state-dependent transitions.
3 code implementations • ICLR 2021 • Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features.
no code implementations • NeurIPS 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov
The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.
no code implementations • 16 Jun 2020 • Kunhe Yang, Lin F. Yang, Simon S. Du
This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve a logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap in the optimal $Q$-function.
no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu
Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
no code implementations • 1 May 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade
Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.
1 code implementation • NeurIPS 2020 • Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems (Tang et al., 2017; Bellemare et al., 2016), we investigate when this paradigm is provably efficient.
no code implementations • ICML 2020 • Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi
We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.
no code implementations • ICLR 2021 • Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei
First, we study the setting where this common representation is low-dimensional and provide a fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} + \frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class, $\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the dimension of the representation.
no code implementations • 17 Feb 2020 • Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang
2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.
no code implementations • NeurIPS 2020 • Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora
Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with the ReLU activation.
no code implementations • ICLR 2021 • Yining Wang, Ruosong Wang, Simon S. Du, Akshay Krishnamurthy
We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation.
no code implementations • NeurIPS 2019 • Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang
Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.
no code implementations • 3 Nov 2019 • Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%), which is interesting performance for a fixed kernel.
no code implementations • 30 Oct 2019 • Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.
no code implementations • ICLR 2020 • Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
With regards to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit sample-efficient reinforcement learning, with little understanding of what the necessary conditions for efficient reinforcement learning are.
4 code implementations • ICLR 2020 • Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
On the VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
1 code implementation • 28 Sep 2019 • Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum
A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty.
no code implementations • 25 Sep 2019 • Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
Modern complex sequential decision-making problems often involve both low-level policy learning and high-level planning.
no code implementations • NeurIPS 2019 • Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao
We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball.
no code implementations • 14 Jun 2019 • Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang
Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.
1 code implementation • NeurIPS 2019 • Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu
While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs.
2 code implementations • ICLR 2020 • Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
Neural networks have succeeded in many reasoning tasks.
no code implementations • 30 Apr 2019 • Xi Chen, Simon S. Du, Xin T. Tong
In this paper, using intuitions from stochastic differential equations, we provide a direct analysis for the hitting times of SGLD to the first and second order stationary points.
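For readers unfamiliar with SGLD, the iteration whose hitting times are analyzed is the standard stochastic gradient Langevin dynamics update (shown here only to fix notation; $g_t$ is a stochastic gradient of the objective $f$ at $x_t$ and $\beta$ is the inverse temperature):
\[
x_{t+1} = x_t - \eta\, g_t + \sqrt{2\eta/\beta}\;\xi_t, \qquad \xi_t \sim \mathcal{N}(0, I_d).
\]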
2 code implementations • NeurIPS 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.
no code implementations • 19 Feb 2019 • Xiaoxia Wu, Simon S. Du, Rachel Ward
Adaptive gradient methods like AdaGrad are widely used in optimizing neural networks.
no code implementations • NeurIPS 2019 • Bin Shi, Simon S. Du, Weijie J. Su, Michael. I. Jordan
We study first-order optimization methods obtained by discretizing ordinary differential equations (ODEs) corresponding to Nesterov's accelerated gradient methods (NAGs) and Polyak's heavy-ball method.
1 code implementation • 25 Jan 2019 • Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.
no code implementations • 24 Jan 2019 • Simon S. Du, Wei Hu
We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\tilde\Omega (L \cdot r \cdot d_{\mathrm{out}} \cdot \kappa^3 )$, where $r$ and $\kappa$ are the rank and the condition number of the input data, and $d_{\mathrm{out}}$ is the output dimension, then gradient descent with Gaussian random initialization converges to a global minimum at a linear rate.
no code implementations • 24 Jan 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].
no code implementations • NeurIPS 2018 • Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan R. Salakhutdinov, Aarti Singh
We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples.
no code implementations • 9 Nov 2018 • Simon S. Du, Jason D. Lee, Haochuan Li, Li-Wei Wang, Xiyu Zhai
Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex.
no code implementations • 21 Oct 2018 • Bin Shi, Simon S. Du, Michael. I. Jordan, Weijie J. Su
We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they allow the identification of a term that we refer to as "gradient correction" that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods.
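One common way to write the two discrete methods makes the gradient-correction term visible (this is a standard presentation, not a quote from the paper): with step size $s$ and momentum parameter $\beta$,
\[
\begin{aligned}
\text{heavy-ball:}\quad & x_{k+1} = x_k + \beta\,(x_k - x_{k-1}) - s\,\nabla f(x_k),\\
\text{NAG-SC:}\quad & x_{k+1} = x_k + \beta\,(x_k - x_{k-1}) - s\,\nabla f(x_k) - \beta s\,\big(\nabla f(x_k) - \nabla f(x_{k-1})\big),
\end{aligned}
\]
where the last term of NAG-SC is the gradient correction absent from the heavy-ball method.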
no code implementations • ICLR 2019 • Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh
One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth.
no code implementations • NeurIPS 2018 • Simon S. Du, Wei Hu, Jason D. Lee
Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization.
no code implementations • 26 May 2018 • Simon S. Du, Yining Wang, Sivaraman Balakrishnan, Pradeep Ravikumar, Aarti Singh
We first show that a simple local binning median step can effectively remove the adversary noise and this median estimator is minimax optimal up to absolute constants over the Hölder function class with smoothness parameters smaller than or equal to 1.
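A minimal 1-D illustration of the binning-median idea (grid size, noise model, and corruption level are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_bins = 1000, 50
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)
y[rng.choice(n, size=50, replace=False)] += 10.0     # adversarial corruptions

# Assign each point to a bin on [0, 1] and take the per-bin median of y.
bins = np.minimum(np.floor(x * n_bins).astype(int), n_bins - 1)
medians = np.array([np.median(y[bins == b]) for b in range(n_bins)])
```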
no code implementations • NeurIPS 2018 • Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh
It is widely believed that the practical success of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) owes to the fact that CNNs and RNNs use a more compact parametric representation than their Fully-Connected Neural Network (FNN) counterparts, and consequently require fewer training examples to accurately estimate their parameters.
no code implementations • ICLR 2019 • Simon S. Du, Surbhi Goel
We propose a new algorithm to learn a one-hidden-layer convolutional neural network where both the convolutional weights and the output weights are parameters to be learned.
1 code implementation • ICML 2018 • Simon S. Du, Jason D. Lee
We provide new theoretical insights on why over-parametrization is effective in learning neural networks.
1 code implementation • ICML 2018 • Xiao Zhang, Simon S. Du, Quanquan Gu
We revisit the inductive matrix completion problem that aims to recover a rank-$r$ matrix with ambient dimension $d$ given $n$ features as the side prior information.
no code implementations • 26 Feb 2018 • Yining Wang, Yi Wu, Simon S. Du
Local polynomial regression (Fan and Gijbels 1996) is an important class of methods for nonparametric density estimation and regression problems.
no code implementations • 5 Feb 2018 • Simon S. Du, Wei Hu
We consider the convex-concave saddle point problem $\min_{x}\max_{y} f(x)+y^\top A x-g(y)$ where $f$ is smooth and convex and $g$ is smooth and strongly convex.
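To fix notation for the bilinear structure above, a basic simultaneous gradient descent-ascent step on $L(x,y)=f(x)+y^\top A x-g(y)$ reads as follows (shown only as a baseline scheme; the paper's analysis concerns the convergence behavior of first-order primal-dual methods for this objective):
\[
x_{t+1} = x_t - \eta\big(\nabla f(x_t) + A^\top y_t\big),
\qquad
y_{t+1} = y_t + \eta\big(A x_t - \nabla g(y_t)\big).
\]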
no code implementations • ICML 2018 • Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, Aarti Singh
We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned.
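A direct translation of the model $f(\mathbf{Z}, \mathbf{w}, \mathbf{a})$ above into code, with illustrative patch counts and dimensions:

```python
import numpy as np

def f(Z, w, a):
    """f(Z, w, a) = sum_j a_j * relu(w^T Z_j) over non-overlapping patches Z_j."""
    # Z: (num_patches, patch_dim), w: (patch_dim,), a: (num_patches,)
    return a @ np.maximum(Z @ w, 0.0)

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 8))   # 4 non-overlapping patches of dimension 8
w = rng.normal(size=8)        # shared convolutional filter
a = rng.normal(size=4)        # output weights
print(f(Z, w, a))
```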
no code implementations • ICLR 2018 • Simon S. Du, Jason D. Lee, Yuandong Tian
We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time and the convergence rate depends on the smoothness of the input distribution and the closeness of patches.
no code implementations • NeurIPS 2017 • Simon S. Du, Chi Jin, Jason D. Lee, Michael. I. Jordan, Barnabas Poczos, Aarti Singh
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape.
no code implementations • ICML 2017 • Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy.
no code implementations • 24 Feb 2017 • Simon S. Du, Sivaraman Balakrishnan, Aarti Singh
Many conventional statistical procedures are extremely sensitive to seemingly minor deviations from modeling assumptions.
no code implementations • NeurIPS 2017 • Simon S. Du, Yining Wang, Aarti Singh
This observation leads to many interesting results on general high-rank matrix estimation problems, which we briefly summarize below ($A$ is an $n\times n$ high-rank PSD matrix and $A_k$ is the best rank-$k$ approximation of $A$): (1) High-rank matrix completion: By observing $\Omega(\frac{n\max\{\epsilon^{-4}, k^2\}\mu_0^2\|A\|_F^2\log n}{\sigma_{k+1}(A)^2})$ elements of $A$ where $\sigma_{k+1}\left(A\right)$ is the $\left(k+1\right)$-th singular value of $A$ and $\mu_0$ is the incoherence, the truncated SVD on a zero-filled matrix satisfies $\|\widehat{A}_k-A\|_F \leq (1+O(\epsilon))\|A-A_k\|_F$ with high probability.
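The estimator referenced in result (1) is simply a truncated SVD of the zero-filled observation matrix; a minimal sketch (the mask and rank are illustrative, and variants that rescale by the observed fraction are omitted):

```python
import numpy as np

def truncated_svd_completion(A_obs, mask, k):
    # Zero-fill unobserved entries, then keep the top-k part of the SVD.
    U, s, Vt = np.linalg.svd(np.where(mask, A_obs, 0.0), full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]
```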
1 code implementation • NeurIPS 2016 • Shashank Singh, Simon S. Du, Barnabás Póczos
Sobolev quantities (norms, inner products, and distances) of probability density functions are important in the theory of nonparametric statistics, but have rarely been used in practice, partly due to a lack of practical estimators.
no code implementations • 23 Feb 2016 • Maria Florina Balcan, Simon S. Du, Yining Wang, Adams Wei Yu
We consider the noisy power method algorithm, which has wide applications in machine learning and statistics, especially those related to principal component analysis (PCA) under resource (communication, memory or privacy) constraints.
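A minimal sketch of the noisy power method being referred to (the Gaussian noise model and parameters are illustrative; in applications the noise arises from communication, memory, or privacy constraints):

```python
import numpy as np

def noisy_power_method(A, k, iters, noise_scale, rng):
    # Subspace (power) iteration where each multiplication by A is perturbed by noise G.
    n = A.shape[0]
    X, _ = np.linalg.qr(rng.normal(size=(n, k)))
    for _ in range(iters):
        G = noise_scale * rng.normal(size=(n, k))
        X, _ = np.linalg.qr(A @ X + G)
    return X

# Example: approximate the top-2 eigenspace of a random PSD matrix.
rng = np.random.default_rng(0)
B = rng.normal(size=(50, 50))
A = B @ B.T
X = noisy_power_method(A, k=2, iters=50, noise_scale=0.01, rng=rng)
```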