Search Results for author: Xiyu Zhai

Found 9 papers, 0 papers with code

On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

no code implementations 27 Aug 2019 Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai

We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces.
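For orientation, here is a minimal sketch of the object in question, in notation of my own choosing rather than the paper's: given data $(x_1, y_1), \dots, (x_n, y_n)$ and an RKHS $\mathcal{H}$ with kernel $K$, the minimum-norm interpolant solves

```latex
\hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{H}} \|f\|_{\mathcal{H}}
\quad \text{subject to} \quad f(x_i) = y_i, \; i = 1, \dots, n,
\qquad\text{so that}\qquad
\hat{f}(x) = K(x, X)\, K(X, X)^{-1} y
```

whenever the Gram matrix $K(X, X) = [K(x_i, x_j)]_{i,j}$ is invertible; the risk referred to above is the out-of-sample error of $\hat{f}$ on fresh data.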

Near Optimal Stratified Sampling

no code implementations 26 Jun 2019 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels.
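For context, here is a minimal sketch of the kind of stratified evaluation the title refers to; the strata, weights, and per-stratum labelling budgets below (and the `model`/`label_fn` callables) are illustrative placeholders, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_error_estimate(strata, weights, budget_per_stratum, model, label_fn):
    """Estimate a model's error rate as a weighted average of per-stratum error rates.

    strata:  list of arrays, each holding the unlabelled points of one stratum
    weights: population fraction of each stratum (sums to 1)
    budget_per_stratum: number of points to label in each stratum
    """
    estimate = 0.0
    for points, w, n in zip(strata, weights, budget_per_stratum):
        sample = points[rng.choice(len(points), size=n, replace=False)]
        errors = np.mean([model(x) != label_fn(x) for x in sample])
        estimate += w * errors          # weight the stratum's error by its population share
    return estimate
```

Proportional allocation of the labelling budget across strata is the naive baseline; choosing the budgets well is where the "near optimal" part of the title comes in.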

Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon

no code implementations 28 Dec 2018 Alexander Rakhlin, Xiyu Zhai

We show that minimum-norm interpolation in the Reproducing Kernel Hilbert Space corresponding to the Laplace kernel is not consistent if the input dimension is constant.
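A minimal numerical sketch of the interpolant in question (the dimension, bandwidth, and data below are arbitrary illustrative choices): the Laplace kernel is $k(x, x') = \exp(-\|x - x'\|/\sigma)$, and the minimum-norm interpolant is the kernel "ridgeless" predictor obtained by solving the Gram system exactly.

```python
import numpy as np

def laplace_kernel(A, B, sigma=1.0):
    # Pairwise Laplace kernel exp(-||a - b|| / sigma) between rows of A and rows of B.
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-dists / sigma)

rng = np.random.default_rng(0)
d, n = 3, 50                                      # fixed (constant) input dimension, n samples
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)    # noisy labels

alpha = np.linalg.solve(laplace_kernel(X, X), y)  # interpolating coefficients
predict = lambda X_test: laplace_kernel(X_test, X) @ alpha

print(np.max(np.abs(predict(X) - y)))             # ~0: the predictor interpolates the training data
```

The paper's point concerns the test error of such an interpolant: with $d$ held fixed as $n$ grows it is not consistent, in contrast to high-dimensional regimes where $d$ grows with $n$.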


How Many Samples are Needed to Estimate a Convolutional Neural Network?

no code implementations NeurIPS 2018 Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan R. Salakhutdinov, Aarti Singh

We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples.
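To make the $m$ vs. $d$ comparison concrete, here is an illustrative sketch; the stride and the summation over patch responses are simplifications of mine, not the paper's exact model. A linear convolutional predictor on a $d$-dimensional input reuses one length-$m$ filter across patches, so it has $m$ parameters, while a fully-connected linear predictor fits a full $d$-dimensional weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 4                         # input dimension and filter size (illustrative)
x = rng.normal(size=d)

w_conv = rng.normal(size=m)          # the convolutional predictor's m shared parameters
w_fnn = rng.normal(size=d)           # the fully-connected predictor's d parameters

# Linear-activation convolutional prediction: sum of the filter's responses on all patches.
patches = np.stack([x[i:i + m] for i in range(d - m + 1)])   # shape (d - m + 1, m)
y_conv = float(np.sum(patches @ w_conv))

# Fully-connected linear prediction on the same input.
y_fnn = float(x @ w_fnn)

print(w_conv.size, w_fnn.size)       # 4 vs 64: the parameter gap behind O~(m/eps^2) vs Omega(d/eps^2)
```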


Gradient Descent Finds Global Minima of Deep Neural Networks

no code implementations 9 Nov 2018 Simon S. Du, Jason D. Lee, Haochuan Li, Li-Wei Wang, Xiyu Zhai

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex.
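Loosely, and from memory rather than from the paper's statement (constants, step-size conditions, and width requirements omitted): results of this type say that with enough over-parameterization and random initialization, gradient descent with step size $\eta$ decreases the training loss geometrically,

```latex
L(\theta_{k}) \;\le\; \Bigl(1 - \tfrac{\eta \lambda_{0}}{2}\Bigr)^{k} L(\theta_{0}),
```

where $\lambda_0 > 0$ is the smallest eigenvalue of a Gram matrix induced by the data and the architecture at initialization; positivity of $\lambda_0$ is what rules out convergence to a non-global stationary point.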

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

no code implementations ICLR 2019 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth.
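Below is a minimal, self-contained illustration of that phenomenon, close in spirit to but not a verbatim reproduction of the paper's setting; the width, step size, data, and iteration count are arbitrary choices. A randomly initialized, wide two-layer ReLU network (output layer held fixed at random signs) trained with plain full-batch gradient descent on a small dataset typically drives the squared training loss to (near) zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 10, 2000                    # samples, input dimension, hidden width (wide)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)                    # arbitrary real targets

W = rng.normal(size=(m, d))               # randomly initialized first-layer weights
a = rng.choice([-1.0, 1.0], size=m)       # output weights, held fixed at random signs

eta = 0.1
for _ in range(3000):
    Z = X @ W.T                           # pre-activations, shape (n, m)
    mask = (Z > 0).astype(float)          # ReLU derivative pattern
    pred = (mask * Z) @ a / np.sqrt(m)    # network outputs, shape (n,)
    resid = pred - y
    # Gradient of 0.5 * ||pred - y||^2 with respect to the first-layer weights only.
    grad = a[:, None] * ((mask * resid[:, None]).T @ X) / np.sqrt(m)
    W -= eta * grad

Z = X @ W.T
pred = np.maximum(Z, 0.0) @ a / np.sqrt(m)
print(0.5 * np.sum((pred - y) ** 2))      # training loss: typically very close to zero
```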

How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network?

no code implementations NeurIPS 2018 Simon S. Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh

It is widely believed that the practical success of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) owes to the fact that CNNs and RNNs use a more compact parametric representation than their Fully-Connected Neural Network (FNN) counterparts, and consequently require fewer training examples to accurately estimate their parameters.
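As a back-of-the-envelope illustration of "more compact parametric representation" (the layer sizes below are arbitrary): a convolutional layer's parameter count depends on the filter size and channel counts but not on the spatial size of the input, whereas a fully-connected layer mapping the same input to the same output scales with both.

```python
# Parameter counts for one layer mapping a 32x32, 3-channel image to 16 feature maps.
h, w = 32, 32
c_in, c_out, k = 3, 16, 5

conv_params = c_out * c_in * k * k + c_out                      # one k x k filter per (in, out) channel pair, plus biases
fnn_params = (c_in * h * w) * (c_out * h * w) + c_out * h * w   # dense weight matrix between the same input/output shapes

print(conv_params)   # 1216
print(fnn_params)    # 50348032 -- roughly 50 million
```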


Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

no code implementations 19 Jul 2017 Wenlong Mou, Li-Wei Wang, Xiyu Zhai, Kai Zheng

This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and it has important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.

Generalization Bounds, Learning Theory +1
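For context, SGLD (stochastic gradient Langevin dynamics) is ordinary SGD plus injected Gaussian noise whose scale is tied to the step size and an inverse temperature. Here is a minimal sketch of the standard update rule, not anything specific to the paper; the loss gradient `grad_fn`, the data, and the hyperparameters are placeholders, and a constant step size is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld(grad_fn, theta0, data, eta=1e-3, beta=1e4, epochs=10, batch_size=32):
    """Stochastic Gradient Langevin Dynamics: SGD steps plus sqrt(2 * eta / beta) Gaussian noise."""
    theta = theta0.copy()
    n = len(data)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            batch = data[start:start + batch_size]
            g = grad_fn(theta, batch)                          # mini-batch gradient estimate
            noise = rng.normal(size=theta.shape)
            theta = theta - eta * g + np.sqrt(2.0 * eta / beta) * noise
    return theta
```

The "aggregated step sizes" mentioned in the abstract are accumulated along exactly this kind of trajectory.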
