Search Results for author: Mikhail Belkin

Found 43 papers, 7 papers with code

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

1 code implementation 29 May 2021 Mikhail Belkin

In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges.

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

no code implementations 28 Apr 2021 Yuan Cao, Quanquan Gu, Mikhail Belkin

Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice.

Tasks: Classification, General Classification

On the linearity of large non-linear models: when and why the tangent kernel is constant

no code implementations NeurIPS 2020 Chaoyue Liu, Libin Zhu, Mikhail Belkin

We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width.
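
In generic notation (not the paper's exact statement), the transition to linearity refers to the first-order expansion of the network output around initialization, and the tangent kernel is the Gram matrix of the parameter gradients:

```latex
f(w; x) \;\approx\; f(w_0; x) + \nabla_w f(w_0; x)^\top (w - w_0),
\qquad
K_w(x, x') \;=\; \nabla_w f(w; x)^\top \nabla_w f(w; x').
```

The kernel $K_w$ stays constant along training exactly when this linearization is exact, and its change is controlled by the norm of the Hessian of $f$ with respect to $w$, which the paper shows shrinks as the network width grows.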

Linear Convergence and Implicit Regularization of Generalized Mirror Descent with Time-Dependent Mirrors

no code implementations 18 Sep 2020 Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

The following questions are fundamental to understanding the properties of over-parameterization in modern machine learning: (1) Under what conditions and at what rate does training converge to a global minimum?

Multiple Descent: Design Your Own Generalization Curve

no code implementations 3 Aug 2020 Lin Chen, Yifei Min, Mikhail Belkin, Amin Karbasi

This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized.

Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks

no code implementations ICLR 2021 Like Hui, Mikhail Belkin

We explore several major neural architectures and a range of standard benchmark datasets for NLP, automatic speech recognition (ASR) and computer vision tasks to show that these architectures, with the same hyper-parameter settings as reported in the literature, perform comparably or better when trained with the square loss, even after equalizing computational resources.

Tasks: Automatic Speech Recognition, General Classification +1

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

no code implementations 29 Feb 2020 Chaoyue Liu, Libin Zhu, Mikhail Belkin

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks.

Overparameterized Neural Networks Implement Associative Memory

1 code implementation 26 Sep 2019 Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience.

Downsampling leads to Image Memorization in Convolutional Autoencoders

no code implementations ICLR 2019 Adityanarayanan Radhakrishnan, Caroline Uhler, Mikhail Belkin

In this paper, we link memorization of images in deep convolutional autoencoders to downsampling through strided convolution.

Two models of double descent for weak features

no code implementations 18 Mar 2019 Mikhail Belkin, Daniel Hsu, Ji Xu

The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models.

Reconciling modern machine learning practice and the bias-variance trade-off

2 code implementations 28 Dec 2018 Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.

On exponential convergence of SGD in non-convex over-parametrized learning

no code implementations 6 Nov 2018 Raef Bassily, Mikhail Belkin, Siyuan Ma

Large over-parametrized models learned via stochastic gradient descent (SGD) methods have become a key element in modern machine learning.

Accelerating SGD with momentum for over-parameterized learning

1 code implementation ICLR 2020 Chaoyue Liu, Mikhail Belkin

This is in contrast to the classical results in the deterministic scenario, where the same step size ensures accelerated convergence of Nesterov's method over optimal gradient descent.

Memorization in Overparameterized Autoencoders

no code implementations 16 Oct 2018 Adityanarayanan Radhakrishnan, Karren Yang, Mikhail Belkin, Caroline Uhler

The ability of deep neural networks to generalize well in the overparameterized regime has become a subject of significant research interest.

Does data interpolation contradict statistical optimality?

no code implementations 25 Jun 2018 Mikhail Belkin, Alexander Rakhlin, Alexandre B. Tsybakov

We show that learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.

Kernel machines that adapt to GPUs for effective large batch training

1 code implementation 15 Jun 2018 Siyuan Ma, Mikhail Belkin

In this paper we develop the first analytical framework that extends linear scaling to match the parallel computing capacity of a resource.

Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate

no code implementations NeurIPS 2018 Mikhail Belkin, Daniel Hsu, Partha Mitra

Finally, this paper suggests a way to explain the phenomenon of adversarial examples, which are seemingly ubiquitous in modern machine learning, and also discusses some connections to kernel machines and random forests in the interpolated regime.

Tasks: General Classification

Parametrized Accelerated Methods Free of Condition Number

no code implementations 28 Feb 2018 Chaoyue Liu, Mikhail Belkin

Analyses of accelerated (momentum-based) gradient descent usually assume bounded condition number to obtain exponential convergence rates.

Fast Interactive Image Retrieval using large-scale unlabeled data

no code implementations 12 Feb 2018 Akshay Mehra, Jihun Hamm, Mikhail Belkin

Active learning reduces the number of user interactions by querying the labels of the most informative points, and GSSL makes it possible to use abundant unlabeled data along with the limited labeled data provided by the user.

Tasks: Active Learning, Image Retrieval

To understand deep learning we need to understand kernel learning

no code implementations ICML 2018 Mikhail Belkin, Siyuan Ma, Soumik Mandal

Certain key phenomena of deep learning are manifested similarly in kernel methods in the modern "overfitted" regime.

Tasks: Generalization Bounds

Approximation beats concentration? An approximation view on inference with smooth radial kernels

no code implementations 10 Jan 2018 Mikhail Belkin

We analyze eigenvalue decay of kernel operators and matrices, properties of eigenfunctions/eigenvectors, and "Fourier" coefficients of functions in the kernel space restricted to a discrete set of data points.

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

no code implementations ICML 2018 Siyuan Ma, Raef Bassily, Mikhail Belkin

We show that there is a critical batch size $m^*$ such that: (a) SGD iteration with mini-batch size $m \leq m^*$ is nearly equivalent to $m$ iterations of mini-batch size $1$ (the "linear scaling regime").
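
The near-equivalence in (a) can be checked on a toy over-parameterized least-squares problem. This is an illustrative sketch under simplified assumptions, not the paper's experiments: with the same sample order, one mini-batch step of size m with a linearly scaled step size tracks m single-sample steps, and both runs drive the interpolating model to (near) zero loss and to the same solution. The step size, batch size, and problem dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # d > n: over-parameterized, interpolation exists
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = X @ rng.standard_normal(d)       # noiseless targets, so zero loss is attainable

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

eta, m, steps = 0.5, 4, 1000
rng2 = np.random.default_rng(1)
order = np.stack([rng2.choice(n, size=m, replace=False) for _ in range(steps)])

# Run A: one step of mini-batch size m per round, step size m * eta.
wa = np.zeros(d)
for idx in order:
    wa -= m * eta * X[idx].T @ (X[idx] @ wa - y[idx]) / m

# Run B: m single-sample steps per round, step size eta.
wb = np.zeros(d)
for idx in order:
    for i in idx:
        wb -= eta * X[i] * (X[i] @ wb - y[i])
```

Both iterates interpolate the data and end up close to each other, since from zero initialization SGD stays in the row span of `X` and converges to the same minimum-norm interpolating solution.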

Unperturbed: spectral analysis beyond Davis-Kahan

no code implementations 20 Jun 2017 Justin Eldridge, Mikhail Belkin, Yusu Wang

Classical matrix perturbation results, such as Weyl's theorem for eigenvalues and the Davis-Kahan theorem for eigenvectors, are general purpose.

Diving into the shallows: a computational perspective on large-scale shallow learning

1 code implementation NeurIPS 2017 Siyuan Ma, Mikhail Belkin

An analysis based on the spectral properties of the kernel demonstrates that only a vanishingly small portion of the function space is reachable after a polynomial number of gradient descent iterations.
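
The mechanism behind this reachability limit can be sketched in generic notation: gradient descent on kernel least squares decouples along eigendirections of the kernel matrix, and the residual along an eigendirection with eigenvalue $\lambda_i$ shrinks geometrically, while stability caps the step size by the top eigenvalue:

```latex
r_i^{(t)} = (1 - \eta \lambda_i)^t \, r_i^{(0)},
\qquad \eta \le \tfrac{1}{\lambda_1}
\;\Rightarrow\;
t \gtrsim \tfrac{\lambda_1}{\lambda_i} \ \text{iterations to shrink } r_i.
```

With fast spectral decay, $\lambda_1/\lambda_i$ grows quickly with $i$, so only the leading eigendirections are reachable within a polynomial number of iterations.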

Learning Privately from Multiparty Data

no code implementations 10 Feb 2016 Jihun Hamm, Paul Cao, Mikhail Belkin

How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party's private data?

Tasks: Activity Recognition, Network Intrusion Detection

Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering

no code implementations 21 Jun 2015 Justin Eldridge, Mikhail Belkin, Yusu Wang

In this paper we identify two limit properties, separation and minimality, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency.

Probabilistic Zero-shot Classification with Semantic Rankings

no code implementations 27 Feb 2015 Jihun Hamm, Mikhail Belkin

In this paper we propose a non-metric ranking-based representation of semantic similarity that allows natural aggregation of semantic information from multiple heterogeneous sources.

Tasks: Classification, General Classification +3

A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA

no code implementations NeurIPS 2015 James Voss, Mikhail Belkin, Luis Rademacher

We propose a new algorithm, PEGI (for pseudo-Euclidean Gradient Iteration), for provable model recovery for ICA with Gaussian noise.

Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices

no code implementations 11 Jan 2015 Jihun Hamm, Adam Champion, Guoxing Chen, Mikhail Belkin, Dong Xuan

Smart devices with built-in sensors, computational capabilities, and network connectivity have become increasingly pervasive.

Learning with Fredholm Kernels

no code implementations NeurIPS 2014 Qichao Que, Mikhail Belkin, Yusu Wang

In this paper we propose a framework for supervised and semi-supervised learning based on reformulating the learning problem as a regularized Fredholm integral equation.

Eigenvectors of Orthogonally Decomposable Functions

no code implementations 5 Nov 2014 Mikhail Belkin, Luis Rademacher, James Voss

It includes influential Machine Learning methods such as cumulant-based FastICA and the tensor power iteration for orthogonally decomposable tensors as special cases.

Tasks: Topic Models

The Hidden Convexity of Spectral Clustering

1 code implementation 4 Mar 2014 James Voss, Mikhail Belkin, Luis Rademacher

Geometrically, the proposed algorithms can be interpreted as hidden basis recovery by means of function optimization.

Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis

no code implementations NeurIPS 2013 James R. Voss, Luis Rademacher, Mikhail Belkin

In our paper we develop the first practical algorithm for Independent Component Analysis that is provably invariant under Gaussian noise.

The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures

no code implementations 12 Nov 2013 Joseph Anderson, Mikhail Belkin, Navin Goyal, Luis Rademacher, James Voss

The problem of learning this map can be efficiently solved using some recent results on tensor decompositions and Independent Component Analysis (ICA), thus giving an algorithm for recovering the mixture.

Inverse Density as an Inverse Problem: The Fredholm Equation Approach

no code implementations NeurIPS 2013 Qichao Que, Mikhail Belkin

In this paper we address the problem of estimating the ratio $\frac{q}{p}$, where $p$ is a density function and $q$ is another density or, more generally, an arbitrary function.
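
The density-ratio task can be illustrated with a discretized, Tikhonov-regularized version of the Fredholm formulation: the ratio $f = q/p$ satisfies $\mathbb{E}_p[k(x, Y) f(Y)] = \mathbb{E}_q[k(x, Y)]$, which can be solved from samples. This sketch is for illustration only, not the paper's RKHS algorithm; the Gaussian densities, kernel bandwidth, and regularization strength are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from p = N(0, 1) and q = N(0.5, 1); the true ratio is
# q(x) / p(x) = exp(0.5 * x - 0.125).
n = 800
xp = rng.standard_normal(n)
xq = 0.5 + rng.standard_normal(n)

def gauss(a, b, bw=0.5):
    # Gaussian kernel matrix between two sample vectors.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw ** 2))

# Discretized Fredholm equation: (1/n) * K_pp @ f approximates
# E_p[k(x, Y) f(Y)], and the right-hand side is a Monte Carlo
# estimate of E_q[k(x, Y)].
A = gauss(xp, xp) / n
rhs = gauss(xp, xq).mean(axis=1)
lam = 1e-3                            # Tikhonov regularization
f = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ rhs)
```

Here `f` estimates the ratio at the p-sample points; its sample mean should be near 1 (since the true ratio integrates to 1 against p), and it should be larger where q puts more mass than p.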

Tasks: Transfer Learning

Blind Signal Separation in the Presence of Gaussian Noise

no code implementations 7 Nov 2012 Mikhail Belkin, Luis Rademacher, James Voss

In this paper we propose a new algorithm for solving the blind signal separation problem in the presence of additive Gaussian noise, when we are given samples from $X = AS + \eta$, where $\eta$ is drawn from an unknown, not necessarily spherical $n$-dimensional Gaussian distribution.

Data Skeletonization via Reeb Graphs

no code implementations NeurIPS 2011 Xiaoyin Ge, Issam I. Safa, Mikhail Belkin, Yusu Wang

While such data is often high-dimensional, it is of interest to approximate it with a low-dimensional or even one-dimensional space, since many important aspects of data are often intrinsically low-dimensional.

Semi-supervised Learning using Sparse Eigenfunction Bases

no code implementations NeurIPS 2009 Kaushik Sinha, Mikhail Belkin

We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices.
