Search Results for author: Nathan Srebro

Found 116 papers, 19 papers with code

Depth Separation in Norm-Bounded Infinite-Width Neural Networks

no code implementations13 Feb 2024 Suzanna Parkinson, Greg Ongie, Rebecca Willett, Ohad Shamir, Nathan Srebro

We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

no code implementations9 Feb 2024 Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry

We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.

Metalearning with Very Few Samples Per Task

no code implementations21 Dec 2023 Maryam Aliakbarpour, Konstantina Bairaktari, Gavin Brown, Adam Smith, Nathan Srebro, Jonathan Ullman

In multitask learning, we are given a fixed set of related learning tasks and need to output one accurate model per task, whereas in metalearning we are given tasks that are drawn i.i.d.

Binary Classification

Applying statistical learning theory to deep learning

no code implementations26 Nov 2023 Cédric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro

Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient-based methods.

Inductive Bias Learning Theory +1

Noisy Interpolation Learning with Shallow Univariate ReLU Networks

no code implementations28 Jul 2023 Nirmit Joshi, Gal Vardi, Nathan Srebro

We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al.

regression

An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression

no code implementations22 Jun 2023 Lijia Zhou, James B. Simon, Gal Vardi, Nathan Srebro

We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model.

regression
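
A minimal numpy sketch of the quantity studied here, under illustrative assumptions (synthetic data, an RBF kernel, and a small ridge-parameter grid, none of which are taken from the paper): the "cost of overfitting" is estimated as the ratio between the test error of a near-ridgeless interpolating KRR fit and the test error of the best-tuned ridge parameter.

```python
# Hedged sketch (synthetic data, illustrative kernel and grid, not from the paper):
# estimate the cost of overfitting in kernel ridge regression as the ratio between
# the near-ridgeless (interpolating) test error and the optimally-tuned test error.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_test_mse(X, y, X_te, y_te, lam):
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # kernel ridge dual coefficients
    return np.mean((rbf_kernel(X_te, X) @ alpha - y_te) ** 2)

n, d, noise = 200, 2, 0.5
f = lambda Z: np.sin(3 * Z[:, 0]) + Z[:, 1]               # noiseless target function
X, X_te = rng.normal(size=(n, d)), rng.normal(size=(2000, d))
y = f(X) + noise * rng.normal(size=n)                     # noisy training labels
y_te = f(X_te)                                            # clean test targets

ridgeless = krr_test_mse(X, y, X_te, y_te, lam=1e-8)      # (near-)interpolating fit
tuned = min(krr_test_mse(X, y, X_te, y_te, lam=l)
            for l in np.logspace(-6, 2, 30))              # best-tuned ridge
print("estimated cost of overfitting (ratio):", ridgeless / tuned)
```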

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

no code implementations2 Mar 2023 Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro

Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush-Kuhn-Tucker (KKT) conditions for margin maximization.

Interpolation Learning With Minimum Description Length

no code implementations14 Feb 2023 Naren Sarayu Manoj, Nathan Srebro

We prove that the Minimum Description Length learning rule exhibits tempered overfitting.

A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models

1 code implementation21 Oct 2022 Lijia Zhou, Frederic Koehler, Pragya Sur, Danica J. Sutherland, Nathan Srebro

We prove a new generalization bound that shows for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ can control the test error under all Moreau envelopes of the loss $\ell$.

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

no code implementations13 Oct 2022 Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.

Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization

no code implementations15 Sep 2022 Omar Montasser, Steve Hanneke, Nathan Srebro

We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time.

Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets

no code implementations21 May 2022 Gene Li, Cong Ma, Nathan Srebro

We present a family $\{\hat{\pi}\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting.

Multi-Armed Bandits
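
A hedged sketch in the spirit of the $\ell_2$-type member of this family of pessimistic rules: fit a ridge estimate on the offline data and pick the action with the largest lower confidence bound. The synthetic features, ridge parameter, and pessimism level `beta` are illustrative choices, not the paper's construction.

```python
# Minimal sketch (illustrative parameters, not the paper's exact rule) of a
# pessimistic lower-confidence-bound policy for offline linear contextual bandits.
import numpy as np

rng = np.random.default_rng(1)
d, n, n_actions = 5, 500, 4

theta_star = rng.normal(size=d)
phi = rng.normal(size=(n, d))                      # offline features phi(x_i, a_i)
r = phi @ theta_star + 0.1 * rng.normal(size=n)    # observed rewards

lam, beta = 1.0, 1.0                               # ridge parameter and pessimism level
Sigma = phi.T @ phi + lam * np.eye(d)
theta_hat = np.linalg.solve(Sigma, phi.T @ r)      # ridge estimate of theta
Sigma_inv = np.linalg.inv(Sigma)

def pessimistic_action(action_feats):
    """Choose the action with the largest lower confidence bound on its reward."""
    widths = np.sqrt(np.einsum("ad,de,ae->a", action_feats, Sigma_inv, action_feats))
    lcb = action_feats @ theta_hat - beta * widths
    return int(np.argmax(lcb))

new_context = rng.normal(size=(n_actions, d))      # phi(x, a) for each candidate action
print("pessimistic choice:", pessimistic_action(new_context))
```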

Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

no code implementations27 Feb 2022 Idan Amir, Roi Livni, Nathan Srebro

We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e., where each instantaneous loss is a scalar convex function of a linear function.

The Sample Complexity of One-Hidden-Layer Neural Networks

no code implementations13 Feb 2022 Gal Vardi, Ohad Shamir, Nathan Srebro

We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.

Exponential Family Model-Based Reinforcement Learning via Score Matching

1 code implementation28 Dec 2021 Gene Li, Junbo Li, Anmol Kabra, Nathan Srebro, Zhaoran Wang, Zhuoran Yang

We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known.

Density Estimation Model-based Reinforcement Learning +3

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

no code implementations8 Dec 2021 Lijia Zhou, Frederic Koehler, Danica J. Sutherland, Nathan Srebro

We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data.

regression

Representation Costs of Linear Neural Networks: Analysis and Design

no code implementations NeurIPS 2021 Zhen Dai, Mina Karzand, Nathan Srebro

For different parameterizations (mappings from parameters to predictors), we study the regularization cost in predictor space induced by $l_2$ regularization on the parameters (weights).

Transductive Robust Learning Guarantees

no code implementations20 Oct 2021 Omar Montasser, Steve Hanneke, Nathan Srebro

We study the problem of adversarially robust learning in the transductive setting.

A Stochastic Newton Algorithm for Distributed Convex Optimization

no code implementations NeurIPS 2021 Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication.

regression

On Margin Maximization in Linear and ReLU Networks

no code implementations6 Oct 2021 Gal Vardi, Ohad Shamir, Nathan Srebro

The implicit bias of neural networks has been extensively studied in recent years.

On the Power of Differentiable Learning versus PAC and SQ Learning

no code implementations NeurIPS 2021 Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro

With fine enough precision relative to minibatch size, namely when $b \rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$.

PAC learning

Fast Margin Maximization via Dual Acceleration

no code implementations1 Jul 2021 Ziwei Ji, Nathan Srebro, Matus Telgarsky

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$.
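
For intuition only, a toy run of plain heavy-ball momentum on the exponential loss over separable data (this is not the paper's dual-accelerated algorithm): the weight norm diverges while the normalized margin keeps improving.

```python
# Hedged sketch (plain momentum on the exponential loss, not the paper's method):
# on separable data the iterates grow in norm and the normalized margin increases.
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 2
X_pos = rng.normal(size=(n, d)) + np.array([3.0, 0.0])    # positive class
X_neg = rng.normal(size=(n, d)) - np.array([3.0, 0.0])    # negative class
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

w, v, lr, beta = np.zeros(d), np.zeros(d), 0.05, 0.9
for t in range(1, 5001):
    margins = y * (X @ w)
    grad = -(X * (y * np.exp(-margins))[:, None]).mean(axis=0)  # exponential-loss gradient
    v = beta * v + grad                                         # heavy-ball momentum
    w = w - lr * v
    if t % 1000 == 0:
        m = y * (X @ w)
        print(t, "||w|| =", round(float(np.linalg.norm(w)), 2),
              " normalized margin =", round(float(m.min() / np.linalg.norm(w)), 4))
```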

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

no code implementations NeurIPS 2021 Frederic Koehler, Lijia Zhou, Danica J. Sutherland, Nathan Srebro

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width.

Generalization Bounds regression

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

no code implementations NeurIPS 2021 Blake Woodworth, Nathan Srebro

We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates.

Stochastic Optimization

Understanding the Eluder Dimension

no code implementations14 Apr 2021 Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro

We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation.

Active Learning

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

no code implementations1 Mar 2021 Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro

Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

no code implementations19 Feb 2021 Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.

Inductive Bias

Adversarially Robust Learning with Unknown Perturbation Sets

no code implementations3 Feb 2021 Omar Montasser, Steve Hanneke, Nathan Srebro

We study the problem of learning predictors that are robust to adversarial examples with respect to an unknown perturbation set, relying instead on interaction with an adversarial attacker or access to attack oracles, examining different models for such interactions.

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

no code implementations2 Feb 2021 Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates.

Does Invariant Risk Minimization Capture Invariance?

no code implementations4 Jan 2021 Pritish Kamath, Akilesh Tangella, Danica J. Sutherland, Nathan Srebro

We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM.

Reducing Adversarially Robust Learning to Non-Robust PAC Learning

no code implementations NeurIPS 2020 Omar Montasser, Steve Hanneke, Nathan Srebro

We study the problem of reducing adversarially robust learning to standard PAC learning, i.e., the complexity of learning adversarially robust predictors using access to only a black-box non-robust learner.

PAC learning

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

no code implementations NeurIPS 2020 Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".

General Classification

Predictive Value Generalization Bounds

no code implementations9 Jul 2020 Keshav Vemuri, Nathan Srebro

In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification.

Binary Classification General Classification +1

On Uniform Convergence and Low-Norm Interpolation Learning

no code implementations NeurIPS 2020 Lijia Zhou, Danica J. Sutherland, Nathan Srebro

But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.

Efficiently Learning Adversarially Robust Halfspaces with Noise

no code implementations ICML 2020 Omar Montasser, Surbhi Goel, Ilias Diakonikolas, Nathan Srebro

We study the problem of learning adversarially robust halfspaces in the distribution-independent setting.

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

no code implementations2 Apr 2020 Suriya Gunasekar, Blake Woodworth, Nathan Srebro

We present a primal only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential.

Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity

no code implementations9 Mar 2020 Pritish Kamath, Omar Montasser, Nathan Srebro

We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather than exactly represent, a given hypothesis class.

Dropout: Explicit Forms and Capacity Control

no code implementations ICLR 2020 Raman Arora, Peter Bartlett, Poorya Mianjy, Nathan Srebro

In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.

BIG-bench Machine Learning Matrix Completion

Fair Learning with Private Demographic Data

1 code implementation ICML 2020 Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro

Sensitive attributes such as race are rarely available to learners in real world settings as their collection is often restricted by laws and regulations.

Kernel and Rich Regimes in Overparametrized Models

1 code implementation20 Feb 2020 Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.

Is Local SGD Better than Minibatch SGD?

no code implementations ICML 2020 Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.

Distributed Optimization
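
An illustrative toy comparison of the two protocols on a synthetic least-squares objective (problem and step sizes are arbitrary, not from the paper): local SGD takes $K$ sequential stochastic steps on each of $M$ machines before averaging, while minibatch SGD spends the same samples on $K$ steps with batches of size $M$.

```python
# Hedged toy comparison (synthetic quadratic, arbitrary hyperparameters) of
# local SGD versus minibatch SGD under the same communication/sample budget.
import numpy as np

rng = np.random.default_rng(4)
d, M, K, R, lr, noise = 10, 8, 16, 50, 0.05, 1.0
w_star = rng.normal(size=d)

def stoch_grad(w, batch):
    """Stochastic least-squares gradient from a fresh batch of samples."""
    X = rng.normal(size=(batch, d))
    y = X @ w_star + noise * rng.normal(size=batch)
    return X.T @ (X @ w - y) / batch

def run(local_sgd):
    w = np.zeros(d)
    for _ in range(R):                           # R communication rounds
        if local_sgd:                            # K sequential steps on each machine
            iterates = []
            for _ in range(M):
                wm = w.copy()
                for _ in range(K):
                    wm -= lr * stoch_grad(wm, 1)
                iterates.append(wm)
            w = np.mean(iterates, axis=0)        # average once per round
        else:                                    # minibatch SGD: K steps, batch size M
            for _ in range(K):
                w -= lr * stoch_grad(w, M)
    return np.linalg.norm(w - w_star)

print("local SGD     error:", run(True))
print("minibatch SGD error:", run(False))
```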

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations5 Dec 2019 Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

no code implementations ICLR 2020 Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro

In this paper, we characterize the norm required to realize a function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm.

Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory

no code implementations1 Jul 2019 Blake Woodworth, Nathan Srebro

We note that known methods achieving the optimal oracle complexity for first order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first order queries required to optimize a convex Lipschitz function subject to a memory constraint.

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

1 code implementation21 Jun 2019 Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth

We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates.

Kernel and Rich Regimes in Overparametrized Models

1 code implementation13 Jun 2019 Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

no code implementations17 May 2019 Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.

The role of over-parametrization in generalization of neural networks

1 code implementation ICLR 2019 Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.

Semi-Cyclic Stochastic Gradient Descent

no code implementations23 Apr 2019 Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar

We consider convex SGD updates with a block-cyclic structure, i.e., where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution.

Federated Learning

The Complexity of Making the Gradient Small in Stochastic Convex Optimization

no code implementations13 Feb 2019 Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

Stochastic Optimization

How do infinite width bounded norm networks look in function space?

no code implementations13 Feb 2019 Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro

We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function.

From Fair Decision Making to Social Equality

no code implementations7 Dec 2018 Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro

In this paper, we propose a simple yet revealing model that encompasses (1) a selection process where an institution chooses from multiple groups according to their qualifications so as to maximize an institutional utility and (2) dynamics that govern the evolution of the groups' qualifications according to the imposed policies.

Decision Making Fairness

On preserving non-discrimination when combining expert advice

no code implementations NeurIPS 2018 Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nathan Srebro

We study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions.

Decision Making

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

1 code implementation29 Jun 2018 Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals.

Fairness

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

no code implementations26 Jun 2018 Yossi Arjevani, Ohad Shamir, Nathan Srebro

We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago.

Distributed Optimization

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

no code implementations5 Jun 2018 Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry

We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.

Implicit Bias of Gradient Descent on Linear Convolutional Networks

no code implementations NeurIPS 2018 Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks

2 code implementations30 May 2018 Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann Lecun, Nathan Srebro

Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

no code implementations NeurIPS 2018 Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro

We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph.

Stochastic Optimization

The Everlasting Database: Statistical Validity at a Fair Price

no code implementations NeurIPS 2018 Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro

The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries.

Convergence of Gradient Descent on Separable Data

no code implementations5 Mar 2018 Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.

Characterizing Implicit Bias in Terms of Optimization Geometry

no code implementations ICML 2018 Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

We study the implicit bias of generic optimization methods, such as mirror descent, natural gradient descent, and steepest descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems.

General Classification regression

Distributed Stochastic Multi-Task Learning with Graph Regularization

no code implementations11 Feb 2018 Weiran Wang, Jialei Wang, Mladen Kolar, Nathan Srebro

We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines.

Multi-Task Learning

An Accelerated Communication-Efficient Primal-Dual Optimization Framework for Structured Machine Learning

1 code implementation14 Nov 2017 Chenxin Ma, Martin Jaggi, Frank E. Curtis, Nathan Srebro, Martin Takáč

In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of $\mathcal{O}(1/t^2)$ in terms of reducing suboptimality.

BIG-bench Machine Learning Distributed Optimization

The Implicit Bias of Gradient Descent on Separable Data

2 code implementations ICLR 2018 Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.

Stochastic Nonconvex Optimization with Large Minibatches

no code implementations25 Sep 2017 Weiran Wang, Nathan Srebro

We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks.

Stochastic Optimization

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

no code implementations ICLR 2018 Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro

We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights.

Exploring Generalization in Deep Learning

2 code implementations NeurIPS 2017 Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness.

Implicit Regularization in Matrix Factorization

no code implementations NeurIPS 2017 Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.
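A hedged toy reproduction of the phenomenon, using a matrix-completion-style quadratic objective rather than the paper's exact setup: gradient descent on a full factorization $X = UU^\top$ from a small initialization tends toward a low-rank, small-nuclear-norm solution even though nothing explicitly penalizes rank.

```python
# Hedged toy sketch (matrix-completion-style objective, not the paper's experiments):
# GD on X = U U^T from a small initialization implicitly prefers low nuclear norm.
import numpy as np

rng = np.random.default_rng(5)
n, rank = 30, 2
G = rng.normal(size=(n, rank))
X_star = G @ G.T / n                               # low-rank PSD ground truth
mask = rng.random(size=(n, n)) < 0.3
mask = np.triu(mask) | np.triu(mask, 1).T          # symmetric set of observed entries

U = 1e-3 * rng.normal(size=(n, n))                 # small, full-dimensional initialization
lr = 0.05
for _ in range(10000):
    X = U @ U.T
    grad_X = mask * (X - X_star) / mask.mean()     # gradient of the observed-entry loss
    U -= lr * 2 * grad_X @ U                       # chain rule through X = U U^T
X = U @ U.T
print("relative error on unobserved entries:",
      np.linalg.norm((~mask) * (X - X_star)) / np.linalg.norm((~mask) * X_star))
print("nuclear norm  GD:", np.linalg.svd(X, compute_uv=False).sum(),
      "  truth:", np.linalg.svd(X_star, compute_uv=False).sum())
```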

The Marginal Value of Adaptive Gradient Methods in Machine Learning

3 code implementations NeurIPS 2017 Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks.

BIG-bench Machine Learning Binary Classification
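
A toy illustration (not the paper's construction): on an overparametrized least-squares problem, plain gradient descent and an Adam-style adaptive method both fit the training data, yet they can land on different interpolating solutions and hence different test errors.

```python
# Hedged toy comparison (synthetic overparametrized regression, illustrative
# hyperparameters): GD versus an Adam-style adaptive method reaching different
# interpolating solutions.
import numpy as np

rng = np.random.default_rng(6)
n, d = 40, 200                                      # more parameters than samples
w_star = np.zeros(d); w_star[:5] = 1.0              # arbitrary sparse ground truth
X = rng.normal(size=(n, d))
y = X @ w_star
X_te = rng.normal(size=(2000, d))
y_te = X_te @ w_star

def train(adam, steps=5000, lr=1e-2):
    w, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        g = X.T @ (X @ w - y) / n                   # full-batch least-squares gradient
        if adam:                                    # standard Adam update
            m = b1 * m + (1 - b1) * g
            v = b2 * v + (1 - b2) * g * g
            w = w - lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
        else:                                       # plain gradient descent
            w = w - lr * g
    return w

for name, w in [("GD  ", train(False)), ("Adam", train(True))]:
    print(name, "train MSE", round(float(np.mean((X @ w - y) ** 2)), 6),
          " test MSE", round(float(np.mean((X_te @ w - y_te) ** 2)), 4),
          " ||w||", round(float(np.linalg.norm(w)), 3))
```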

Geometry of Optimization and Implicit Regularization in Deep Learning

1 code implementation8 May 2017 Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization.

Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

no code implementations ICML 2017 Dan Garber, Ohad Shamir, Nathan Srebro

We study algorithms for estimating the leading principal component of the population covariance matrix that are both communication-efficient and achieve estimation error of the order of the centralized ERM solution that uses all $mn$ samples.

Efficient coordinate-wise leading eigenvector computation

no code implementations25 Feb 2017 Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro

We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product.

regression

Stochastic Approximation for Canonical Correlation Analysis

no code implementations NeurIPS 2017 Raman Arora, Teodor V. Marinov, Poorya Mianjy, Nathan Srebro

We propose novel first-order stochastic approximation algorithms for canonical correlation analysis (CCA).

Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch-Prox

no code implementations21 Feb 2017 Jialei Wang, Weiran Wang, Nathan Srebro

We present and analyze an approach for distributed stochastic optimization which is statistically optimal and achieves near-linear speedups (up to logarithmic factors).

Stochastic Optimization

Stochastic Canonical Correlation Analysis

no code implementations21 Feb 2017 Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang

We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error.

Stochastic Optimization

Learning Non-Discriminatory Predictors

no code implementations20 Feb 2017 Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro

We consider learning a predictor which is non-discriminatory with respect to a "protected attribute" according to the notion of "equalized odds" proposed by Hardt et al. [2016].

Attribute

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

no code implementations10 Oct 2016 Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro

Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data.

Equality of Opportunity in Supervised Learning

7 code implementations NeurIPS 2016 Moritz Hardt, Eric Price, Nathan Srebro

We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features.

Attribute General Classification
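
A small sketch of checking the criterion: under equalized odds, the true-positive and false-positive rates should match across groups defined by the sensitive attribute. The numbers below are synthetic, chosen only to show a nonzero violation.

```python
# Hedged sketch (synthetic data): measuring equalized-odds violations as the gaps
# in true-positive and false-positive rates between two groups.
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Return (TPR gap, FPR gap) between the two groups; 0 means equalized odds."""
    rates = []
    for g in (0, 1):
        mask = group == g
        tpr = np.mean(y_pred[mask & (y_true == 1)])
        fpr = np.mean(y_pred[mask & (y_true == 0)])
        rates.append((tpr, fpr))
    return abs(rates[0][0] - rates[1][0]), abs(rates[0][1] - rates[1][1])

rng = np.random.default_rng(7)
n = 10000
group = rng.integers(0, 2, size=n)                         # sensitive attribute
y_true = rng.integers(0, 2, size=n)
score = y_true + 0.8 * rng.normal(size=n) + 0.3 * group    # deliberately biased score
y_pred = (score > 0.5).astype(int)
print("TPR gap, FPR gap:", equalized_odds_gaps(y_true, y_pred, group))
```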

Efficient Distributed Learning with Sparsity

no code implementations ICML 2017 Jialei Wang, Mladen Kolar, Nathan Srebro, Tong Zhang

We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines.

General Classification regression +1

Tight Complexity Bounds for Optimizing Composite Objectives

no code implementations NeurIPS 2016 Blake Woodworth, Nathan Srebro

We provide tight upper and lower bounds on the complexity of minimizing the average of $m$ convex functions using gradient and prox oracles of the component functions.

Global Optimality of Local Search for Low Rank Matrix Recovery

no code implementations NeurIPS 2016 Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements.

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

no code implementations NeurIPS 2016 Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations.

Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis

no code implementations NeurIPS 2016 Weiran Wang, Jialei Wang, Dan Garber, Nathan Srebro

We study the stochastic optimization of canonical correlation analysis (CCA), whose objective is nonconvex and does not decouple over training samples.

Stochastic Optimization

Distributed Multi-Task Learning with Shared Representation

no code implementations7 Mar 2016 Jialei Wang, Mladen Kolar, Nathan Srebro

We study the problem of distributed multi-task learning with shared representation, where each machine aims to learn a separate, but related, task in an unknown shared low-dimensional subspace, i.e., when the predictor matrix has low rank.

Multi-Task Learning

Reducing Runtime by Recycling Samples

no code implementations5 Feb 2016 Jialei Wang, Hai Wang, Nathan Srebro

Contrary to the situation with stochastic gradient descent, we argue that when using stochastic methods with variance reduction, such as SDCA, SAG or SVRG, as well as their variants, it could be beneficial to reuse previously used samples instead of fresh samples, even when fresh samples are available.

Data-Dependent Path Normalization in Neural Networks

no code implementations20 Nov 2015 Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

We propose a unified framework for neural net normalization, regularization and optimization, which includes Path-SGD and Batch-Normalization and interpolates between them across two different dimensions.

Fast and Scalable Structural SVM with Slack Rescaling

no code implementations20 Oct 2015 Heejin Choi, Ofer Meshi, Nathan Srebro

We present an efficient method for training slack-rescaled structural SVM.

Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

no code implementations7 Oct 2015 Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro

Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains.

Representation Learning Stochastic Optimization

Distributed Multitask Learning

no code implementations2 Oct 2015 Jialei Wang, Mladen Kolar, Nathan Srebro

We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.

Multi-Task Learning

Normalized Hierarchical SVM

no code implementations11 Aug 2015 Heejin Choi, Yutaka Sasaki, Nathan Srebro

We present improved methods of using structured SVMs in a large-scale hierarchical classification problem, that is, when labels are leaves, or sets of leaves, in a tree or a DAG.

General Classification

Distributed Mini-Batch SDCA

no code implementations29 Jul 2015 Martin Takáč, Peter Richtárik, Nathan Srebro

We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i.e., SVM and SVM-type objectives).

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

1 code implementation NeurIPS 2015 Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights.

Norm-Based Capacity Control in Neural Networks

no code implementations27 Feb 2015 Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

no code implementations20 Dec 2014 Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks.

Inductive Bias

On Symmetric and Asymmetric LSHs for Inner Product Search

1 code implementation21 Oct 2014 Behnam Neyshabur, Nathan Srebro

We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and of the power of asymmetric hashes in this context.
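A hedged sketch in the spirit of the simple asymmetric construction discussed in this line of work: data points (scaled to norm at most one) are padded with $\sqrt{1-\|x\|^2}$, queries are normalized and padded with a zero, and ordinary random-hyperplane (cosine) hashing is then applied to the augmented vectors. Dimensions and code lengths are illustrative.

```python
# Hedged sketch of an asymmetric LSH for maximum inner product search
# (simplified; parameters are illustrative, not from the paper).
import numpy as np

rng = np.random.default_rng(8)
n, d, n_bits = 5000, 32, 64

X = rng.normal(size=(n, d))
X /= np.max(np.linalg.norm(X, axis=1))                    # rescale so every norm <= 1
q = rng.normal(size=d)

pad = np.sqrt(np.clip(1.0 - np.sum(X**2, axis=1, keepdims=True), 0.0, None))
P = np.hstack([X, pad])                                   # asymmetric data transform
Q = np.append(q / np.linalg.norm(q), 0.0)                 # asymmetric query transform

H = rng.normal(size=(d + 1, n_bits))                      # random hyperplanes
codes, qcode = P @ H > 0, Q @ H > 0

hamming = (codes != qcode).sum(axis=1)                    # small distance => large inner product
candidates = np.argsort(hamming)[:10]
print("best inner product in top-10 candidates:", float(np.max(X[candidates] @ q)))
print("true maximum inner product:             ", float(np.max(X @ q)))
```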

Clustering, Hamming Embedding, Generalized LSH and the Max Norm

no code implementations13 May 2014 Behnam Neyshabur, Yury Makarychev, Nathan Srebro

We study the convex relaxation of clustering and Hamming embedding, focusing on the asymmetric case (co-clustering and asymmetric Hamming embedding), understanding their relationship to LSH as studied by (Charikar 2002) and to the max-norm ball, and the differences between their symmetric and asymmetric versions.

Clustering

Communication Efficient Distributed Optimization using an Approximate Newton-type Method

1 code implementation30 Dec 2013 Ohad Shamir, Nathan Srebro, Tong Zhang

We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems.

Distributed Optimization
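
A hedged sketch of the approximate Newton-type idea for one objective where it is easy to write down: quadratic least squares, for which each machine's local subproblem has a closed form, so a round amounts to preconditioning the global gradient with each machine's local Hessian and averaging. The regularization `mu` and step size `eta` are illustrative, not tuned as in the paper.

```python
# Hedged sketch (quadratic least-squares specialization, illustrative parameters):
# rounds of a distributed approximate-Newton update using local Hessians.
import numpy as np

rng = np.random.default_rng(3)
machines, m, d = 4, 200, 10
A = [rng.normal(size=(m, d)) for _ in range(machines)]
w_star = rng.normal(size=d)
b = [Ai @ w_star + 0.01 * rng.normal(size=m) for Ai in A]

mu, eta = 0.1, 1.0                                  # local regularization and step size
w = np.zeros(d)
for _ in range(5):
    # global gradient of (1/2m) sum ||A_i w - b_i||^2, averaged over machines
    grad = np.mean([Ai.T @ (Ai @ w - bi) / m for Ai, bi in zip(A, b)], axis=0)
    # each machine's local subproblem has a closed form for quadratics:
    # w_i = w - eta * (H_i + mu I)^{-1} grad, with H_i the local Hessian
    steps = [np.linalg.solve(Ai.T @ Ai / m + mu * np.eye(d), grad) for Ai in A]
    w = w - eta * np.mean(steps, axis=0)
    obj = np.mean([np.sum((Ai @ w - bi) ** 2) / (2 * m) for Ai, bi in zip(A, b)])
    print("objective after round:", round(float(obj), 6))
```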

The Power of Asymmetry in Binary Hashing

1 code implementation NeurIPS 2013 Behnam Neyshabur, Payman Yadollahpour, Yury Makarychev, Ruslan Salakhutdinov, Nathan Srebro

When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps.

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

no code implementations NeurIPS 2014 Deanna Needell, Nathan Srebro, Rachel Ward

Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence in the average smoothness, dominating previous results.
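
A short sketch of the randomized Kaczmarz iteration with rows sampled proportionally to their squared norms, the importance-sampling reweighting referred to above; the linear system below is synthetic and the iteration count is arbitrary.

```python
# Hedged sketch (synthetic consistent system): randomized Kaczmarz with rows
# sampled proportionally to their squared norms (importance sampling).
import numpy as np

rng = np.random.default_rng(9)
m, n = 500, 50
A = rng.normal(size=(m, n)) * rng.uniform(0.1, 10.0, size=(m, 1))  # badly scaled rows
x_star = rng.normal(size=n)
b = A @ x_star                                   # consistent (noiseless) system

row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()            # squared-norm importance sampling

x = np.zeros(n)
for _ in range(20000):
    i = rng.choice(m, p=probs)
    x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]    # project onto the i-th hyperplane
print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```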

Stochastic Optimization of PCA with Capped MSG

no code implementations NeurIPS 2013 Raman Arora, Andrew Cotter, Nathan Srebro

We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as "Matrix Stochastic Gradient" (MSG), as well as a practical variant, Capped MSG.

Stochastic Optimization

Auditing: Active Learning with Outcome-Dependent Query Costs

no code implementations NeurIPS 2013 Sivan Sabato, Anand D. Sarwate, Nathan Srebro

We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error.

Active Learning Binary Classification +2

Learning Sparse Low-Threshold Linear Classifiers

no code implementations13 Dec 2012 Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang

We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.

Sparse Prediction with the k-Support Norm

no code implementations NeurIPS 2012 Andreas Argyriou, Rina Foygel, Nathan Srebro

We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an $\ell_2$ penalty.

Matrix reconstruction with the local max norm

no code implementations NeurIPS 2012 Rina Foygel, Nathan Srebro, Ruslan R. Salakhutdinov

We introduce a new family of matrix norms, the "local max" norms, generalizing existing methods such as the max norm, the trace norm (nuclear norm), and the weighted or smoothed weighted trace norms, which have been extensively used in the literature as regularizers for matrix reconstruction problems.

Distribution-Dependent Sample Complexity of Large Margin Learning

no code implementations5 Apr 2012 Sivan Sabato, Nathan Srebro, Naftali Tishby

We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the margin-adapted dimension, which is a simple function of the second order statistics of the data distribution, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the margin-adapted dimension of the data distribution.

Active Learning General Classification +1

Smoothness, Low Noise and Fast Rates

no code implementations NeurIPS 2010 Nathan Srebro, Karthik Sridharan, Ambuj Tewari

We establish an excess risk bound of $O(H R_n^2 + \sqrt{H L^*} R_n)$ for ERM with an $H$-smooth loss function and a hypothesis class with Rademacher complexity $R_n$, where $L^*$ is the best risk achievable by the hypothesis class.

Tight Sample Complexity of Large-Margin Learning

no code implementations NeurIPS 2010 Sivan Sabato, Nathan Srebro, Naftali Tishby

We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the gamma-adapted-dimension, which is a simple function of the spectrum of a distribution's covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the gamma-adapted-dimension of the source distribution.

Classification General Classification +1

Practical Large-Scale Optimization for Max-norm Regularization

no code implementations NeurIPS 2010 Jason D. Lee, Ben Recht, Nathan Srebro, Joel Tropp, Ruslan R. Salakhutdinov

The max-norm was proposed as a convex matrix regularizer by Srebro et al. (2004) and was shown to be empirically superior to the trace-norm for collaborative filtering problems.

Clustering Collaborative Filtering

Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm

no code implementations NeurIPS 2010 Nathan Srebro, Ruslan R. Salakhutdinov

We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling.

Collaborative Filtering Matrix Completion

Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

no code implementations NeurIPS 2009 Boaz Nadler, Nathan Srebro, Xueyuan Zhou

We study the behavior of the popular Laplacian Regularization method for Semi-Supervised Learning at the regime of a fixed number of labeled points but a large number of unlabeled points.

Fast Rates for Regularized Objectives

no code implementations NeurIPS 2008 Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro

We show that the empirical minimizer of a stochastic strongly convex objective, where the stochastic component is linear, converges to the population minimizer with rate $O(1/n)$.
