no code implementations • 24 Oct 2024 • Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, Daniel Soudry
We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set.
no code implementations • 8 Oct 2024 • Anmol Kabra, Mina Karzand, Tosca Lechner, Nathan Srebro, Serena Wang
We present a framework for designing scores to summarize performance metrics.
no code implementations • 5 Sep 2024 • Marko Medvedev, Gal Vardi, Nathan Srebro
We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e., kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size.
no code implementations • 8 Jul 2024 • Nirmit Joshi, Theodor Misiakiewicz, Nathan Srebro
The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas).
no code implementations • 7 Jun 2024 • Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe
We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization.
no code implementations • 19 May 2024 • Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U. Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro
In this paper, we provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions are insufficient to prove the effectiveness of local update steps.
no code implementations • 13 Feb 2024 • Suzanna Parkinson, Greg Ongie, Rebecca Willett, Ohad Shamir, Nathan Srebro
We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.
no code implementations • 9 Feb 2024 • Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry
We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.
no code implementations • 21 Dec 2023 • Maryam Aliakbarpour, Konstantina Bairaktari, Gavin Brown, Adam Smith, Nathan Srebro, Jonathan Ullman
In multitask learning, we are given a fixed set of related learning tasks and need to output one accurate model per task, whereas in metalearning we are given tasks that are drawn i.i.d.
no code implementations • 26 Nov 2023 • Cédric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro
Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods.
no code implementations • 28 Jul 2023 • Nirmit Joshi, Gal Vardi, Nathan Srebro
We show that overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al.
no code implementations • 22 Jun 2023 • Lijia Zhou, James B. Simon, Gal Vardi, Nathan Srebro
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model.
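As a rough numerical illustration of the quantity defined here (not the paper's analysis), the sketch below estimates this ratio on synthetic data with a Gaussian kernel; the data model, bandwidth, noise level, and ridge grid are all arbitrary assumptions.

```python
# Estimate the "cost of overfitting" in kernel ridge regression: the ratio between
# the test error of the (near-)interpolating ridgeless solution and the test error
# of the optimally tuned ridge parameter, on made-up synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d, noise = 200, 1000, 5, 0.5

def gaussian_kernel(A, B, bandwidth):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def target(X):
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]

X, X_test = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y = target(X) + noise * rng.normal(size=n)           # noisy training labels
y_test = target(X_test) + noise * rng.normal(size=n_test)

K = gaussian_kernel(X, X, bandwidth=2.0)
K_test = gaussian_kernel(X_test, X, bandwidth=2.0)

def krr_test_error(ridge):
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    return np.mean((K_test @ alpha - y_test) ** 2)

ridgeless_error = krr_test_error(1e-10)              # (near-)interpolating model
tuned_error = min(krr_test_error(r) for r in np.logspace(-6, 2, 30))
print("cost of overfitting ~", ridgeless_error / tuned_error)
```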
no code implementations • 6 Jun 2023 • Itay Evron, Edward Moroshko, Gon Buzaglo, Maroun Khriesh, Badea Marjieh, Nathan Srebro, Daniel Soudry
We analyze continual learning on a sequence of separable linear classification tasks with binary labels.
no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.
no code implementations • 14 Feb 2023 • Naren Sarayu Manoj, Nathan Srebro
We prove that the Minimum Description Length learning rule exhibits tempered overfitting.
1 code implementation • 21 Oct 2022 • Lijia Zhou, Frederic Koehler, Pragya Sur, Danica J. Sutherland, Nathan Srebro
We prove a new generalization bound that shows for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ can control the test error under all Moreau envelopes of the loss $\ell$.
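For reference, a standard definition of the Moreau envelope appearing in bounds of this kind; the notation $\ell_\lambda$ is an assumption here, not necessarily the paper's.

```latex
% Standard definition (notation assumed): the Moreau envelope of a loss \ell
% with parameter \lambda > 0, evaluated at a point z, is
\[
  \ell_{\lambda}(z) \;=\; \inf_{u \in \mathbb{R}} \Big\{ \ell(u) + \tfrac{1}{2\lambda}\,(z - u)^2 \Big\},
\]
% which lower-bounds \ell and, for convex \ell, is (1/\lambda)-smooth.
```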
no code implementations • 13 Oct 2022 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.
no code implementations • 15 Sep 2022 • Omar Montasser, Steve Hanneke, Nathan Srebro
We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time.
no code implementations • 21 May 2022 • Gene Li, Cong Ma, Nathan Srebro
We present a family $\{\hat{\pi}\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting.
no code implementations • 27 Feb 2022 • Idan Amir, Roi Livni, Nathan Srebro
We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e., where each instantaneous loss is a scalar convex function of a linear function.
no code implementations • 13 Feb 2022 • Gal Vardi, Ohad Shamir, Nathan Srebro
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
1 code implementation • 28 Dec 2021 • Gene Li, Junbo Li, Anmol Kabra, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known.
no code implementations • 8 Dec 2021 • Lijia Zhou, Frederic Koehler, Danica J. Sutherland, Nathan Srebro
We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data.
no code implementations • NeurIPS 2021 • Zhen Dai, Mina Karzand, Nathan Srebro
For different parameterizations (mappings from parameters to predictors), we study the regularization cost in predictor space induced by $l_2$ regularization on the parameters (weights).
no code implementations • 20 Oct 2021 • Omar Montasser, Steve Hanneke, Nathan Srebro
We study the problem of adversarially robust learning in the transductive setting.
no code implementations • NeurIPS 2021 • Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth
We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication.
no code implementations • 6 Oct 2021 • Gal Vardi, Ohad Shamir, Nathan Srebro
The implicit bias of neural networks has been extensively studied in recent years.
no code implementations • NeurIPS 2021 • Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro
With fine enough precision relative to minibatch size, namely when $b \rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$.
no code implementations • 1 Jul 2021 • Ziwei Ji, Nathan Srebro, Matus Telgarsky
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$.
no code implementations • NeurIPS 2021 • Frederic Koehler, Lijia Zhou, Danica J. Sutherland, Nathan Srebro
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width.
no code implementations • NeurIPS 2021 • Blake Woodworth, Nathan Srebro
We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates.
no code implementations • 14 Apr 2021 • Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro
We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation.
no code implementations • 1 Mar 2021 • Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro
Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.
no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.
no code implementations • 3 Feb 2021 • Omar Montasser, Steve Hanneke, Nathan Srebro
We study the problem of learning predictors that are robust to adversarial examples with respect to an unknown perturbation set, relying instead on interaction with an adversarial attacker or access to attack oracles, examining different models for such interactions.
no code implementations • 2 Feb 2021 • Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro
We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates.
no code implementations • 4 Jan 2021 • Pritish Kamath, Akilesh Tangella, Danica J. Sutherland, Nathan Srebro
We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM.
no code implementations • NeurIPS 2020 • Omar Montasser, Steve Hanneke, Nathan Srebro
We study the problem of reducing adversarially robust learning to standard PAC learning, i.e., the complexity of learning adversarially robust predictors using access to only a black-box non-robust learner.
no code implementations • NeurIPS 2020 • Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".
no code implementations • 9 Jul 2020 • Keshav Vemuri, Nathan Srebro
In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification.
no code implementations • NeurIPS 2020 • Lijia Zhou, Danica J. Sutherland, Nathan Srebro
But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.
no code implementations • NeurIPS 2020 • Blake Woodworth, Kumar Kshitij Patel, Nathan Srebro
We analyze local SGD and minibatch SGD in the heterogeneous distributed setting, where each machine has stochastic gradient access to its own objective, the goal is to optimize the average objective, and machines can only communicate intermittently.
no code implementations • ICML 2020 • Omar Montasser, Surbhi Goel, Ilias Diakonikolas, Nathan Srebro
We study the problem of learning adversarially robust halfspaces in the distribution-independent setting.
no code implementations • 2 Apr 2020 • Suriya Gunasekar, Blake Woodworth, Nathan Srebro
We present a primal only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential.
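A minimal sketch of the classical mirror descent update the abstract refers to, instantiated with the entropy potential (exponentiated gradient on the simplex); the objective, step size, and iteration count are placeholder choices, and this does not reproduce the paper's Riemannian derivation.

```python
import numpy as np

def mirror_descent_entropy(grad, x0, step=0.1, iters=300):
    """Mirror descent with the entropy potential, i.e. exponentiated gradient
    on the probability simplex (a multiplicative-update discretization)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Mirror step: grad psi(x_next) = grad psi(x) - step * grad f(x);
        # for psi(x) = sum_i x_i log x_i this becomes a multiplicative update.
        x = x * np.exp(-step * grad(x))
        x /= x.sum()                          # normalize back onto the simplex
    return x

# Placeholder objective: 0.5 * (x - target)^T A (x - target), minimized at `target`.
A = np.diag([3.0, 1.0, 0.5])
target = np.array([0.2, 0.3, 0.5])
grad_f = lambda x: A @ (x - target)

print(mirror_descent_entropy(grad_f, x0=np.ones(3) / 3))
```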
no code implementations • 9 Mar 2020 • Pritish Kamath, Omar Montasser, Nathan Srebro
We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather than exactly represent, a given hypothesis class.
no code implementations • ICLR 2020 • Raman Arora, Peter Bartlett, Poorya Mianjy, Nathan Srebro
In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
1 code implementation • ICML 2020 • Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro
Sensitive attributes such as race are rarely available to learners in real world settings as their collection is often restricted by laws and regulations.
1 code implementation • 20 Feb 2020 • Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro
We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
no code implementations • ICML 2020 • Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro
We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.
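A schematic implementation of local SGD as described here, assuming toy per-machine quadratic objectives; the number of machines, local steps, rounds, step size, and noise model are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, R, d = 8, 10, 20, 5              # machines, local steps, rounds, dimension
lr, noise = 0.05, 0.1
A = [np.eye(d) * (1.0 + 0.1 * m) for m in range(M)]   # per-machine curvature
b = [rng.normal(size=d) for m in range(M)]

def stochastic_grad(m, w):
    # Noisy gradient of the m-th machine's quadratic 0.5 w^T A_m w - b_m^T w.
    return A[m] @ w - b[m] + noise * rng.normal(size=d)

w = np.zeros(d)
for _ in range(R):                      # R communication rounds
    local_iterates = []
    for m in range(M):                  # in a real system these loops run in parallel
        w_m = w.copy()
        for _ in range(K):              # K local SGD steps without any communication
            w_m -= lr * stochastic_grad(m, w_m)
        local_iterates.append(w_m)
    w = np.mean(local_iterates, axis=0) # one round of communication: averaging
print(w)
```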
no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.
no code implementations • ICLR 2020 • Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro
In this paper, we characterize the norm required to realize a function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm.
no code implementations • 1 Jul 2019 • Blake Woodworth, Nathan Srebro
We note that known methods achieving the optimal oracle complexity for first order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first order queries required to optimize a convex Lipschitz function subject to a memory constraint.
1 code implementation • 21 Jun 2019 • Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth
We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates.
1 code implementation • 13 Jun 2019 • Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro
A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.
no code implementations • 17 May 2019 • Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.
1 code implementation • ICLR 2019 • Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.
no code implementations • 23 Apr 2019 • Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar
We consider convex SGD updates with a block-cyclic structure, i.e., where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution.
no code implementations • 13 Feb 2019 • Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro
We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function.
no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth
Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.
no code implementations • 12 Feb 2019 • Omar Montasser, Steve Hanneke, Nathan Srebro
We study the question of learning an adversarially robust predictor.
no code implementations • 7 Dec 2018 • Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro
In this paper, we propose a simple yet revealing model that encompasses (1) a selection process where an institution chooses from multiple groups according to their qualifications so as to maximize an institutional utility and (2) dynamics that govern the evolution of the groups' qualifications according to the imposed policies.
no code implementations • NeurIPS 2018 • Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nathan Srebro
We study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions.
1 code implementation • 29 Jun 2018 • Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You
Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals.
no code implementations • 26 Jun 2018 • Yossi Arjevani, Ohad Shamir, Nathan Srebro
We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago.
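A toy simulation of the delayed-gradient setting on a quadratic, where each update uses the gradient evaluated at the iterate from $\tau$ rounds ago; the quadratic, delay, and step size are arbitrary, and this is not the paper's analysis.

```python
import numpy as np

def delayed_gd(A, b, tau=5, lr=0.02, iters=400):
    """Gradient descent on 0.5 x^T A x - b^T x where each update uses the
    gradient evaluated at the iterate from `tau` rounds ago."""
    iterates = [np.zeros(b.shape[0])]
    for t in range(iters):
        stale_x = iterates[max(0, t - tau)]    # iterate from tau rounds back
        grad = A @ stale_x - b
        iterates.append(iterates[-1] - lr * grad)
    return iterates[-1]

A = np.diag([4.0, 2.0, 1.0])
b = np.array([1.0, -1.0, 0.5])
# Converges close to the exact solution when lr is small relative to the delay.
print(delayed_gd(A, b), np.linalg.solve(A, b))
```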
no code implementations • 5 Jun 2018 • Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry
We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.
no code implementations • NeurIPS 2018 • Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro
We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.
2 code implementations • 30 May 2018 • Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.
no code implementations • NeurIPS 2018 • Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro
We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph.
no code implementations • NeurIPS 2018 • Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro
The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries.
no code implementations • 5 Mar 2018 • Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry
We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.
no code implementations • ICML 2018 • Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro
We study the implicit bias of generic optimization methods, such as mirror descent, natural gradient descent, and steepest descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems.
no code implementations • 11 Feb 2018 • Weiran Wang, Jialei Wang, Mladen Kolar, Nathan Srebro
We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines.
1 code implementation • 14 Nov 2017 • Chenxin Ma, Martin Jaggi, Frank E. Curtis, Nathan Srebro, Martin Takáč
In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of $\mathcal{O}(1/t^2)$ in terms of reducing suboptimality.
2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
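A small numpy experiment in the spirit of this setting: gradient descent on the unregularized logistic loss over linearly separable data, tracking the normalized margin of the iterate (which this line of work relates to the hard-margin SVM solution); the data and hyperparameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2
X_pos = rng.normal(size=(n, d)) + 3.0             # cluster shifted to (+3, +3)
X_neg = rng.normal(size=(n, d)) - 3.0             # cluster shifted to (-3, -3)
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])     # linearly separable (w.h.p.)

def logistic_grad(w):
    # Gradient of the average logistic loss (1/n) sum_i log(1 + exp(-y_i w.x_i)).
    margins = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)

w = np.zeros(d)
lr = 0.1
for t in range(1, 50001):
    w -= lr * logistic_grad(w)
    if t % 10000 == 0:
        direction = w / np.linalg.norm(w)
        normalized_margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(t, np.round(direction, 3), round(normalized_margin, 4))
```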
no code implementations • 25 Sep 2017 • Weiran Wang, Nathan Srebro
We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks.
no code implementations • ICLR 2018 • Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro
We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights.
2 code implementations • NeurIPS 2017 • Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness.
no code implementations • NeurIPS 2017 • Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.
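A toy instance of this setup: gradient descent on a factorized parameterization $X = UU^\top$ of an underdetermined least-squares problem, started from a small initialization, with the nuclear norms of the result and of the planted low-rank matrix printed at the end; problem sizes, step size, and initialization scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_true, m = 10, 2, 40                           # 40 measurements, 55 degrees of freedom

W = rng.normal(size=(d, r_true))
X_star = W @ W.T                                   # planted low-rank PSD matrix
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / (2 * np.sqrt(d))  # symmetric, normalized measurements
y = np.einsum('kij,ij->k', A, X_star)

U = 1e-3 * rng.normal(size=(d, d))                 # small ("near-zero") initialization
lr = 0.01
for _ in range(30000):
    X = U @ U.T
    residual = np.einsum('kij,ij->k', A, X) - y
    grad_X = np.einsum('k,kij->ij', residual, A) / m   # gradient of the loss w.r.t. X
    U -= lr * 2.0 * grad_X @ U                         # chain rule through X = U U^T

nuclear = lambda M: np.linalg.svd(M, compute_uv=False).sum()
print("nuclear norm of GD solution :", nuclear(U @ U.T))
print("nuclear norm of planted X*  :", nuclear(X_star))
```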
3 code implementations • NeurIPS 2017 • Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks.
1 code implementation • 8 May 2017 • Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro
We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization.
no code implementations • ICML 2017 • Dan Garber, Ohad Shamir, Nathan Srebro
We study algorithms for estimating the leading principal component of the population covariance matrix that are both communication-efficient and achieve estimation error of the order of the centralized ERM solution that uses all $mn$ samples.
no code implementations • 25 Feb 2017 • Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro
We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product.
no code implementations • NeurIPS 2017 • Raman Arora, Teodor V. Marinov, Poorya Mianjy, Nathan Srebro
We propose novel first-order stochastic approximation algorithms for canonical correlation analysis (CCA).
no code implementations • 21 Feb 2017 • Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang
We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error.
no code implementations • 21 Feb 2017 • Jialei Wang, Weiran Wang, Nathan Srebro
We present and analyze an approach for distributed stochastic optimization which is statistically optimal and achieves near-linear speedups (up to logarithmic factors).
no code implementations • 20 Feb 2017 • Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro
We consider learning a predictor which is non-discriminatory with respect to a "protected attribute" according to the notion of "equalized odds" proposed by Hardt et al. [2016].
no code implementations • 10 Oct 2016 • Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro
Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data.
7 code implementations • NeurIPS 2016 • Moritz Hardt, Eric Price, Nathan Srebro
We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features.
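A minimal sketch of checking the equalized-odds criterion proposed in this paper on toy arrays: a binary predictor satisfies it (approximately) when its true positive and false positive rates are (approximately) equal across groups defined by the sensitive attribute; the data below are made up.

```python
import numpy as np

def rates_by_group(y_true, y_pred, group):
    # Return (TPR, FPR) of the predictor within each group of the sensitive attribute.
    out = {}
    for g in np.unique(group):
        mask = group == g
        tpr = np.mean(y_pred[mask & (y_true == 1)])
        fpr = np.mean(y_pred[mask & (y_true == 0)])
        out[g] = (tpr, fpr)
    return out

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
group  = np.array(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'])

for g, (tpr, fpr) in rates_by_group(y_true, y_pred, group).items():
    print(f"group {g}: TPR={tpr:.2f}, FPR={fpr:.2f}")
# Equalized odds holds exactly when both TPR and FPR match across groups.
```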
no code implementations • NeurIPS 2016 • Blake Woodworth, Nathan Srebro
We provide tight upper and lower bounds on the complexity of minimizing the average of $m$ convex functions using gradient and prox oracles of the component functions.
no code implementations • ICML 2017 • Jialei Wang, Mladen Kolar, Nathan Srebro, Tong Zhang
We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines.
no code implementations • NeurIPS 2016 • Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements.
no code implementations • NeurIPS 2016 • Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro
We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations.
no code implementations • NeurIPS 2016 • Weiran Wang, Jialei Wang, Dan Garber, Nathan Srebro
We study the stochastic optimization of canonical correlation analysis (CCA), whose objective is nonconvex and does not decouple over training samples.
no code implementations • 7 Mar 2016 • Jialei Wang, Mladen Kolar, Nathan Srebro
We study the problem of distributed multi-task learning with shared representation, where each machine aims to learn a separate, but related, task in an unknown shared low-dimensional subspace, i.e., when the predictor matrix has low rank.
no code implementations • 5 Feb 2016 • Jialei Wang, Hai Wang, Nathan Srebro
Contrary to the situation with stochastic gradient descent, we argue that when using stochastic methods with variance reduction, such as SDCA, SAG or SVRG, as well as their variants, it could be beneficial to reuse previously used samples instead of fresh samples, even when fresh samples are available.
no code implementations • 20 Nov 2015 • Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro
We propose a unified framework for neural net normalization, regularization and optimization, which includes Path-SGD and Batch-Normalization and interpolates between them across two different dimensions.
no code implementations • 20 Oct 2015 • Heejin Choi, Ofer Meshi, Nathan Srebro
We present an efficient method for training slack-rescaled structural SVM.
no code implementations • 7 Oct 2015 • Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro
Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains.
no code implementations • 2 Oct 2015 • Jialei Wang, Mladen Kolar, Nathan Srebro
We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.
no code implementations • 11 Aug 2015 • Heejin Choi, Yutaka Sasaki, Nathan Srebro
We present improved methods of using structured SVMs in a large-scale hierarchical classification problem, that is when labels are leaves, or sets of leaves, in a tree or a DAG.
no code implementations • 29 Jul 2015 • Martin Takáč, Peter Richtárik, Nathan Srebro
We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i.e., SVM and SVM-type objectives).
1 code implementation • NeurIPS 2015 • Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro
We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights.
no code implementations • 27 Feb 2015 • Behnam Neyshabur, Ryota Tomioka, Nathan Srebro
We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.
no code implementations • 20 Dec 2014 • Behnam Neyshabur, Ryota Tomioka, Nathan Srebro
We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks.
1 code implementation • 21 Oct 2014 • Behnam Neyshabur, Nathan Srebro
We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and of the power of asymmetric hashes in this context.
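A hedged sketch of an asymmetric preprocessing scheme for inner-product similarity in the spirit of this line of work: data points and queries are lifted by different transforms into one extra dimension, after which ordinary sign random projections are applied. The exact transform and claims in the paper may differ; the helper names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_bits = 3, 16
planes = rng.normal(size=(num_bits, d + 1))   # shared hyperplanes in the lifted space

def preprocess_data(x):
    # Append sqrt(1 - ||x||^2) so every transformed data point has unit norm
    # (assumes data points are pre-scaled to norm <= 1).
    return np.append(x, np.sqrt(max(0.0, 1.0 - x @ x)))

def preprocess_query(q):
    # Queries are normalized and padded with a zero coordinate instead.
    return np.append(q / np.linalg.norm(q), 0.0)

def sign_hash(v):
    # Standard sign-random-projection hash applied in the lifted space.
    return tuple((planes @ v > 0).astype(int))

x = np.array([0.3, 0.1, -0.2])
q = np.array([1.0, 0.5, -0.5])
# For a fixed query, the collision probability of the two (asymmetrically
# preprocessed) hashes is monotone in the inner product <x, q>.
print(sign_hash(preprocess_data(x)), sign_hash(preprocess_query(q)))
```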
no code implementations • 13 May 2014 • Behnam Neyshabur, Yury Makarychev, Nathan Srebro
We study the convex relaxation of clustering and Hamming embedding, focusing on the asymmetric case (co-clustering and asymmetric Hamming embedding), understanding their relationship to LSH as studied by Charikar (2002) and to the max-norm ball, and the differences between their symmetric and asymmetric versions.
1 code implementation • 30 Dec 2013 • Ohad Shamir, Nathan Srebro, Tong Zhang
We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems.
1 code implementation • NeurIPS 2013 • Behnam Neyshabur, Payman Yadollahpour, Yury Makarychev, Ruslan Salakhutdinov, Nathan Srebro
When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps.
no code implementations • NeurIPS 2014 • Deanna Needell, Nathan Srebro, Rachel Ward
Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence on the average smoothness, dominating previous results.
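An illustrative SGD loop with the kind of smoothness-proportional importance sampling described here, on a consistent least-squares problem: components are sampled proportionally to per-component smoothness constants and gradients are reweighted to stay unbiased. The problem, constants, and step-size rule are assumptions, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
row_scales = rng.uniform(0.1, 3.0, size=(n, 1))
A = rng.normal(size=(n, d)) * row_scales          # components with very different smoothness
x_star = rng.normal(size=d)
b = A @ x_star                                    # consistent (realizable) least squares

L = np.sum(A ** 2, axis=1)                        # smoothness of component 0.5*(a_i.x - b_i)^2
p = L / L.sum()                                   # sample proportionally to smoothness

x = np.zeros(d)
lr = 0.5 / L.mean()                               # crude step size for the reweighted updates
for _ in range(20000):
    i = rng.choice(n, p=p)
    g = (A[i] @ x - b[i]) * A[i]                  # gradient of the sampled component
    x -= lr * g / (n * p[i])                      # reweight by 1/(n p_i) to keep it unbiased
print("distance to solution:", np.linalg.norm(x - x_star))
```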
no code implementations • NeurIPS 2013 • Raman Arora, Andrew Cotter, Nathan Srebro
We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as "Matrix Stochastic Gradient" (MSG), as well as a practical variant, Capped MSG.
no code implementations • NeurIPS 2013 • Sivan Sabato, Anand D. Sarwate, Nathan Srebro
We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error.
no code implementations • 13 Dec 2012 • Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang
We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.
no code implementations • NeurIPS 2012 • Andreas Argyriou, Rina Foygel, Nathan Srebro
We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an $\ell_2$ penalty.
no code implementations • NeurIPS 2012 • Rina Foygel, Nathan Srebro, Ruslan R. Salakhutdinov
We introduce a new family of matrix norms, the "local max" norms, generalizing existing methods such as the max norm, the trace norm (nuclear norm), and the weighted or smoothed weighted trace norms, which have been extensively used in the literature as regularizers for matrix reconstruction problems.
no code implementations • 5 Apr 2012 • Sivan Sabato, Nathan Srebro, Naftali Tishby
We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the margin-adapted dimension, which is a simple function of the second order statistics of the data distribution, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the margin-adapted dimension of the data distribution.
no code implementations • NeurIPS 2010 • Sivan Sabato, Nathan Srebro, Naftali Tishby
We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the gamma-adapted-dimension, which is a simple function of the spectrum of a distribution's covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the gamma-adapted-dimension of the source distribution.
no code implementations • NeurIPS 2010 • Jason D. Lee, Ben Recht, Nathan Srebro, Joel Tropp, Ruslan R. Salakhutdinov
The max-norm was proposed as a convex matrix regularizer by Srebro et al. (2004) and was shown to be empirically superior to the trace-norm for collaborative filtering problems.
no code implementations • NeurIPS 2010 • Nathan Srebro, Ruslan R. Salakhutdinov
We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling.
no code implementations • NeurIPS 2010 • Nathan Srebro, Karthik Sridharan, Ambuj Tewari
We establish an excess risk bound of $O\big(H R_n^2 + \sqrt{H L^*}\, R_n\big)$ for ERM with an $H$-smooth loss function and a hypothesis class with Rademacher complexity $R_n$, where $L^*$ is the best risk achievable by the hypothesis class.
no code implementations • NeurIPS 2009 • Boaz Nadler, Nathan Srebro, Xueyuan Zhou
We study the behavior of the popular Laplacian Regularization method for Semi-Supervised Learning at the regime of a fixed number of labeled points but a large number of unlabeled points.
no code implementations • NeurIPS 2008 • Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro
We show that the empirical minimizer of a stochastic strongly convex objective, where the stochastic component is linear, converges to the population minimizer with rate $O(1/n)$.