Search Results for author: Aaron Mishkin

Found 10 papers, 4 papers with code

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

no code implementations3 Apr 2024 Aaron Mishkin, Mert Pilanci, Mark Schmidt

This improvement is comparable to a square root of the condition number in the worst case and addresses criticism that guarantees for stochastic acceleration could be worse than those for SGD.
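
Schematically (ignoring constants and the precise interpolation assumptions, which are in the paper), the gap being closed is the classical one between gradient descent and its accelerated variant on a strongly convex problem with condition number $\kappa$:

$$ \text{SGD under interpolation: } O\!\big(\kappa \log(1/\epsilon)\big) \text{ iterations}, \qquad \text{accelerated: } O\!\big(\sqrt{\kappa}\,\log(1/\epsilon)\big) \text{ iterations}. $$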

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

no code implementations6 Mar 2024 Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants.
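
As a rough sketch of the idea (the exact definitions and rates are in the paper), one way to formalize "conditioning along the path" is a two-point directional smoothness quantity $M(x, y)$ that replaces the global Lipschitz constant $L$ in the descent lemma:

$$ f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \tfrac{M(x, y)}{2}\,\|y - x\|^2, $$

so that a gradient step $x_{k+1} = x_k - \eta \nabla f(x_k)$ satisfies $f(x_{k+1}) \le f(x_k) - \eta\big(1 - \tfrac{\eta M(x_k, x_{k+1})}{2}\big)\|\nabla f(x_k)\|^2$, with progress governed by the smoothness actually encountered between consecutive iterates rather than by a worst-case constant.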

Level Set Teleportation: An Optimization Perspective

no code implementations5 Mar 2024 Aaron Mishkin, Alberto Bietti, Robert M. Gower

We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods by maximizing the gradient norm on a level set of the objective function.

LEMMA
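
Written out, the sub-routine amounts to a constrained maximization performed before the next gradient step (notation illustrative):

$$ x_k^{+} \in \arg\max_{y} \ \tfrac{1}{2}\,\|\nabla f(y)\|_2^2 \quad \text{s.t.} \quad f(y) = f(x_k), \qquad x_{k+1} = x_k^{+} - \eta\, \nabla f(x_k^{+}). $$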

A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features

no code implementations2 Mar 2024 Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci

We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features.
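
For reference, the convex Lasso template that the equivalence maps onto is the standard one; the content of the result is the fixed dictionary $A$, whose columns are features built from the 1-D training points and their reflections (its explicit form is given in the paper):

$$ \min_{z}\ \tfrac{1}{2}\,\|A z - y\|_2^2 + \lambda\,\|z\|_1 . $$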

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

no code implementations3 Jul 2023 Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She

We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
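
A minimal sketch of the flavor of update being analyzed, assuming only a summation constraint and ignoring the bound constraints and the $O(n \log n)$ selection rule handled in the paper (the objective and step size below are toy placeholders):

```python
# Illustrative sketch (not the paper's exact rule): one greedy 2-coordinate
# update for min f(x) subject to sum(x) = const.
import numpy as np

def greedy_two_coordinate_step(x, grad, step=0.05):
    """Move mass between the coordinates with the largest gradient gap.

    The direction e_j - e_i (largest vs. smallest partial derivative)
    preserves sum(x) and is a steepest-descent direction in the 1-norm
    among summation-preserving updates.
    """
    g = grad(x)
    i = int(np.argmax(g))   # coordinate with largest partial derivative
    j = int(np.argmin(g))   # coordinate with smallest partial derivative
    x = x.copy()
    x[i] -= step
    x[j] += step
    return x

# Toy quadratic: f(x) = 0.5 * ||x - t||^2 with sum(x) held at sum(x0) = 0.
t = np.array([1.0, -2.0, 3.0, 0.5])
grad = lambda x: x - t
x = np.zeros(4)
for _ in range(100):
    x = greedy_two_coordinate_step(x, grad)
print(x, x.sum())   # sum stays fixed; x approaches the constrained minimizer up to the step size
```

Picking the largest and smallest partial derivatives is exactly steepest descent in the 1-norm restricted to summation-preserving directions, which is why only two coordinates change per iteration.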

Optimal Sets and Solution Paths of ReLU Networks

1 code implementation31 May 2023 Aaron Mishkin, Mert Pilanci

We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective.
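
For context, one common form of the convex parameterization for two-layer ReLU networks (following the line of work on convex reformulations; notation may differ from the paper) is a constrained group-Lasso problem over fixed activation patterns $D_i = \mathrm{diag}(\mathbb{1}[X g_i \ge 0])$:

$$ \min_{\{v_i, w_i\}} \ \tfrac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2 + \lambda \sum_{i=1}^{P} \big(\|v_i\|_2 + \|w_i\|_2\big) \quad \text{s.t.} \quad (2D_i - I) X v_i \ge 0,\ \ (2D_i - I) X w_i \ge 0 . $$

The paper's characterization concerns the set of all optima of a program of this type and how it maps back to optimal ReLU networks.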

Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions

1 code implementation2 Feb 2022 Aaron Mishkin, Arda Sahiner, Mert Pilanci

We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions.

Image Classification
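
As a rough illustration of the convex approach (not the paper's released software; the gate vectors, dimensions, and regularization strength below are arbitrary assumptions), here is a small cvxpy sketch of the unconstrained "gated ReLU" variant of the group-Lasso program:

```python
# Illustrative gated-ReLU convex program for a toy two-layer network.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, P = 50, 5, 20                      # samples, features, sampled gates
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
lam = 0.1

# Fixed 0/1 activation patterns D_i = diag(1[X g_i >= 0]) from random gates.
G = rng.standard_normal((d, P))
D = (X @ G >= 0).astype(float)           # n x P pattern matrix

V = cp.Variable((d, P))                  # one weight vector per pattern (columns)
preds = cp.sum(cp.multiply(D, X @ V), axis=1)
objective = 0.5 * cp.sum_squares(preds - y) + lam * cp.sum(cp.norm(V, 2, axis=0))
prob = cp.Problem(cp.Minimize(objective))
prob.solve()
print("optimal objective value:", prob.value)
```

Dropping the cone constraints in this way yields one of the relaxed model classes; relating such relaxations back to the constrained ReLU program is, roughly, where the paper's cone decompositions enter.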

To Each Optimizer a Norm, To Each Norm its Generalization

no code implementations11 Jun 2020 Sharan Vaswani, Reza Babanezhad, Jose Gallego, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux

For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms $\|\cdot\|_P$ such that the classifier's direction is the same as that of the maximum $P$-margin solution.

Classification, General Classification
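
For a positive definite matrix $P$ and norm $\|w\|_P = \sqrt{w^{\top} P w}$, the maximum $P$-margin solution referred to above is (notation illustrative):

$$ w_P^{\star} \in \arg\max_{w \neq 0} \ \min_{i} \ \frac{y_i\, x_i^{\top} w}{\|w\|_P}. $$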

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

2 code implementations NeurIPS 2018 Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution.

Variational Inference
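
A sketch of the kind of structured Gaussian approximation the method's name points to, a low-rank plus diagonal precision (the exact parameterization and the natural-gradient updates are in the paper):

$$ q(\theta) = \mathcal{N}(\mu, \Sigma), \qquad \Sigma^{-1} = U U^{\top} + \mathrm{diag}(d), \quad U \in \mathbb{R}^{p \times k}, \ k \ll p, $$

which brings the memory cost of the covariance factor down from $O(p^2)$ to $O(pk)$.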
