Search Results for author: Hadi Daneshmand

Found 18 papers, 5 papers with code

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

1 code implementation • 3 Oct 2023 • Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand

We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
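
As a rough illustration of what bounded gradients at depth means operationally, here is a toy PyTorch sketch that measures the input gradient norm of a linear MLP with batch normalization as depth grows; the widths, Gaussian initialization, and quadratic readout are illustrative assumptions, not the paper's particular construction.

```python
import torch

def grad_norm_at_input(depth, width=64, batch=128, seed=0):
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        layers += [torch.nn.Linear(width, width, bias=False),
                   torch.nn.BatchNorm1d(width)]
    net = torch.nn.Sequential(*layers)            # linear MLP with BN, train mode
    x = torch.randn(batch, width, requires_grad=True)
    net(x).pow(2).mean().backward()               # arbitrary quadratic readout
    return x.grad.norm().item()

for depth in (4, 16, 64):
    print(depth, grad_norm_at_input(depth))
```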

On the impact of activation and normalization in obtaining isometric embeddings at initialization

1 code implementation • NeurIPS 2023 • Amir Joudaki, Hadi Daneshmand, Francis Bach

In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs.
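
Concretely, the penultimate Gram matrix is H Hᵀ, where the rows of H are the penultimate-layer representations of a batch; a minimal numpy sketch (the toy MLP, widths, and activation are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, width, depth = 8, 32, 5

H = rng.standard_normal((batch, width))
for _ in range(depth):                          # toy MLP: linear layer + tanh
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    H = np.tanh(H @ W)

G = H @ H.T                                     # (batch, batch) Gram matrix of
print(G.shape, np.allclose(G, G.T))             # pairwise inner products
```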

Efficient displacement convex optimization with particle gradient descent

no code implementations • 9 Feb 2023 • Hadi Daneshmand, Jason D. Lee, Chi Jin

Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures.
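
A minimal numpy sketch of particle gradient descent, assuming a simple quadratic potential as the function of the measure; the particle count and step size are illustrative:

```python
import numpy as np

# Represent the measure by an empirical distribution over N particles and
# move every particle by a gradient step in parallel.
rng = np.random.default_rng(0)
particles = rng.standard_normal((100, 2))     # N = 100 particles in R^2
step = 0.1

def potential_grad(x):
    return x                                  # gradient of V(x) = ||x||^2 / 2

for _ in range(200):
    particles -= step * potential_grad(particles)

# Particles concentrate near the minimizer of the toy energy.
print(np.linalg.norm(particles, axis=1).mean())
```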

Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design

1 code implementation • 16 Apr 2022 • Hadi Daneshmand, Francis Bach

Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity.

Rethinking the Variational Interpretation of Accelerated Optimization Methods

no code implementations • NeurIPS 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand

The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization.
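
The continuous-time model referred to here is the standard limiting ODE of Nesterov's method (the Su–Boyd–Candès ODE), reproduced for reference:

```latex
% Continuous-time model of Nesterov's accelerated gradient for minimizing f
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\qquad X(0) = x_0,\quad \dot{X}(0) = 0.
```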

Batch Normalization Orthogonalizes Representations in Deep Random Networks

1 code implementation • NeurIPS 2021 • Hadi Daneshmand, Amir Joudaki, Francis Bach

This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network.
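
A small PyTorch sketch of the kind of measurement behind this claim: apply random linear maps followed by batch normalization and track the mean off-diagonal entry of the normalized Gram matrix, which per the paper should shrink with depth; widths, batch size, and scaling are assumptions.

```python
import torch

torch.manual_seed(0)
batch, width, depth = 32, 256, 50
h = torch.randn(batch, width)

for layer in range(1, depth + 1):
    W = torch.randn(width, width) / width ** 0.5        # random linear map
    h = torch.nn.functional.batch_norm(h @ W, None, None, training=True)
    G = (h @ h.T) / width                               # normalized Gram matrix
    if layer % 10 == 0:
        off_diag = G - torch.diag(torch.diag(G))
        # Per the paper's claim, this off-diagonal mass should shrink with depth.
        print(layer, off_diag.abs().mean().item())
```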

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization

no code implementations • 23 Feb 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith

Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.
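
The canonical instance of this viewpoint: gradient descent is the explicit (forward) Euler discretization of the gradient-flow ODE.

```latex
% Gradient flow and its explicit Euler discretization with step size h
\dot{X}(t) = -\nabla f\bigl(X(t)\bigr)
\quad\Longrightarrow\quad
x_{k+1} = x_k - h\,\nabla f(x_k).
```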

Batch normalization provably avoids ranks collapse for randomly initialised deep networks

no code implementations • NeurIPS 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.
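
A toy PyTorch sketch of the rank-collapse phenomenon the title refers to: compare the numerical rank of a batch of hidden representations in a deep random linear network with and without batch normalization; sizes and scaling are assumptions.

```python
import torch

def hidden_rank(depth, use_bn, batch=32, width=64, seed=0):
    torch.manual_seed(seed)
    h = torch.randn(batch, width)
    for _ in range(depth):
        h = h @ (torch.randn(width, width) / width ** 0.5)
        if use_bn:
            h = torch.nn.functional.batch_norm(h, None, None, training=True)
    return torch.linalg.matrix_rank(h).item()

# Per the paper, the rank should collapse without BN and stay high with BN.
print("without BN:", hidden_rank(depth=50, use_bn=False))
print("with BN   :", hidden_rank(depth=50, use_bn=True))
```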

Mixing of Stochastic Accelerated Gradient Descent

no code implementations • 31 Oct 2019 • Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann

We study the mixing properties for stochastic accelerated gradient descent (SAGD) on least-squares regression.
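
For concreteness, a numpy sketch of accelerated stochastic gradient descent on a least-squares problem, assuming Nesterov-style momentum with minibatch gradients; the step size, momentum, and problem sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x = np.zeros(d)
y = np.zeros(d)                              # extrapolation point
step, momentum, batch = 0.01, 0.9, 32

for _ in range(3000):
    idx = rng.integers(0, n, size=batch)
    grad = A[idx].T @ (A[idx] @ y - b[idx]) / batch   # minibatch gradient at y
    x_next = y - step * grad
    y = x_next + momentum * (x_next - x)
    x = x_next

print(0.5 * np.mean((A @ x - b) ** 2))       # final least-squares risk
```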

Local Saddle Point Optimization: A Curvature Exploitation Approach

1 code implementation • 15 May 2018 • Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems.
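
As a baseline illustration of the gradient-based methods referred to here, a numpy sketch of simultaneous gradient descent-ascent on a toy min-max objective; the objective and step size are assumptions, and the paper's curvature-exploitation step is not shown.

```python
import numpy as np

def grad_f(x, y):
    # f(x, y) = x*y + 0.1*x**2 - 0.1*y**2  (toy min-max objective)
    return y + 0.2 * x, x - 0.2 * y

x, y, step = 1.0, 1.0, 0.05
for _ in range(500):
    gx, gy = grad_f(x, y)
    x, y = x - step * gx, y + step * gy      # descend in x, ascend in y

print(x, y)                                  # approaches the saddle point (0, 0)
```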

Escaping Saddles with Stochastic Gradients

no code implementations • ICML 2018 • Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions.
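
A synthetic numpy sketch of the quantity being analyzed (everything here is an assumed toy model): at a point where the full gradient has no component along the most negative curvature direction, the per-sample stochastic gradients still do.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 1000

H = np.diag(np.concatenate(([-1.0], np.linspace(0.5, 2.0, d - 1))))
v = np.eye(d)[0]                     # eigenvector of the negative eigenvalue of H

x = np.zeros(d); x[1] = 1.0          # full gradient H @ x is orthogonal to v here
components = []
for _ in range(n):
    N = rng.standard_normal((d, d))
    E = 0.5 * (N + N.T)              # zero-mean symmetric per-sample perturbation
    components.append(((H + E) @ x) @ v)

print("full-gradient component :", (H @ x) @ v)          # exactly 0
print("stochastic variance     :", np.var(components))   # strictly positive
```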

Accelerated Dual Learning by Homotopic Initialization

no code implementations • 13 Jun 2017 • Hadi Daneshmand, Hamed Hassani, Thomas Hofmann

Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in a transient regime often used for approximations in machine learning.

DynaNewton - Accelerating Newton's Method for Machine Learning

no code implementations • 20 May 2016 • Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Solutions along a path of objectives are tracked such that the minimizer of the previous objective is guaranteed to lie within the quadratic convergence region of the next objective to be optimized.
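
A numpy sketch of the warm-started homotopy idea, assuming a path of ℓ2-regularized logistic-regression objectives with decreasing regularization (the paper's actual path and constants differ); each stage starts Newton's method from the previous minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.standard_normal((n, d))
y = (A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n) > 0).astype(float)

def grad_hess(x, lam):
    p = 1.0 / (1.0 + np.exp(-(A @ x)))                    # logistic predictions
    grad = A.T @ (p - y) / n + lam * x
    hess = A.T @ (A * (p * (1 - p))[:, None]) / n + lam * np.eye(d)
    return grad, hess

x = np.zeros(d)
for lam in (1.0, 0.3, 0.1, 0.03, 0.01):                   # path of objectives
    for _ in range(3):                                    # a few warm-started Newton steps
        grad, hess = grad_hess(x, lam)
        x = x - np.linalg.solve(hess, grad)
    grad, _ = grad_hess(x, lam)
    print(f"lambda={lam:<5} final grad norm {np.linalg.norm(grad):.2e}")
```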

Starting Small -- Learning with Adaptive Sample Sizes

no code implementations • 9 Mar 2016 • Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set.
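
A minimal numpy sketch of the adaptive-sample-size idea, assuming a simple growth schedule and a least-squares objective; the schedule, step size, and model are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x = np.zeros(d)
step = 0.05
for size in (100, 400, 1600, 6400, n):          # growing nested subsamples
    As, bs = A[:size], b[:size]
    for _ in range(50):                         # cheap passes on the current subsample
        x -= step * As.T @ (As @ x - bs) / size
    print(f"sample size {size:>5}: full loss {0.5 * np.mean((A @ x - b) ** 2):.4f}")
```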
