Search Results for author: Boris Hanin

Found 25 papers, 2 papers with code

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

no code implementations 4 Mar 2024 Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

We find empirically that, across multiple language tasks, the performance of Voting Inference Systems surprisingly first increases and then decreases as a function of the number of LLM calls.

Language Modelling, Large Language Model
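The non-monotone trend can be reproduced with a toy majority-voting model: queries the model answers correctly more than half the time are driven toward certainty by extra votes, while queries it answers correctly less than half the time are driven toward certain failure, so aggregate accuracy can rise and then fall. The sketch below is a minimal illustration of that mechanism only; the task mix and per-call accuracies (70/30 split, 0.8 and 0.4) are invented and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented query mix: 70% "easy" queries with per-call accuracy 0.8 and
# 30% "hard" queries with per-call accuracy 0.4 (not values from the paper).
p_correct = np.where(rng.random(2000) < 0.7, 0.8, 0.4)

for k in [1, 3, 5, 11, 21, 51, 101]:
    # k independent calls per query; the system returns the majority answer
    # (binary answers for simplicity, odd k so the majority is well defined).
    votes = rng.random((p_correct.size, k)) < p_correct[:, None]
    accuracy = (votes.sum(axis=1) > k / 2).mean()
    print(f"{k:3d} calls per query -> accuracy {accuracy:.3f}")
```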

Principled Architecture-aware Scaling of Hyperparameters

1 code implementation 27 Feb 2024 Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin

However, most designs or optimization methods are agnostic to the choice of network structures, and thus largely ignore the impact of neural architectures on hyperparameters.

AutoML

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

no code implementations 28 Sep 2023 Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.
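One ingredient behind this kind of depthwise transfer is keeping the signal scale depth-independent. The sketch below is a minimal numpy illustration assuming a 1/sqrt(depth) scaling of ReLU residual branches, a common convention in this line of work rather than the paper's exact parameterization; it only shows that the scaled network's activations stay at a stable scale as depth grows while the unscaled one blows up.

```python
import numpy as np

def residual_forward(x, depth, width, seed, branch_scale):
    """Forward pass of a toy residual ReLU MLP with scaled residual branches."""
    rng = np.random.default_rng(seed)
    h = x.copy()
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = h + branch_scale * np.maximum(W @ h, 0.0)   # scaled residual branch
    return h

width = 256
x = np.random.default_rng(0).normal(size=width)

for depth in [4, 16, 64, 256]:
    unscaled = residual_forward(x, depth, width, seed=1, branch_scale=1.0)
    scaled = residual_forward(x, depth, width, seed=1, branch_scale=depth ** -0.5)
    print(f"depth {depth:4d}: |h| unscaled {np.linalg.norm(unscaled):.3g}, "
          f"1/sqrt(depth)-scaled {np.linalg.norm(scaled):.3g}")
```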

Les Houches Lectures on Deep Learning at Large & Infinite Width

no code implementations 4 Sep 2023 Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon

These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks.

Gaussian Processes

Quantitative CLTs in Deep Neural Networks

no code implementations 12 Jul 2023 Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati

We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$.


Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

no code implementations 20 Jun 2023 Gage DeZoort, Boris Hanin

We then prove that using residual aggregation operators, obtained by interpolating a fixed aggregation operator with the identity, provably alleviates oversmoothing at initialization.
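The residual aggregation operator described above is a convex combination of the identity with a fixed aggregation operator, e.g. $(1-\alpha) I + \alpha \hat A$. The sketch below builds this interpolation on a random graph and compares how quickly node features collapse to a common value under repeated aggregation; the graph model, the weight $\alpha = 0.5$, and the spread measure are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, alpha = 60, 0.5

# Random undirected graph and its row-normalized aggregation operator.
A = (rng.random((n_nodes, n_nodes)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
deg = np.maximum(A.sum(axis=1), 1.0)
agg = A / deg[:, None]

# Residual aggregation: interpolate the fixed operator with the identity.
res_agg = (1.0 - alpha) * np.eye(n_nodes) + alpha * agg

X = rng.normal(size=(n_nodes, 8))               # random node features

def spread(feats):
    """Average distance of node features from their mean; ~0 means oversmoothed."""
    return np.linalg.norm(feats - feats.mean(axis=0), axis=1).mean()

X_plain, X_res = X.copy(), X.copy()
for _ in range(30):                             # 30 rounds of message passing
    X_plain, X_res = agg @ X_plain, res_agg @ X_res
print(f"plain aggregation spread:    {spread(X_plain):.2e}")
print(f"residual aggregation spread: {spread(X_res):.2e}")
```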

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

no code implementations 13 May 2023 Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.

Bayesian Interpolation with Deep Linear Networks

no code implementations 29 Dec 2022 Boris Hanin, Alexander Zlokapa

For any training dataset, network depth, and hidden layer widths, we find non-asymptotic expressions for the predictive posterior and Bayesian model evidence in terms of Meijer-G functions, a class of meromorphic special functions of a single complex variable.

Bayesian Inference, Learning Theory +1

Maximal Initial Learning Rates in Deep ReLU Networks

no code implementations 14 Dec 2022 Gaurav Iyer, Boris Hanin, David Rolnick

Training a neural network requires choosing a suitable learning rate, which involves a trade-off between speed and effectiveness of convergence.

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

2 code implementations 11 May 2022 Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang

Advanced deep neural networks (DNNs), designed by either human or AutoML algorithms, are growing increasingly complex.

Neural Architecture Search

Random Fully Connected Neural Networks as Perturbatively Solvable Hierarchies

no code implementations 3 Apr 2022 Boris Hanin

Moreover, we show that network cumulants form a perturbatively solvable hierarchy in powers of $1/n$ in that $k$-th order cumulants in one layer have recursions that depend to leading order in $1/n$ only on $j$-th order cumulants at the previous layer with $j\leq k$.

Ridgeless Interpolation with Shallow ReLU Networks in $1D$ is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions

no code implementations 27 Sep 2021 Boris Hanin

If the curvature estimates at $x_i$ and $x_{i+1}$ have different signs, then $z(x;\theta)$ must be linear on $(x_i, x_{i+1})$.

Random Neural Networks in the Infinite Width Limit as Gaussian Processes

no code implementations 4 Jul 2021 Boris Hanin

This article gives a new proof that fully connected neural networks with random weights and biases converge to Gaussian processes in the regime where the input dimension, output dimension, and depth are kept fixed, while the hidden layer widths tend to infinity.

Gaussian Processes
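The Gaussian-process limit can be checked empirically: fix an input, draw many independent random networks, and measure how non-Gaussian the output distribution is as the width grows. The sketch below does this via the excess kurtosis of a two-hidden-layer ReLU network without biases; the widths, sample counts, and initialization variances are illustrative choices, not anything prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)      # fixed input of dimension 10
n_nets = 10000               # number of independent random networks to sample

def sample_outputs(width):
    """Scalar output of n_nets random two-hidden-layer ReLU nets (no biases) at x."""
    outs = np.empty(n_nets)
    for s in range(n_nets):
        W1 = rng.normal(0, np.sqrt(2.0 / x.size), size=(width, x.size))
        W2 = rng.normal(0, np.sqrt(2.0 / width), size=(width, width))
        w3 = rng.normal(0, np.sqrt(1.0 / width), size=width)
        h = np.maximum(W1 @ x, 0.0)
        h = np.maximum(W2 @ h, 0.0)
        outs[s] = w3 @ h
    return outs

for width in [4, 16, 64, 128]:
    z = sample_outputs(width)
    z = (z - z.mean()) / z.std()
    excess_kurtosis = np.mean(z ** 4) - 3.0    # zero for an exactly Gaussian output
    print(f"width {width:4d}: excess kurtosis {excess_kurtosis:+.2f}")
```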

The Principles of Deep Learning Theory

no code implementations 18 Jun 2021 Daniel A. Roberts, Sho Yaida, Boris Hanin

This book develops an effective theory approach to understanding deep neural networks of practical relevance.

Inductive Bias, Learning Theory +1

Deep ReLU Networks Preserve Expected Length

no code implementations ICLR 2022 Boris Hanin, Ryan Jeong, David Rolnick

Assessing the complexity of functions computed by a neural network helps us understand how the network will learn and generalize.

How Data Augmentation affects Optimization for Linear Regression

no code implementations NeurIPS 2021 Boris Hanin, Yi Sun

Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting.

Data Augmentation, regression +1

Data augmentation as stochastic optimization

no code implementations 28 Sep 2020 Boris Hanin, Yi Sun

We present a theoretical framework recasting data augmentation as stochastic optimization for a sequence of time-varying proxy losses.

Data Augmentation, regression +2
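A classical special case of this viewpoint, shown here only as a hedged illustration and not as the paper's construction: for linear regression, augmenting inputs with additive Gaussian noise of variance $\sigma^2$ turns the expected objective into a ridge-regularized proxy loss, since $\mathbb{E}_\varepsilon (y - w\cdot(x+\varepsilon))^2 = (y - w\cdot x)^2 + \sigma^2 \lVert w\rVert^2$. The sketch below checks the identity numerically on invented data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = rng.normal(size=d)                        # an arbitrary point in weight space

def sq_loss(X_batch, w):
    return np.mean((y - X_batch @ w) ** 2)

# Monte Carlo estimate of the augmented objective E_eps[ loss(X + eps, w) ].
aug = np.mean([sq_loss(X + sigma * rng.normal(size=X.shape), w)
               for _ in range(20000)])

# Ridge-regularized proxy loss predicted by the identity in the text.
proxy = sq_loss(X, w) + sigma ** 2 * np.sum(w ** 2)

print(f"augmented loss {aug:.4f}  vs  ridge proxy loss {proxy:.4f}")  # agree up to MC error
```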

Finite Depth and Width Corrections to the Neural Tangent Kernel

no code implementations ICLR 2020 Boris Hanin, Mihai Nica

Moreover, we prove that for such deep and wide networks, the NTK has a non-trivial evolution during training by showing that the mean of its first SGD update is also exponential in the ratio of network depth to width.

Deep ReLU Networks Have Surprisingly Few Activation Patterns

no code implementations NeurIPS 2019 Boris Hanin, David Rolnick

The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks.

Memorization
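An activation pattern is the on/off configuration of all ReLUs at a given input, and the phenomenon in the title can be probed by counting distinct patterns along a line through input space. The sketch below does this for a small random network; the architecture and sampling grid are arbitrary illustrative choices and this is not the paper's experiment, but the count stays far below both the combinatorial ceiling and the number of sampled points.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, width, depth = 16, 32, 3

# Random fully connected ReLU network (He-style initialization, no biases).
weights = [rng.normal(0, np.sqrt(2.0 / d_in), size=(width, d_in))]
weights += [rng.normal(0, np.sqrt(2.0 / width), size=(width, width))
            for _ in range(depth - 1)]

def activation_pattern(x):
    """Bytes encoding which hidden ReLUs are on (1) or off (0) at input x."""
    bits, h = [], x
    for W in weights:
        pre = W @ h
        bits.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(bits).tobytes()

# Walk along a random line through input space and count distinct patterns.
a, b = rng.normal(size=d_in), rng.normal(size=d_in)
patterns = {activation_pattern(a + t * b) for t in np.linspace(-3.0, 3.0, 20000)}
print(f"{len(patterns)} distinct patterns from 20000 points "
      f"(combinatorial ceiling: 2**{depth * width})")
```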

Complexity of Linear Regions in Deep Networks

no code implementations 25 Jan 2019 Boris Hanin, David Rolnick

It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions.

Products of Many Large Random Matrices and Gradients in Deep Neural Networks

no code implementations 14 Dec 2018 Boris Hanin, Mihai Nica

The fluctuations we find can be thought of as a finite-temperature correction to the limit in which first the size and then the number of matrices tend to infinity.
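The scale of these fluctuations can be seen in a few lines of numpy: form the product of $L$ independent $n \times n$ Gaussian matrices with variance-$1/n$ entries, apply it to a fixed unit vector, and look at the spread of $\log \lVert W_L \cdots W_1 x \rVert^2$ across draws. The configurations below are illustrative; the qualitative point, consistent with the depth-to-width ratio highlighted elsewhere in these listings, is that the spread tracks $L/n$ rather than vanishing as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200

def log_sq_norm(depth, width):
    """log |W_L ... W_1 x|^2 for independent Gaussian matrices with variance-1/width entries."""
    x = np.ones(width) / np.sqrt(width)          # fixed unit vector
    for _ in range(depth):
        W = rng.normal(0, 1.0 / np.sqrt(width), size=(width, width))
        x = W @ x
    return np.log(np.sum(x ** 2))

for depth, width in [(8, 128), (32, 128), (16, 64)]:
    vals = np.array([log_sq_norm(depth, width) for _ in range(n_trials)])
    print(f"L={depth:3d}, n={width:3d} (L/n={depth / width:.3f}): "
          f"std of log|Wx|^2 = {vals.std():.2f}")
```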

How to Start Training: The Effect of Initialization and Architecture

no code implementations NeurIPS 2018 Boris Hanin, David Rolnick

We identify and study two common failure modes for early training in deep ReLU nets.

Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?

no code implementations NeurIPS 2018 Boris Hanin

We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network $\mathcal{N}$ with ReLU activations.
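The coarse version of the phenomenon is easy to reproduce: initialize a deep ReLU network with weight variance $c/\text{fan-in}$ and backpropagate a random vector through it; $c<2$ makes the signal vanish with depth, $c>2$ makes it explode, and $c=2$ keeps it roughly constant. The sketch below illustrates only this mean-scale effect with arbitrary width, depth, and trial counts; it is not a substitute for the paper's statistical analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
width, depth = 200, 50

def backprop_norm(weight_std):
    """Norm of a random vector backpropagated through a random ReLU net at initialization."""
    h = rng.normal(size=width)
    weights, masks = [], []
    for _ in range(depth):                          # forward pass: record ReLU masks
        W = rng.normal(0, weight_std, size=(width, width))
        pre = W @ h
        weights.append(W)
        masks.append(pre > 0)
        h = np.maximum(pre, 0.0)
    g = rng.normal(size=width)                      # "gradient" arriving at the top layer
    for W, m in zip(reversed(weights), reversed(masks)):
        g = W.T @ (g * m)                           # backward pass through ReLU and weights
    return np.linalg.norm(g)

for c in [1.0, 2.0, 4.0]:                           # weight variance c / fan_in
    norms = [backprop_norm(np.sqrt(c / width)) for _ in range(20)]
    print(f"variance {c}/fan_in: median backprop norm {np.median(norms):.3e}")
```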

Approximating Continuous Functions by ReLU Nets of Minimal Width

no code implementations 31 Oct 2017 Boris Hanin, Mark Sellke

Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well?

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

no code implementations 9 Aug 2017 Boris Hanin

Our approach in this paper is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well-suited for representing convex functions.
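The link between ReLU and convexity can be made concrete with the identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$: iterating it expresses any maximum of affine functions, hence any convex piecewise-linear function, using one ReLU per extra affine piece. The toy check below uses invented slopes and intercepts and is not the paper's construction.

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def relu_max(a, b):
    """max(a, b) written with a single ReLU: max(a, b) = b + relu(a - b)."""
    return b + relu(a - b)

# A convex piecewise-linear target: the max of three affine pieces (invented numbers).
pieces = [(2.0, -1.0), (-0.5, 0.5), (0.1, 1.2)]      # (slope, intercept) pairs

def convex_pwl_via_relu(x):
    out = pieces[0][0] * x + pieces[0][1]
    for slope, intercept in pieces[1:]:
        out = relu_max(out, slope * x + intercept)    # one ReLU folds in one more piece
    return out

xs = np.linspace(-3.0, 3.0, 1001)
direct = np.max([s * xs + c for s, c in pieces], axis=0)
assert np.allclose(convex_pwl_via_relu(xs), direct)
print("ReLU construction reproduces the max of affine pieces exactly")
```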
