Search Results for author: Boris Hanin

Found 25 papers, 2 papers with code

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

no code implementations 4 Mar 2024 Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

We find empirically that, across multiple language tasks, the performance of Voting Inference Systems surprisingly first increases and then decreases as a function of the number of LLM calls.

Language Modelling, Large Language Model
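The non-monotone trend can be reproduced with a toy majority-voting model: queries the model answers correctly more than half the time are driven toward certainty by extra votes, while queries it answers correctly less than half the time are driven toward certain failure, so aggregate accuracy can rise and then fall. The sketch below is a minimal illustration of that mechanism only; the task mix and per-call accuracies (70/30 split, 0.8 and 0.4) are invented and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented query mix: 70% "easy" queries with per-call accuracy 0.8 and
# 30% "hard" queries with per-call accuracy 0.4 (not values from the paper).
p_correct = np.where(rng.random(2000) < 0.7, 0.8, 0.4)

for k in [1, 3, 5, 11, 21, 51, 101]:
    # k independent calls per query; the system returns the majority answer
    # (binary answers for simplicity, odd k so the majority is well defined).
    votes = rng.random((p_correct.size, k)) < p_correct[:, None]
    accuracy = (votes.sum(axis=1) > k / 2).mean()
    print(f"{k:3d} calls per query -> accuracy {accuracy:.3f}")
```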

Principled Architecture-aware Scaling of Hyperparameters

1 code implementation 27 Feb 2024 Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin

However, most designs or optimization methods are agnostic to the choice of network structures, and thus largely ignore the impact of neural architectures on hyperparameters.

AutoML

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

no code implementations 28 Sep 2023 Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.
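One ingredient behind this kind of depthwise transfer is keeping the signal scale depth-independent. The sketch below is a minimal numpy illustration assuming a 1/sqrt(depth) scaling of ReLU residual branches, a common convention in this line of work rather than the paper's exact parameterization; it only shows that the scaled network's activations stay at a stable scale as depth grows while the unscaled one blows up.

```python
import numpy as np

def residual_forward(x, depth, width, seed, branch_scale):
    """Forward pass of a toy residual ReLU MLP with scaled residual branches."""
    rng = np.random.default_rng(seed)
    h = x.copy()
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = h + branch_scale * np.maximum(W @ h, 0.0)   # scaled residual branch
    return h

width = 256
x = np.random.default_rng(0).normal(size=width)

for depth in [4, 16, 64, 256]:
    unscaled = residual_forward(x, depth, width, seed=1, branch_scale=1.0)
    scaled = residual_forward(x, depth, width, seed=1, branch_scale=depth ** -0.5)
    print(f"depth {depth:4d}: |h| unscaled {np.linalg.norm(unscaled):.3g}, "
          f"1/sqrt(depth)-scaled {np.linalg.norm(scaled):.3g}")
```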

Les Houches Lectures on Deep Learning at Large & Infinite Width

no code implementations 4 Sep 2023 Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon

These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks.

Gaussian Processes

Quantitative CLTs in Deep Neural Networks

no code implementations 12 Jul 2023 Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati

We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$.


Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

no code implementations 20 Jun 2023 Gage DeZoort, Boris Hanin

We then prove that using residual aggregation operators, obtained by interpolating a fixed aggregation operator with the identity, provably alleviates oversmoothing at initialization.
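The residual aggregation operator described above is a convex combination of the identity with a fixed aggregation operator, e.g. $(1-\alpha) I + \alpha \hat A$. The sketch below builds this interpolation on a random graph and compares how quickly node features collapse to a common value under repeated aggregation; the graph model, the weight $\alpha = 0.5$, and the spread measure are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, alpha = 60, 0.5

# Random undirected graph and its row-normalized aggregation operator.
A = (rng.random((n_nodes, n_nodes)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
deg = np.maximum(A.sum(axis=1), 1.0)
agg = A / deg[:, None]

# Residual aggregation: interpolate the fixed operator with the identity.
res_agg = (1.0 - alpha) * np.eye(n_nodes) + alpha * agg

X = rng.normal(size=(n_nodes, 8))               # random node features

def spread(feats):
    """Average distance of node features from their mean; ~0 means oversmoothed."""
    return np.linalg.norm(feats - feats.mean(axis=0), axis=1).mean()

X_plain, X_res = X.copy(), X.copy()
for _ in range(30):                             # 30 rounds of message passing
    X_plain, X_res = agg @ X_plain, res_agg @ X_res
print(f"plain aggregation spread:    {spread(X_plain):.2e}")
print(f"residual aggregation spread: {spread(X_res):.2e}")
```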

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

no code implementations 13 May 2023 Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.

Bayesian Interpolation with Deep Linear Networks

no code implementations 29 Dec 2022 Boris Hanin, Alexander Zlokapa

For any training dataset, network depth, and hidden layer widths, we find non-asymptotic expressions for the predictive posterior and Bayesian model evidence in terms of Meijer-G functions, a class of meromorphic special functions of a single complex variable.

Bayesian Inference, Learning Theory +1

Maximal Initial Learning Rates in Deep ReLU Networks

no code implementations 14 Dec 2022 Gaurav Iyer, Boris Hanin, David Rolnick

Training a neural network requires choosing a suitable learning rate, which involves a trade-off between speed and effectiveness of convergence.

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

2 code implementations 11 May 2022 Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang

Advanced deep neural networks (DNNs), designed by either human or AutoML algorithms, are growing increasingly complex.

Neural Architecture Search

Random Fully Connected Neural Networks as Perturbatively Solvable Hierarchies

no code implementations 3 Apr 2022 Boris Hanin

Moreover, we show that network cumulants form a perturbatively solvable hierarchy in powers of $1/n$ in that $k$-th order cumulants in one layer have recursions that depend to leading order in $1/n$ only on $j$-th order cumulants at the previous layer with $j\leq k$.

Ridgeless Interpolation with Shallow ReLU Networks in $1D$ is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions

no code implementations 27 Sep 2021 Boris Hanin

If the curvature estimates at $x_i$ and $x_{i+1}$ have different signs, then $z(x;\theta)$ must be linear on $(x_i, x_{i+1})$.

Random Neural Networks in the Infinite Width Limit as Gaussian Processes

no code implementations 4 Jul 2021 Boris Hanin

This article gives a new proof that fully connected neural networks with random weights and biases converge to Gaussian processes in the regime where the input dimension, output dimension, and depth are kept fixed, while the hidden layer widths tend to infinity.

Gaussian Processes
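The Gaussian-process limit can be checked empirically: fix an input, draw many independent random networks, and measure how non-Gaussian the output distribution is as the width grows. The sketch below does this via the excess kurtosis of a two-hidden-layer ReLU network without biases; the widths, sample counts, and initialization variances are illustrative choices, not anything prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)      # fixed input of dimension 10
n_nets = 10000               # number of independent random networks to sample

def sample_outputs(width):
    """Scalar output of n_nets random two-hidden-layer ReLU nets (no biases) at x."""
    outs = np.empty(n_nets)
    for s in range(n_nets):
        W1 = rng.normal(0, np.sqrt(2.0 / x.size), size=(width, x.size))
        W2 = rng.normal(0, np.sqrt(2.0 / width), size=(width, width))
        w3 = rng.normal(0, np.sqrt(1.0 / width), size=width)
        h = np.maximum(W1 @ x, 0.0)
        h = np.maximum(W2 @ h, 0.0)
        outs[s] = w3 @ h
    return outs

for width in [4, 16, 64, 128]:
    z = sample_outputs(width)
    z = (z - z.mean()) / z.std()
    excess_kurtosis = np.mean(z ** 4) - 3.0    # zero for an exactly Gaussian output
    print(f"width {width:4d}: excess kurtosis {excess_kurtosis:+.2f}")
```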

The Principles of Deep Learning Theory

no code implementations 18 Jun 2021 Daniel A. Roberts, Sho Yaida, Boris Hanin

This book develops an effective theory approach to understanding deep neural networks of practical relevance.

Inductive Bias, Learning Theory +1

Deep ReLU Networks Preserve Expected Length

no code implementations ICLR 2022 Boris Hanin, Ryan Jeong, David Rolnick

Assessing the complexity of functions computed by a neural network helps us understand how the network will learn and generalize.

How Data Augmentation affects Optimization for Linear Regression

no code implementations NeurIPS 2021 Boris Hanin, Yi Sun

Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting.

Data Augmentation, regression +1

Data augmentation as stochastic optimization

no code implementations 28 Sep 2020 Boris Hanin, Yi Sun

We present a theoretical framework recasting data augmentation as stochastic optimization for a sequence of time-varying proxy losses.

Data Augmentation, regression +2
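A classical special case of this viewpoint, shown here only as a hedged illustration and not as the paper's construction: for linear regression, augmenting inputs with additive Gaussian noise of variance $\sigma^2$ turns the expected objective into a ridge-regularized proxy loss, since $\mathbb{E}_\varepsilon (y - w\cdot(x+\varepsilon))^2 = (y - w\cdot x)^2 + \sigma^2 \lVert w\rVert^2$. The sketch below checks the identity numerically on invented data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = rng.normal(size=d)                        # an arbitrary point in weight space

def sq_loss(X_batch, w):
    return np.mean((y - X_batch @ w) ** 2)

# Monte Carlo estimate of the augmented objective E_eps[ loss(X + eps, w) ].
aug = np.mean([sq_loss(X + sigma * rng.normal(size=X.shape), w)
               for _ in range(20000)])

# Ridge-regularized proxy loss predicted by the identity in the text.
proxy = sq_loss(X, w) + sigma ** 2 * np.sum(w ** 2)

print(f"augmented loss {aug:.4f}  vs  ridge proxy loss {proxy:.4f}")  # agree up to MC error
```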

Finite Depth and Width Corrections to the Neural Tangent Kernel

no code implementations ICLR 2020 Boris Hanin, Mihai Nica

Moreover, we prove that for such deep and wide networks, the NTK has a non-trivial evolution during training by showing that the mean of its first SGD update is also exponential in the ratio of network depth to width.

Deep ReLU Networks Have Surprisingly Few Activation Patterns

no code implementations NeurIPS 2019 Boris Hanin, David Rolnick

The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks.

Memorization
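An activation pattern is the on/off configuration of all ReLUs at a given input, and the phenomenon in the title can be probed by counting distinct patterns along a line through input space. The sketch below does this for a small random network; the architecture and sampling grid are arbitrary illustrative choices and this is not the paper's experiment, but the count stays far below both the combinatorial ceiling and the number of sampled points.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, width, depth = 16, 32, 3

# Random fully connected ReLU network (He-style initialization, no biases).
weights = [rng.normal(0, np.sqrt(2.0 / d_in), size=(width, d_in))]
weights += [rng.normal(0, np.sqrt(2.0 / width), size=(width, width))
            for _ in range(depth - 1)]

def activation_pattern(x):
    """Bytes encoding which hidden ReLUs are on (1) or off (0) at input x."""
    bits, h = [], x
    for W in weights:
        pre = W @ h
        bits.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(bits).tobytes()

# Walk along a random line through input space and count distinct patterns.
a, b = rng.normal(size=d_in), rng.normal(size=d_in)
patterns = {activation_pattern(a + t * b) for t in np.linspace(-3.0, 3.0, 20000)}
print(f"{len(patterns)} distinct patterns from 20000 points "
      f"(combinatorial ceiling: 2**{depth * width})")
```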

Complexity of Linear Regions in Deep Networks

no code implementations 25 Jan 2019 Boris Hanin, David Rolnick

It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions.

Products of Many Large Random Matrices and Gradients in Deep Neural Networks

no code implementations 14 Dec 2018 Boris Hanin, Mihai Nica

The fluctuations we find can be thought of as a finite-temperature correction to the limit in which first the size and then the number of matrices tend to infinity.
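The scale of these fluctuations can be seen in a few lines of numpy: form the product of $L$ independent $n \times n$ Gaussian matrices with variance-$1/n$ entries, apply it to a fixed unit vector, and look at the spread of $\log \lVert W_L \cdots W_1 x \rVert^2$ across draws. The configurations below are illustrative; the qualitative point, consistent with the depth-to-width ratio highlighted elsewhere in these listings, is that the spread tracks $L/n$ rather than vanishing as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200

def log_sq_norm(depth, width):
    """log |W_L ... W_1 x|^2 for independent Gaussian matrices with variance-1/width entries."""
    x = np.ones(width) / np.sqrt(width)          # fixed unit vector
    for _ in range(depth):
        W = rng.normal(0, 1.0 / np.sqrt(width), size=(width, width))
        x = W @ x
    return np.log(np.sum(x ** 2))

for depth, width in [(8, 128), (32, 128), (16, 64)]:
    vals = np.array([log_sq_norm(depth, width) for _ in range(n_trials)])
    print(f"L={depth:3d}, n={width:3d} (L/n={depth / width:.3f}): "
          f"std of log|Wx|^2 = {vals.std():.2f}")
```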

How to Start Training: The Effect of Initialization and Architecture

no code implementations NeurIPS 2018 Boris Hanin, David Rolnick

We identify and study two common failure modes for early training in deep ReLU nets.

Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?

no code implementations NeurIPS 2018 Boris Hanin

We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network $\mathcal{N}$ with ReLU activations.
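The coarse version of the phenomenon is easy to reproduce: initialize a deep ReLU network with weight variance $c/\text{fan-in}$ and backpropagate a random vector through it; $c<2$ makes the signal vanish with depth, $c>2$ makes it explode, and $c=2$ keeps it roughly constant. The sketch below illustrates only this mean-scale effect with arbitrary width, depth, and trial counts; it is not a substitute for the paper's statistical analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
width, depth = 200, 50

def backprop_norm(weight_std):
    """Norm of a random vector backpropagated through a random ReLU net at initialization."""
    h = rng.normal(size=width)
    weights, masks = [], []
    for _ in range(depth):                          # forward pass: record ReLU masks
        W = rng.normal(0, weight_std, size=(width, width))
        pre = W @ h
        weights.append(W)
        masks.append(pre > 0)
        h = np.maximum(pre, 0.0)
    g = rng.normal(size=width)                      # "gradient" arriving at the top layer
    for W, m in zip(reversed(weights), reversed(masks)):
        g = W.T @ (g * m)                           # backward pass through ReLU and weights
    return np.linalg.norm(g)

for c in [1.0, 2.0, 4.0]:                           # weight variance c / fan_in
    norms = [backprop_norm(np.sqrt(c / width)) for _ in range(20)]
    print(f"variance {c}/fan_in: median backprop norm {np.median(norms):.3e}")
```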

Approximating Continuous Functions by ReLU Nets of Minimal Width

no code implementations 31 Oct 2017 Boris Hanin, Mark Sellke

Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well?

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

no code implementations 9 Aug 2017 Boris Hanin

Our approach in this paper is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well-suited for representing convex functions.
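The link between ReLU and convexity can be made concrete with the identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$: iterating it expresses any maximum of affine functions, hence any convex piecewise-linear function, using one ReLU per extra affine piece. The toy check below uses invented slopes and intercepts and is not the paper's construction.

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def relu_max(a, b):
    """max(a, b) written with a single ReLU: max(a, b) = b + relu(a - b)."""
    return b + relu(a - b)

# A convex piecewise-linear target: the max of three affine pieces (invented numbers).
pieces = [(2.0, -1.0), (-0.5, 0.5), (0.1, 1.2)]      # (slope, intercept) pairs

def convex_pwl_via_relu(x):
    out = pieces[0][0] * x + pieces[0][1]
    for slope, intercept in pieces[1:]:
        out = relu_max(out, slope * x + intercept)    # one ReLU folds in one more piece
    return out

xs = np.linspace(-3.0, 3.0, 1001)
direct = np.max([s * xs + c for s, c in pieces], axis=0)
assert np.allclose(convex_pwl_via_relu(xs), direct)
print("ReLU construction reproduces the max of affine pieces exactly")
```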
