1 code implementation • 20 Feb 2025 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, Ion Stoica
We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM.
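As a rough, hypothetical sketch of how these two insights can be combined (this is not the paper's algorithm; `estimate_module_score`, the module names, and the model names are invented placeholders), one can run a greedy, coordinate-wise search: each module in turn is assigned the candidate model with the best estimated per-module score while all other assignments are held fixed, which is exactly the setting in which the monotonicity insight applies.

```python
from typing import Callable, Dict, List

def greedy_model_selection(
    modules: List[str],
    candidate_models: List[str],
    estimate_module_score: Callable[[str, Dict[str, str]], float],
    n_passes: int = 2,
) -> Dict[str, str]:
    """Greedy coordinate-wise search over per-module model choices.

    Relies on the (assumed) monotonicity insight: improving one module's
    estimated performance, with all other modules held fixed, should not
    hurt end-to-end performance.
    """
    assignment = {m: candidate_models[0] for m in modules}  # arbitrary start
    for _ in range(n_passes):
        for module in modules:
            best_model, best_score = assignment[module], float("-inf")
            for model in candidate_models:
                trial = dict(assignment, **{module: model})
                # In practice this score would come from an LLM-based diagnoser.
                score = estimate_module_score(module, trial)
                if score > best_score:
                    best_model, best_score = model, score
            assignment[module] = best_model
    return assignment

# Hypothetical usage with a stub scorer standing in for an LLM-based estimator.
scores = {("retrieve", "model_a"): 0.6, ("retrieve", "model_b"): 0.8,
          ("answer", "model_a"): 0.7, ("answer", "model_b"): 0.5}
stub = lambda module, assignment: scores[(module, assignment[module])]
print(greedy_model_selection(["retrieve", "answer"], ["model_a", "model_b"], stub))
```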
1 code implementation • 11 Oct 2024 • Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences.
no code implementations • 23 Jul 2024 • Jared Quincy Davis, Boris Hanin, Lingjiao Chen, Peter Bailis, Ion Stoica, Matei Zaharia
This work aims to inform future research and practice in the design of compound AI systems.
no code implementations • 26 May 2024 • Boris Hanin, Alexander Zlokapa
In the restricted case of deep linear networks ($\psi=0$) and noisy data, we exhibit a simple data model for which evidence and generalization error are optimal at zero temperature.
no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou
Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Language Model (LM) calls and aggregate their responses.
1 code implementation • 27 Feb 2024 • Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin
However, most hyperparameter designs or optimization methods are agnostic to the choice of network structure and thus largely ignore the impact of neural architectures on hyperparameters.
no code implementations • 28 Sep 2023 • Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan
We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.
no code implementations • 4 Sep 2023 • Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon
These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks.
no code implementations • 12 Jul 2023 • Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati
We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$.
no code implementations • 20 Jun 2023 • Gage DeZoort, Boris Hanin
We then prove that residual aggregation operators, obtained by interpolating a fixed aggregation operator with the identity, alleviate oversmoothing at initialization.
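Concretely, "interpolating a fixed aggregation operator with the identity" can be read as replacing an aggregation matrix $A$ by $(1-\alpha)I + \alpha A$, whose eigenvalues are $(1-\alpha) + \alpha\lambda$ as $\lambda$ ranges over the eigenvalues of $A$. The sketch below is only an illustration of that construction, not code from the paper; taking $A$ to be the row-normalized adjacency matrix and setting $\alpha = 0.5$ are assumptions.

```python
import numpy as np

def residual_aggregation(adjacency: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Interpolate a fixed aggregation operator with the identity.

    The aggregation operator A is taken here (as an assumption) to be the
    row-normalized adjacency matrix; the residual operator is
    (1 - alpha) * I + alpha * A.
    """
    deg = adjacency.sum(axis=1, keepdims=True)
    A = adjacency / np.maximum(deg, 1.0)  # row-normalized aggregation
    return (1.0 - alpha) * np.eye(len(A)) + alpha * A

# Example: residual aggregation on a 4-node path graph. Entry (i, j) of the
# printed matrix is the weight node i places on node j's initial feature
# after 8 rounds of aggregation.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(np.linalg.matrix_power(residual_aggregation(adj, alpha=0.5), 8).round(3))
```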
no code implementations • 13 May 2023 • Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar
In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.
no code implementations • 29 Dec 2022 • Boris Hanin, Alexander Zlokapa
For any training dataset, network depth, and hidden layer widths, we find non-asymptotic expressions for the predictive posterior and Bayesian model evidence in terms of Meijer-G functions, a class of meromorphic special functions of a single complex variable.
no code implementations • 14 Dec 2022 • Gaurav Iyer, Boris Hanin, David Rolnick
Training a neural network requires choosing a suitable learning rate, which involves a trade-off between speed and effectiveness of convergence.
2 code implementations • 11 May 2022 • Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang
Advanced deep neural networks (DNNs), designed by either human or AutoML algorithms, are growing increasingly complex.
no code implementations • 3 Apr 2022 • Boris Hanin
Moreover, we show that network cumulants form a perturbatively solvable hierarchy in powers of $1/n$: to leading order in $1/n$, the recursion for $k$-th order cumulants in one layer depends only on $j$-th order cumulants at the previous layer with $j\leq k$.
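Schematically, writing $\kappa^{(\ell)}_k$ for a $k$-th order cumulant of the layer-$\ell$ pre-activations (notation chosen here for illustration, not fixed by the snippet), the statement is that there are recursions of the form

$$\kappa^{(\ell+1)}_k \;=\; F_k\!\left(\kappa^{(\ell)}_1, \dots, \kappa^{(\ell)}_k\right) \;+\; \text{higher-order corrections in } 1/n,$$

so that the leading-order system is triangular in the cumulant order $k$ and can be solved one order at a time.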
no code implementations • 27 Sep 2021 • Boris Hanin
If the curvature estimates at $x_i$ and $x_{i+1}$ have different signs, then $z(x;\theta)$ must be linear on $(x_i, x_{i+1})$.
no code implementations • 4 Jul 2021 • Boris Hanin
This article gives a new proof that fully connected neural networks with random weights and biases converge to Gaussian processes in the regime where the input dimension, output dimension, and depth are kept fixed, while the hidden layer widths tend to infinity.
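This statement lends itself to a quick numerical sanity check. The numpy sketch below (an illustration, not the proof technique of the paper) samples many independently initialized fully connected ReLU networks at a single fixed input and tracks the excess kurtosis of the scalar output, which tends to zero as the output distribution approaches a Gaussian; the depth, ReLU nonlinearity, zero biases, and He-style initialization are assumptions made for the demo.

```python
import numpy as np

def random_relu_net_output(x, widths, rng):
    """One forward pass of a random fully connected ReLU net (He-style init, zero biases)."""
    h = x
    for fan_out in widths:
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[0]), size=(fan_out, h.shape[0]))
        h = np.maximum(W @ h, 0.0)
    w_out = rng.normal(0.0, np.sqrt(1.0 / h.shape[0]), size=h.shape[0])
    return w_out @ h  # scalar readout, no nonlinearity

rng = np.random.default_rng(0)
x = np.ones(3)                      # fixed input, d_in = 3 (arbitrary)
for n in (8, 64, 256):              # hidden layer widths growing toward the limit
    samples = np.array([random_relu_net_output(x, [n, n], rng) for _ in range(2000)])
    z = (samples - samples.mean()) / samples.std()
    # Excess kurtosis of a Gaussian is 0; finite-width outputs carry a correction.
    print(f"width {n:4d}: excess kurtosis = {np.mean(z**4) - 3:+.3f}")
```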
no code implementations • 18 Jun 2021 • Daniel A. Roberts, Sho Yaida, Boris Hanin
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
no code implementations • ICLR 2022 • Boris Hanin, Ryan Jeong, David Rolnick
Assessing the complexity of functions computed by a neural network helps us understand how the network will learn and generalize.
no code implementations • NeurIPS 2021 • Boris Hanin, Yi Sun
Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting.
no code implementations • 28 Sep 2020 • Boris Hanin, Yi Sun
We present a theoretical framework recasting data augmentation as stochastic optimization for a sequence of time-varying proxy losses.
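A minimal way to see the recasting (an illustrative sketch in the spirit of the abstract, not the paper's framework itself): at step $t$, sampling an augmentation and taking a gradient step is a stochastic gradient step on a proxy loss determined by the step-$t$ augmentation distribution, so a schedule of augmentations defines a sequence of time-varying proxy losses. The linear model, Gaussian input-noise augmentation, and decay schedule below are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data.
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=200)

def proxy_loss_grad(w, xb, yb, sigma_t, rng):
    """One-sample stochastic gradient of the time-t proxy loss
    E_{eps ~ N(0, sigma_t^2 I)}[ 0.5 * ((x + eps) @ w - y)^2 ]."""
    xb_aug = xb + sigma_t * rng.normal(size=xb.shape)  # sampled augmentation
    resid = xb_aug @ w - yb
    return xb_aug.T @ resid / len(yb)

w = np.zeros(5)
lr, n_steps = 0.05, 500
for t in range(n_steps):
    sigma_t = 0.5 * (1.0 - t / n_steps)   # time-varying augmentation strength
    idx = rng.integers(0, len(X), size=32)
    w -= lr * proxy_loss_grad(w, X[idx], y[idx], sigma_t, rng)

print(np.round(w, 2))  # close to w_true once the augmentation strength has decayed
```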
no code implementations • ICLR 2020 • Boris Hanin, Mihai Nica
Moreover, we prove that for such deep and wide networks the NTK evolves non-trivially during training: the mean of its first SGD update is also exponential in the ratio of network depth to width.
no code implementations • NeurIPS 2019 • Boris Hanin, David Rolnick
The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks.
no code implementations • 25 Jan 2019 • Boris Hanin, David Rolnick
It is well known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions.
no code implementations • 14 Dec 2018 • Boris Hanin, Mihai Nica
The fluctuations we find can be thought of as a finite-temperature correction to the limit in which first the size and then the number of matrices tend to infinity.
no code implementations • NeurIPS 2018 • Boris Hanin, David Rolnick
We identify and study two common failure modes for early training in deep ReLU nets.
no code implementations • NeurIPS 2018 • Boris Hanin
We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network $N$ with ReLU activations.
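As a rough numerical companion (not the paper's analysis), the sketch below samples random He-initialized fully connected ReLU nets and computes the exact gradient of the scalar output with respect to the input, so one can inspect how the mean and spread of gradient norms behave as depth grows at fixed width; the width, depths, input, and number of trials are arbitrary choices.

```python
import numpy as np

def input_grad_norm(x, depth, width, rng):
    """Forward pass of a random He-initialized ReLU net, then the exact
    gradient of its scalar output with respect to the input x."""
    h, weights, masks = x, [], []
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[0]), size=(width, h.shape[0]))
        pre = W @ h
        weights.append(W)
        masks.append((pre > 0).astype(float))
        h = np.maximum(pre, 0.0)
    w_out = rng.normal(0.0, np.sqrt(1.0 / width), size=width)
    g = w_out
    for W, m in zip(reversed(weights), reversed(masks)):
        g = (g * m) @ W  # backpropagate through the ReLU mask and linear layer
    return np.linalg.norm(g)

rng = np.random.default_rng(0)
x = np.ones(16) / 4.0
for depth in (2, 8, 32):
    norms = [input_grad_norm(x, depth, width=16, rng=rng) for _ in range(500)]
    print(f"depth {depth:2d}: mean grad norm = {np.mean(norms):.2f}, std = {np.std(norms):.2f}")
```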
no code implementations • 31 Oct 2017 • Boris Hanin, Mark Sellke
Specifically, we answer the following question: for a fixed $d_{in} \geq 1$, what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w$, and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well?
no code implementations • 9 Aug 2017 • Boris Hanin
Our approach in this paper is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well suited for representing convex functions.