no code implementations • 23 Nov 2023 • ZiHao Wang, Eshaan Nichani, Jason D. Lee
Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the hierarchical target $h = g \circ p$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time.
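As a rough illustration of the setting, the sketch below trains a three-layer ReLU network with square loss in two layerwise stages on a toy hierarchical target $h(x) = g(p(x))$. The specific target, widths, learning rates, and two-stage schedule are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical toy illustration of layerwise training of a three-layer network
# on a hierarchical target h(x) = g(p(x)); the target, widths, learning rates,
# and layer schedule are illustrative choices, not the paper's exact procedure.
import torch

torch.manual_seed(0)
d, n, m1, m2 = 10, 4096, 256, 256

def p(x):                # degree-2 "inner" polynomial (toy choice)
    return x[:, 0] * x[:, 1] + x[:, 2] * x[:, 3]

def g(z):                # polynomial link function (toy choice)
    return z ** 2 - 1.0

X = torch.randn(n, d)
y = g(p(X))

net = torch.nn.Sequential(
    torch.nn.Linear(d, m1), torch.nn.ReLU(),
    torch.nn.Linear(m1, m2), torch.nn.ReLU(),
    torch.nn.Linear(m2, 1),
)

def train(params, steps, lr):
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(X).squeeze(-1) - y) ** 2).mean()   # square loss
        loss.backward()
        opt.step()
    return loss.item()

# Stage 1: train the two inner layers with the output layer frozen.
stage1 = train(list(net[0].parameters()) + list(net[2].parameters()), steps=500, lr=1e-2)
# Stage 2: train only the output layer on top of the learned representation.
stage2 = train(list(net[4].parameters()), steps=500, lr=1e-2)
print(f"loss after stage 1: {stage1:.4f}, after stage 2: {stage2:.4f}")
```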
1 code implementation • 30 Sep 2022 • Alex Damian, Eshaan Nichani, Jason D. Lee
Our analysis provides precise predictions for the loss, sharpness, and deviation from the projected gradient descent (PGD) trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions.
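A minimal sketch of the kind of measurement involved: track the training loss and the sharpness (top Hessian eigenvalue, estimated by power iteration on Hessian-vector products) along a plain gradient descent trajectory and compare the sharpness against the $2/\eta$ threshold. The model, data, and step size below are arbitrary toy choices, not the paper's experimental setup.

```python
# Track loss and sharpness (top Hessian eigenvalue via power iteration on
# Hessian-vector products) along a gradient descent trajectory.
import torch

torch.manual_seed(0)
X, y = torch.randn(256, 5), torch.randn(256)
model = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())

def loss_fn():
    return ((model(X).squeeze(-1) - y) ** 2).mean()

def sharpness(iters=20):
    """Estimate the largest Hessian eigenvalue of the loss by power iteration."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        dot = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(dot, params)            # Hessian-vector product
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    return norm.item()                                   # approx. top eigenvalue

eta = 0.05
for step in range(200):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * g                                 # vanilla gradient descent
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.2f}  2/eta {2/eta:.2f}")
```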
1 code implementation • 8 Jun 2022 • Eshaan Nichani, Yu Bai, Jason D. Lee
Next, we show that a wide two-layer neural network can jointly use the neural tangent kernel (NTK) and the kernel of its quadratic Taylor expansion (QuadNTK) to fit target functions consisting of a dense low-degree term and a sparse high-degree term -- something neither the NTK nor the QuadNTK can do on its own.
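To make the target class concrete, the toy sketch below builds a function of exactly this form (a dense degree-1 component plus a single sparse degree-4 monomial) and fits it with a wide two-layer ReLU network. The dimensions, coefficients, and optimizer are assumptions for illustration; the sketch does not reproduce the NTK/QuadNTK decomposition itself.

```python
# Toy target: dense low-degree term + sparse high-degree monomial, fit with a
# wide two-layer ReLU network. All sizes and coefficients are illustrative.
import torch

torch.manual_seed(0)
d, n, width = 20, 2048, 1024
X = torch.randn(n, d)

dense_low = X @ torch.randn(d) / d ** 0.5              # dense degree-1 component
sparse_high = X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3]    # sparse degree-4 monomial
y = dense_low + sparse_high

net = torch.nn.Sequential(torch.nn.Linear(d, width), torch.nn.ReLU(),
                          torch.nn.Linear(width, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    opt.zero_grad()
    loss = ((net(X).squeeze(-1) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(f"final train MSE: {loss.item():.4f}")
```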
no code implementations • 19 Oct 2020 • Eshaan Nichani, Adityanarayanan Radhakrishnan, Caroline Uhler
We then present a novel linear regression framework for characterizing the impact of depth on test risk, and show that increasing depth leads to a U-shaped test risk for the linear convolutional neural tangent kernel (CNTK).
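As a rough numerical stand-in for this kind of experiment (not the paper's analytical linear CNTK framework), one can compute the finite-width NTK feature map of a linear convolutional network at initialization, run minimum-norm regression on those features, and record test risk as depth varies. Widths, depths, and sample sizes below are arbitrary assumptions.

```python
# Empirical NTK regression for depth-L *linear* convolutional networks:
# features are gradients of the scalar output w.r.t. all parameters.
import torch

torch.manual_seed(0)
d, n_train, n_test = 16, 40, 200
beta = torch.randn(d) / d ** 0.5
X_tr, X_te = torch.randn(n_train, d), torch.randn(n_test, d)
y_tr, y_te = X_tr @ beta, X_te @ beta                  # linear ground truth

def linear_cnn(depth, channels=8):
    layers = [torch.nn.Conv1d(1, channels, kernel_size=3, padding=1)]
    for _ in range(depth - 1):
        layers.append(torch.nn.Conv1d(channels, channels, kernel_size=3, padding=1))
    layers += [torch.nn.Flatten(), torch.nn.Linear(channels * d, 1)]
    return torch.nn.Sequential(*layers)                # no nonlinearities

def ntk_features(net, X):
    feats = []
    for x in X:
        out = net(x.view(1, 1, d)).squeeze()
        grads = torch.autograd.grad(out, list(net.parameters()))
        feats.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(feats)

for depth in [1, 2, 4, 8]:
    net = linear_cnn(depth)
    Phi_tr, Phi_te = ntk_features(net, X_tr), ntk_features(net, X_te)
    coef = torch.linalg.pinv(Phi_tr) @ y_tr            # minimum-norm least squares
    risk = ((Phi_te @ coef) - y_te).pow(2).mean()
    print(f"depth {depth}: test risk {risk.item():.4f}")
```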
no code implementations • 28 Sep 2020 • Eshaan Nichani, Adityanarayanan Radhakrishnan, Caroline Uhler
Recent work provided an explanation for this phenomenon (the strong test performance of over-parameterized models) by introducing the double descent curve, showing that increasing model capacity past the interpolation threshold leads to a decrease in test error.
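The double descent curve can be reproduced in a few lines with minimum-norm least squares on random ReLU features: test error typically peaks when the number of features is near the number of samples and falls again beyond it. The data model and feature counts below are arbitrary toy choices.

```python
# Illustrative double-descent experiment: minimum-norm least squares on random
# ReLU features, sweeping feature count through the interpolation threshold.
import torch

torch.manual_seed(0)
d, n_train, n_test, noise = 10, 100, 2000, 0.1
w_star = torch.randn(d) / d ** 0.5
X_tr, X_te = torch.randn(n_train, d), torch.randn(n_test, d)
y_tr = X_tr @ w_star + noise * torch.randn(n_train)
y_te = X_te @ w_star

for n_feat in [20, 50, 90, 100, 110, 200, 500, 2000]:
    W = torch.randn(d, n_feat) / d ** 0.5              # random first-layer weights
    Phi_tr, Phi_te = torch.relu(X_tr @ W), torch.relu(X_te @ W)
    coef = torch.linalg.pinv(Phi_tr) @ y_tr            # minimum-norm interpolant
    test_mse = ((Phi_te @ coef) - y_te).pow(2).mean()
    print(f"{n_feat:5d} features: test MSE {test_mse.item():.4f}")
```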
no code implementations • 13 Mar 2020 • Adityanarayanan Radhakrishnan, Eshaan Nichani, Daniel Bernstein, Caroline Uhler
We define alignment for fully connected networks with multidimensional outputs and show that it is a natural extension of alignment in networks with one-dimensional outputs as defined by Ji and Telgarsky (2018).
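A small sketch of one way to probe alignment in a deep linear network: after training, compare the top left singular vector of each weight matrix with the top right singular vector of the next layer's weight matrix. The task, depth, and training details are arbitrary assumptions, and the paper's alignment notion for multidimensional outputs is more general than this scalar overlap.

```python
# Measure overlap between adjacent layers' top singular vectors in a trained
# deep linear network (toy regression task; details are illustrative).
import torch

torch.manual_seed(0)
d_in, d_hidden, d_out, n = 20, 30, 5, 512
X = torch.randn(n, d_in)
A = torch.randn(d_in, d_out) / d_in ** 0.5
Y = X @ A                                              # multidimensional targets

layers = [torch.nn.Linear(d_in, d_hidden, bias=False),
          torch.nn.Linear(d_hidden, d_hidden, bias=False),
          torch.nn.Linear(d_hidden, d_out, bias=False)]
net = torch.nn.Sequential(*layers)                     # deep *linear* network
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
for _ in range(3000):
    opt.zero_grad()
    ((net(X) - Y) ** 2).mean().backward()
    opt.step()

for i in range(len(layers) - 1):
    U_i, _, _ = torch.linalg.svd(layers[i].weight)     # weight shape: (out, in)
    _, _, Vh_next = torch.linalg.svd(layers[i + 1].weight)
    overlap = torch.abs(U_i[:, 0] @ Vh_next[0])        # |<top left sv of W_i, top right sv of W_{i+1}>|
    print(f"layers {i}->{i+1}: top singular vector overlap {overlap.item():.3f}")
```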