1 code implementation • 19 Feb 2024 • Soufiane Hayou, Nikhil Ghosh, Bin Yu
In this paper, we show that Low-Rank Adaptation (LoRA), as originally introduced in Hu et al. (2021), leads to suboptimal finetuning of models with large width (embedding dimension).
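For context, a minimal sketch of the standard LoRA parametrization from Hu et al. (2021) that the paper analyzes: the frozen weight is perturbed by a scaled low-rank product, with the `B` factor initialized to zero so the model is unchanged at the start of finetuning. All dimensions and the scaling value below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 512, 8, 16                        # width, LoRA rank, scaling (illustrative)
W0 = rng.standard_normal((d, d)) / np.sqrt(d)   # frozen pretrained weight

# LoRA: W = W0 + (alpha / r) * B @ A; only A and B are trained.
A = rng.standard_normal((r, d)) / np.sqrt(d)    # nonzero init
B = np.zeros((d, r))                            # zero init, so the update starts at 0

x = rng.standard_normal(d)
h = W0 @ x + (alpha / r) * (B @ (A @ x))        # equals W0 @ x at initialization
```

The paper's large-width analysis concerns exactly how this `alpha / r` scaling and the per-factor learning rates should behave as `d` grows.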
no code implementations • 24 Nov 2023 • James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin
In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better.
no code implementations • 6 Aug 2023 • Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu
On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting.
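The kind of measurement behind this claim, i.e. comparing a network's learned weights against their initialization after mini-batch SGD, can be sketched as follows. This toy setup (a small two-layer ReLU network with a fixed second layer, squared loss, and all sizes and hyperparameters chosen for illustration) is not the paper's experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples in d dimensions with a simple sign target.
n, d, m = 32, 10, 50
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])

W0 = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer init (saved for comparison)
a = rng.choice([-1.0, 1.0], size=m) / m         # fixed second layer
W = W0.copy()

def forward(W, Xb):
    """Two-layer ReLU network output, shape (batch,)."""
    return np.maximum(W @ Xb.T, 0.0).T @ a

lr, batch = 0.1, 4
for step in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    act = (Xb @ W.T > 0).astype(float)          # (batch, m) ReLU activation pattern
    resid = forward(W, Xb) - yb                 # squared-loss residual
    # Gradient of 0.5 * mean(resid^2) w.r.t. W
    gW = ((resid[:, None] * act) * a[None, :]).T @ Xb / batch
    W -= lr * gW

# Per-neuron cosine similarity between learned and initial weights;
# values near zero indicate weights nearly orthogonal to initialization.
cos = np.sum(W * W0, axis=1) / (
    np.linalg.norm(W, axis=1) * np.linalg.norm(W0, axis=1) + 1e-12
)
```

The paper's result concerns the specific regime it studies; this sketch only shows how such alignment statistics are computed.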
no code implementations • 31 Jan 2023 • Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang
One way of introducing sparsity into deep networks is by attaching an external table of parameters that is sparsely looked up at different layers of the network.
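A minimal sketch of that mechanism: a large external table of parameters from which each input retrieves only a few rows, so the per-input compute stays small even though the total parameter count is large. The table size, retrieval rule (a toy hash of the activation), and combination step here are all illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

table_size, d = 1024, 64                        # illustrative table and feature sizes
table = rng.standard_normal((table_size, d))    # external parameter table

def sparse_lookup(h, k=4):
    """Select k table rows for the input via a toy hashing rule and add them
    to the layer activation; only k of the table's rows are touched."""
    idx = np.abs(h[:k].astype(np.int64) * 7919) % table_size
    return h + table[idx].sum(axis=0)

h = rng.standard_normal(d)
out = sparse_lookup(h)
```

Because gradients flow only through the `k` retrieved rows, the rest of the table is untouched on each step, which is the sparsity being exploited.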
no code implementations • 23 Jul 2022 • Nikhil Ghosh, Mikhail Belkin
Remarkably, while the Marchenko-Pastur analysis is far more precise near the interpolation peak, where the number of parameters is just enough to fit the training data, it coincides exactly with the distribution-independent bound as the level of overparametrization increases.
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$.
no code implementations • ICLR 2022 • Nikhil Ghosh, Song Mei, Bin Yu
To understand how deep learning works, it is crucial to understand the training dynamics of neural networks.
no code implementations • NeurIPS 2019 • Nikhil Ghosh, Yuxin Chen, Yisong Yue
In this paper, we aim to learn a low-dimensional Euclidean representation from a set of constraints of the form "item j is closer to item i than item k".
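A constraint of the form "item j is closer to item i than item k" is commonly encoded as a triplet hinge loss over the learned embedding, minimized by gradient descent. The sketch below uses that standard formulation with hand-written gradients; the dimensions, margin, constraints, and optimizer are illustrative and not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 2                        # items and embedding dimension (illustrative)
X = rng.standard_normal((n, d))     # embedding to be learned

# Each triplet (i, j, k) encodes: ||x_i - x_j|| should be smaller than ||x_i - x_k||.
triplets = [(0, 1, 2), (3, 4, 5), (0, 3, 7)]

def hinge_loss(X, triplets, margin=1.0):
    loss = 0.0
    for i, j, k in triplets:
        d_ij = np.sum((X[i] - X[j]) ** 2)
        d_ik = np.sum((X[i] - X[k]) ** 2)
        loss += max(0.0, margin + d_ij - d_ik)
    return loss

def grad(X, triplets, margin=1.0):
    g = np.zeros_like(X)
    for i, j, k in triplets:
        d_ij = np.sum((X[i] - X[j]) ** 2)
        d_ik = np.sum((X[i] - X[k]) ** 2)
        if margin + d_ij - d_ik > 0:            # only violated constraints contribute
            g[i] += 2 * (X[k] - X[j])
            g[j] += -2 * (X[i] - X[j])
            g[k] += 2 * (X[i] - X[k])
    return g

loss0 = hinge_loss(X, triplets)
lr = 0.05
for _ in range(500):
    X -= lr * grad(X, triplets)
```

Subgradient descent on this loss pulls constrained pairs together and pushes the third item away until every triplet is satisfied by the margin.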