1 code implementation • 19 Feb 2024 • Soufiane Hayou, Nikhil Ghosh, Bin Yu
In this paper, we show that Low-Rank Adaptation (LoRA), as originally introduced in Hu et al. (2021), leads to suboptimal finetuning of models with large width (embedding dimension).
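For context, a minimal sketch of the standard LoRA parametrization from Hu et al. (2021) that the paper analyzes: the frozen weight is perturbed by a scaled low-rank product, with the `B` factor initialized to zero so the model is unchanged at the start of finetuning. All dimensions and the scaling value below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 512, 8, 16                        # width, LoRA rank, scaling (illustrative)
W0 = rng.standard_normal((d, d)) / np.sqrt(d)   # frozen pretrained weight

# LoRA: W = W0 + (alpha / r) * B @ A; only A and B are trained.
A = rng.standard_normal((r, d)) / np.sqrt(d)    # nonzero init
B = np.zeros((d, r))                            # zero init, so the update starts at 0

x = rng.standard_normal(d)
h = W0 @ x + (alpha / r) * (B @ (A @ x))        # equals W0 @ x at initialization
```

The paper's large-width analysis concerns exactly how this `alpha / r` scaling and the per-factor learning rates should behave as `d` grows.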
no code implementations • 24 Nov 2023 • James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin
In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better.
no code implementations • 6 Aug 2023 • Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu
On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting.
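The kind of measurement behind this claim, i.e. comparing a network's learned weights against their initialization after mini-batch SGD, can be sketched as follows. This toy setup (a small two-layer ReLU network with a fixed second layer, squared loss, and all sizes and hyperparameters chosen for illustration) is not the paper's experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples in d dimensions with a simple sign target.
n, d, m = 32, 10, 50
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])

W0 = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer init (saved for comparison)
a = rng.choice([-1.0, 1.0], size=m) / m         # fixed second layer
W = W0.copy()

def forward(W, Xb):
    """Two-layer ReLU network output, shape (batch,)."""
    return np.maximum(W @ Xb.T, 0.0).T @ a

lr, batch = 0.1, 4
for step in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    act = (Xb @ W.T > 0).astype(float)          # (batch, m) ReLU activation pattern
    resid = forward(W, Xb) - yb                 # squared-loss residual
    # Gradient of 0.5 * mean(resid^2) w.r.t. W
    gW = ((resid[:, None] * act) * a[None, :]).T @ Xb / batch
    W -= lr * gW

# Per-neuron cosine similarity between learned and initial weights;
# values near zero indicate weights nearly orthogonal to initialization.
cos = np.sum(W * W0, axis=1) / (
    np.linalg.norm(W, axis=1) * np.linalg.norm(W0, axis=1) + 1e-12
)
```

The paper's result concerns the specific regime it studies; this sketch only shows how such alignment statistics are computed.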
no code implementations • 31 Jan 2023 • Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang
One way of introducing sparsity into deep networks is by attaching an external table of parameters that is sparsely looked up at different layers of the network.
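A minimal sketch of that mechanism: a large external table of parameters from which each input retrieves only a few rows, so the per-input compute stays small even though the total parameter count is large. The table size, retrieval rule (a toy hash of the activation), and combination step here are all illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

table_size, d = 1024, 64                        # illustrative table and feature sizes
table = rng.standard_normal((table_size, d))    # external parameter table

def sparse_lookup(h, k=4):
    """Select k table rows for the input via a toy hashing rule and add them
    to the layer activation; only k of the table's rows are touched."""
    idx = np.abs(h[:k].astype(np.int64) * 7919) % table_size
    return h + table[idx].sum(axis=0)

h = rng.standard_normal(d)
out = sparse_lookup(h)
```

Because gradients flow only through the `k` retrieved rows, the rest of the table is untouched on each step, which is the sparsity being exploited.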
no code implementations • 23 Jul 2022 • Nikhil Ghosh, Mikhail Belkin
Remarkably, while the Marchenko-Pastur analysis is far more precise near the interpolation peak, where the number of parameters is just enough to fit the training data, it coincides exactly with the distribution-independent bound as the level of overparametrization increases.
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$.
no code implementations • ICLR 2022 • Nikhil Ghosh, Song Mei, Bin Yu
To understand how deep learning works, it is crucial to understand the training dynamics of neural networks.
no code implementations • NeurIPS 2019 • Nikhil Ghosh, Yuxin Chen, Yisong Yue
In this paper, we aim to learn a low-dimensional Euclidean representation from a set of constraints of the form "item j is closer to item i than item k".
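A constraint of the form "item j is closer to item i than item k" is commonly encoded as a triplet hinge loss over the learned embedding, minimized by gradient descent. The sketch below uses that standard formulation with hand-written gradients; the dimensions, margin, constraints, and optimizer are illustrative and not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 2                        # items and embedding dimension (illustrative)
X = rng.standard_normal((n, d))     # embedding to be learned

# Each triplet (i, j, k) encodes: ||x_i - x_j|| should be smaller than ||x_i - x_k||.
triplets = [(0, 1, 2), (3, 4, 5), (0, 3, 7)]

def hinge_loss(X, triplets, margin=1.0):
    loss = 0.0
    for i, j, k in triplets:
        d_ij = np.sum((X[i] - X[j]) ** 2)
        d_ik = np.sum((X[i] - X[k]) ** 2)
        loss += max(0.0, margin + d_ij - d_ik)
    return loss

def grad(X, triplets, margin=1.0):
    g = np.zeros_like(X)
    for i, j, k in triplets:
        d_ij = np.sum((X[i] - X[j]) ** 2)
        d_ik = np.sum((X[i] - X[k]) ** 2)
        if margin + d_ij - d_ik > 0:            # only violated constraints contribute
            g[i] += 2 * (X[k] - X[j])
            g[j] += -2 * (X[i] - X[j])
            g[k] += 2 * (X[i] - X[k])
    return g

loss0 = hinge_loss(X, triplets)
lr = 0.05
for _ in range(500):
    X -= lr * grad(X, triplets)
```

Subgradient descent on this loss pulls constrained pairs together and pushes the third item away until every triplet is satisfied by the margin.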