no code implementations • 27 Aug 2024 • Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora
Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench.
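A minimal sketch of what "vanilla SFT" means here: plain next-token cross-entropy on instruction-response pairs, with no PPO, DPO, or RL stage. The base model name, the field names ("instruction", "response"), and the hyperparameters below are illustrative assumptions, not the paper's exact setup.

```python
# Vanilla SFT sketch: maximize likelihood of the response given the instruction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical base model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def sft_step(example: dict) -> float:
    """One SFT step on a single instruction-response pair (no RL involved)."""
    text = example["instruction"] + "\n" + example["response"] + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Standard causal-LM objective: the model shifts labels internally.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```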
no code implementations • 26 Oct 2023 • Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora
The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.
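A minimal sketch of the automatic-grading idea, assuming a generic `llm` callable (e.g., a wrapper around GPT-4 or LLaMA-2 70B) that maps a prompt string to a text completion. The rubric wording and score parsing are invented for illustration and are not the paper's actual grading prompt.

```python
# LLM-as-grader sketch: ask a grader model to score an answer against required skills.
import re
from typing import Callable

def grade_response(llm: Callable[[str], str], skills: list[str], topic: str, answer: str) -> int:
    """Return an integer score from the grader model (0 if no number is found)."""
    rubric = (
        f"Grade the following answer on the topic '{topic}'.\n"
        f"It should correctly demonstrate each of these skills: {', '.join(skills)}.\n"
        f"Answer:\n{answer}\n\n"
        "Reply with a single integer score from 0 to 10."
    )
    reply = llm(rubric)
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else 0
```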
1 code implementation • 29 Nov 2022 • Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary C. Lipton
A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training.
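A minimal sketch of the small-batch vs. full-batch contrast being studied: the same plain-SGD training loop run at a small batch size and at the full dataset size. The toy model, synthetic data, and hyperparameters are illustrative assumptions.

```python
# Contrast small-batch SGD with full-batch gradient descent on a toy classifier.
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X, y = torch.randn(1024, 20), torch.randint(0, 2, (1024,))
dataset = TensorDataset(X, y)

def train(batch_size: int, lr: float = 0.1, epochs: int = 50) -> float:
    """Train a small classifier with plain SGD at the given batch size."""
    model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            loss = torch.nn.functional.cross_entropy(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return loss.item()  # loss on the final batch

small_batch_loss = train(batch_size=32)            # small-batch SGD
full_batch_loss = train(batch_size=len(dataset))   # full-batch regime
```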
no code implementations • 21 Jun 2022 • Simran Kaur, Jeremy Cohen, Zachary C. Lipton
The mechanisms by which certain training interventions, such as increasing learning rates and applying batch normalization, improve the generalization of deep networks remain a mystery.
1 code implementation • ICLR 2021 • Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar
We empirically demonstrate that full-batch gradient descent on neural network training objectives typically operates in a regime we call the Edge of Stability.
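A minimal sketch of the measurement behind this observation: tracking "sharpness" (the top eigenvalue of the training-loss Hessian) during full-batch gradient descent via power iteration on Hessian-vector products, and comparing it to 2 / learning rate. The toy model, data, and step budget are illustrative assumptions.

```python
# Track the top Hessian eigenvalue ("sharpness") during full-batch gradient descent.
import torch

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())
lr = 0.02

def top_hessian_eigenvalue(loss: torch.Tensor, iters: int = 20) -> float:
    """Estimate the largest Hessian eigenvalue with power iteration on HVPs."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * vi).sum() for h, vi in zip(hv, v)).item()  # Rayleigh quotient

for step in range(200):
    loss = torch.nn.functional.mse_loss(model(X), y)  # full-batch loss
    if step % 50 == 0:
        print(step, loss.item(), top_hessian_eigenvalue(loss), 2 / lr)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g  # full-batch gradient descent step
```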
no code implementations • 18 Oct 2019 • Simran Kaur, Jeremy Cohen, Zachary C. Lipton
For a standard convolutional neural network, optimizing over the input pixels to maximize the score of some target class will generally produce a grainy-looking version of the original image.
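A minimal sketch of that input-pixel optimization: gradient ascent on the target-class logit of a pretrained CNN with respect to the input image. The model choice, step size, and iteration count are illustrative assumptions; without extra regularization the result is typically a grainy-looking perturbation.

```python
# Optimize input pixels to maximize a target class score of a pretrained CNN.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
target_class = 207  # hypothetical ImageNet class index
# In practice one starts from the original image; random pixels keep the sketch
# self-contained (ImageNet normalization omitted for brevity).
image = torch.rand(1, 3, 224, 224, requires_grad=True)

optimizer = torch.optim.SGD([image], lr=1.0)
for _ in range(100):
    score = model(image)[0, target_class]  # logit of the target class
    optimizer.zero_grad()
    (-score).backward()                    # ascend the class score
    optimizer.step()
    with torch.no_grad():
        image.clamp_(0, 1)                 # keep pixels in a valid range
```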