no code implementations • 26 Oct 2023 • Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora
The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.
no code implementations • 3 Oct 2023 • Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou
By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i. e., predicting optimal hyperparameters of wide neural networks from narrow ones.
1 code implementation • 5 Nov 2022 • Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora
Saliency methods compute heat maps that highlight portions of an input that were most {\em important} for the label assigned to it by a deep net.
1 code implementation • 11 Oct 2022 • Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora
It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.
no code implementations • 29 Sep 2021 • Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora
Saliency methods seek to provide human-interpretable explanations for the output of machine learning model on a given input.
no code implementations • 3 Nov 2019 • Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of that of the corresponding CNN architecture (best figure being around 78%) which is interesting performance for a fixed kernel.
4 code implementations • ICLR 2020 • Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
no code implementations • ICLR 2020 • Wei Hu, Zhiyuan Li, Dingli Yu
Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data.