1 code implementation • 24 Oct 2023 • Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi

Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models.

1 code implementation • 26 Sep 2023 • Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.

1 code implementation • 11 Sep 2023 • Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art.

Ranked #12 on Question Answering on SIQA

no code implementations • 20 Jun 2023 • Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP.

Ranked #41 on Code Generation on HumanEval

no code implementations • 17 Feb 2023 • Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks.

no code implementations • 17 Nov 2022 • Ananya Kumar, Ruoqi Shen, Sebastien Bubeck, Suriya Gunasekar

SGD and AdamW are the two most used optimizers for fine-tuning large neural networks in computer vision.

1 code implementation • 22 Jul 2022 • Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet

However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and is often suboptimal for the target domain.

no code implementations • 5 Jul 2022 • Suriya Gunasekar

(b) The robustness of performance is improved by even a minimal augmentation of $4$ pixel random crop across all architectures.

1 code implementation • 9 Jun 2022 • Yi Zhang, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner

We study how the trained models eventually succeed at the task, and in particular, we manage to understand some of the attention heads as well as how the information flows in the network.

no code implementations • 3 Mar 2022 • Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

In this work we consider another angle, and we study the effect of data augmentation on the dynamic of the learning process.

1 code implementation • 24 Feb 2021 • Meena Jagadeesan, Ilya Razenshteyn, Suriya Gunasekar

We provide a function space characterization of the inductive bias resulting from minimizing the $\ell_2$ norm of the weights in multi-channel convolutional neural networks with linear activations and empirically test our resulting hypothesis on ReLU networks trained using gradient descent.

no code implementations • 14 Dec 2020 • Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur

Understanding generalization in deep learning is arguably one of the most important questions in deep learning.

no code implementations • NeurIPS 2020 • Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".

no code implementations • 2 Apr 2020 • Suriya Gunasekar, Blake Woodworth, Nathan Srebro

We present a primal only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential.
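
As a rough illustration of the update this abstract refers to, the sketch below runs textbook Mirror Descent with an entropy-like potential. It is not the paper's primal-only derivation. The potential psi(w) = sum(w * log(w) - w), the quadratic objective, the `target` vector, and the step size are all assumptions chosen for the example. For this potential the Hessian is diag(1/w), which plays the role of the metric tensor, and the step becomes a multiplicative update.

```python
import numpy as np

def mirror_descent_entropy(grad_f, w0, lr, steps):
    """Mirror descent with the potential psi(w) = sum(w * log(w) - w).

    Since grad psi(w) = log(w), the dual-space step
    log(w_next) = log(w) - lr * grad_f(w) becomes a multiplicative update.
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        w = w * np.exp(-lr * grad_f(w))
    return w

# Hypothetical objective f(w) = 0.5 * ||w - target||^2 with a positive minimizer.
target = np.array([0.5, 2.0, 1.0])

def grad_f(w):
    return w - target

w_final = mirror_descent_entropy(grad_f, w0=np.ones(3), lr=0.1, steps=500)
print(w_final)
```

Because each step multiplies by a positive factor, the iterates stay in the positive orthant without any explicit projection.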

1 code implementation • 20 Feb 2020 • Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.

no code implementations • NeurIPS 2020 • Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu

For certain stepsizes of $g$ and $w$, we show that they can converge close to the minimum norm solution.

1 code implementation • 13 Jun 2019 • Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.

no code implementations • 17 May 2019 • Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.

no code implementations • NeurIPS 2018 • Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nathan Srebro

We study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions.

no code implementations • NeurIPS 2018 • Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.

no code implementations • 5 Mar 2018 • Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.

no code implementations • ICML 2018 • Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

We study the implicit bias of generic optimization methods, such as mirror descent, natural gradient descent, and steepest descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems.

2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
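
The setting can be tried on a toy problem. The sketch below is a minimal experiment, not the paper's code: the four data points, the step size, and the step counts are assumptions. It runs plain gradient descent on the unregularized logistic loss with a homogeneous linear predictor. For this data the hard-margin solution through the origin is proportional to (1, 1/3), so the script compares the iterate's direction against it after a short and a long run.

```python
import numpy as np

# Hypothetical linearly separable data with labels in {-1, +1}.
X = np.array([[1.0, 0.0], [0.0, 3.0], [-1.0, 0.0], [0.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def logistic_loss_grad(w, X, y):
    """Gradient of mean(log(1 + exp(-y * <w, x>))) with respect to w."""
    margins = y * (X @ w)
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
    coeffs = -y / (1.0 + np.exp(margins))
    return (coeffs[:, None] * X).mean(axis=0)

def run_gd(X, y, steps, lr=0.1):
    """Plain gradient descent from the origin, no regularization."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * logistic_loss_grad(w, X, y)
    return w

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Max-margin direction for this data: minimize ||w|| s.t. w_1 >= 1 and 3 * w_2 >= 1.
w_max_margin = np.array([1.0, 1.0 / 3.0])

w_short = run_gd(X, y, steps=100)
w_long = run_gd(X, y, steps=10000)

print("norms:", np.linalg.norm(w_short), np.linalg.norm(w_long))
print("cosine to max-margin:", cosine(w_short, w_max_margin), cosine(w_long, w_max_margin))
```

The norm of the iterate keeps growing because the loss has no finite minimizer on separable data, while the direction drifts toward the max-margin one. The drift is slow, so even the long run is only approximately aligned.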

no code implementations • NeurIPS 2017 • Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.

no code implementations • 20 Feb 2017 • Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro

We consider learning a predictor which is non-discriminatory with respect to a "protected attribute" according to the notion of "equalized odds" proposed by Hardt et al. [2016].

no code implementations • NeurIPS 2016 • Suriya Gunasekar, Oluwasanmi Koyejo, Joydeep Ghosh

We propose a novel and efficient algorithm for the collaborative preference completion problem, which involves jointly estimating individualized rankings for a set of entities over a shared set of items, based on a limited number of observed affinity values.

no code implementations • 2 Aug 2016 • Shalmali Joshi, Suriya Gunasekar, David Sontag, Joydeep Ghosh

This work proposes a new algorithm for automated and simultaneous phenotyping of multiple co-occurring medical conditions, also referred to as comorbidities, using clinical notes from electronic health records (EHRs).

no code implementations • NeurIPS 2015 • Suriya Gunasekar, Arindam Banerjee, Joydeep Ghosh

In this paper, we present a unified analysis of matrix completion under general low-dimensional structural constraints induced by any norm regularization.

no code implementations • 15 Sep 2015 • Suriya Gunasekar, Pradeep Ravikumar, Joydeep Ghosh

We consider the matrix completion problem of recovering a structured matrix from noisy and partial measurements.

no code implementations • 5 Dec 2014 • Suriya Gunasekar, Makoto Yamada, Dawei Yin, Yi Chang

We address the collective matrix completion problem of jointly recovering a collection of matrices with shared structure from partial (and potentially noisy) observations.
