no code implementations • 21 Oct 2024 • Divyanshu Aggarwal, Sankarshan Damle, Navin Goyal, Satya Lokam, Sunayana Sitaram
A common challenge in adapting Large Language Models (LLMs) is enabling them to learn new languages over time without degrading performance on languages in which they are already proficient (usually English).
1 code implementation • 27 May 2024 • Xinting Huang, Madhur Panwar, Navin Goyal, Michael Hahn
The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations.
1 code implementation • 25 Apr 2024 • Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias.
1 code implementation • 19 Jun 2023 • Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu K. Lahiri, Sriram K. Rajamani
We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it.
Ranked #1 on Code Completion on DotPrompts
1 code implementation • 8 Jun 2023 • Madhur Panwar, Kabir Ahuja, Navin Goyal
One of the main discoveries in this line of research has been that for several function classes, such as linear regression, transformers successfully generalize to new functions in the class.
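The in-context setup referenced here is simple to sketch. The snippet below is my own illustration (dimensions, counts, and the helper name sample_prompt are arbitrary, not the paper's): it draws a fresh linear function, builds a prompt of (input, output) pairs, and produces a query point whose label the model must predict.

```python
import numpy as np

def sample_prompt(dim=8, n_points=16, rng=None):
    # Build one in-context regression prompt for a freshly drawn linear function.
    if rng is None:
        rng = np.random.default_rng(0)
    w = rng.normal(size=dim)                  # new function f(x) = <w, x>
    xs = rng.normal(size=(n_points, dim))     # in-context example inputs
    ys = xs @ w                               # in-context example labels
    x_query = rng.normal(size=dim)            # query point
    return xs, ys, x_query, x_query @ w       # context, labels, query, target

xs, ys, x_query, y_target = sample_prompt()
print(xs.shape, ys.shape, float(y_target))
```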
no code implementations • 14 Mar 2023 • Michael Hahn, Navin Goyal
Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations.
no code implementations • 14 Nov 2022 • Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, Anand Tadipatri
Mathematics formalisation is the task of translating mathematics (i.e., definitions, theorem statements, proofs) written in natural language, as found in books and papers, into a formal language that can then be checked for correctness by a program.
1 code implementation • 23 Oct 2022 • Ankur Sikarwar, Arkil Patel, Navin Goyal
On analyzing the task, we find that identifying the target location in the grid world is the main challenge for the models.
1 code implementation • ACL 2022 • Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal
Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences.
1 code implementation • 19 Jun 2021 • Kulin Shah, Amit Deshpande, Navin Goyal
In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization.
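For concreteness, here is a minimal PyTorch sketch of the supervised setting described above: an overparameterized one-hidden-layer network trained with plain SGD at a small learning rate. The sizes, seed, and learning rate are illustrative; this is not the paper's experimental code.

```python
import torch

torch.manual_seed(0)
n, d, width = 200, 10, 4096                      # width >> n: overparameterized
X, y = torch.randn(n, d), torch.randn(n, 1)      # toy regression data

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)   # small constant learning rate

for step in range(2000):
    idx = torch.randint(0, n, (32,))             # stochastic minibatch
    loss = ((model(X[idx]) - y[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())                               # training loss after SGD
```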
no code implementations • NeurIPS 2021 • Abhishek Panigrahi, Navin Goyal
In contrast to previous work, which could only handle functions of sequences that are sums of functions of the individual tokens in the sequence, we allow general functions.
no code implementations • 29 Apr 2021 • Vishesh Agarwal, Somak Aditya, Navin Goyal
To understand Transformers' abilities in such tasks in a fine-grained manner, we deviate from traditional end-to-end settings, and explore a step-wise polynomial simplification task.
3 code implementations • NAACL 2021 • Arkil Patel, Satwik Bhattamishra, Navin Goyal
Since existing solvers achieve high performance on the benchmark datasets for elementary-level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved", with the bulk of research attention moving to more complex MWPs.
Ranked #1 on Math Word Problem Solving on MAWPS
no code implementations • 1 Jan 2021 • Vishesh Agarwal, Somak Aditya, Navin Goyal
For a polynomial which is not necessarily in this normal form, a sequence of simplification steps is applied to reach the fully simplified (i.e., in the normal form) polynomial.
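As a concrete illustration of the target normal form (this uses sympy's one-shot expansion rather than the paper's step-wise simplification dataset):

```python
import sympy as sp

x, y = sp.symbols("x y")
p = (2 * x + 3) * (x - y) + x * (x + 1)   # a polynomial not in normal form
print(sp.expand(p))                       # 3*x**2 - 2*x*y + 4*x - 3*y (normal form)
```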
no code implementations • 1 Jan 2021 • Kulin Shah, Amit Deshpande, Navin Goyal
In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using Stochastic Gradient Descent (SGD).
1 code implementation • COLING 2020 • Satwik Bhattamishra, Kabir Ahuja, Navin Goyal
We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer.
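A hedged sketch of this kind of length-split evaluation, using Dyck-1 (balanced brackets) purely as an example language; the languages, length ranges, and splits used in the paper may differ.

```python
import random

def dyck1(n_pairs, rng):
    # Generate a random balanced bracket string with n_pairs matched pairs.
    s, open_, remaining = [], 0, n_pairs
    while remaining > 0 or open_ > 0:
        if remaining > 0 and (open_ == 0 or rng.random() < 0.5):
            s.append("("); open_ += 1; remaining -= 1
        else:
            s.append(")"); open_ -= 1
    return "".join(s)

rng = random.Random(0)
train = [dyck1(rng.randint(1, 25), rng) for _ in range(1000)]   # lengths up to 50
test  = [dyck1(rng.randint(26, 50), rng) for _ in range(1000)]  # strictly longer strings
print(train[0], len(test[0]))
```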
1 code implementation • EMNLP 2020 • Satwik Bhattamishra, Kabir Ahuja, Navin Goyal
Our analysis also provides insights on the role of self-attention mechanism in modeling certain behaviors and the influence of positional encoding schemes on the learning and generalization abilities of the model.
no code implementations • 14 Jul 2020 • Karthik Abinav Sankararaman, Anand Louis, Navin Goyal
First, for a large and well-studied class of LSEMs, namely "bow-free" models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work.
1 code implementation • CONLL 2020 • Satwik Bhattamishra, Arkil Patel, Navin Goyal
Transformers are being used extensively across several sequence modeling tasks.
no code implementations • 21 Oct 2019 • Abhishek Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli
What enables Stochastic Gradient Descent (SGD) to achieve better generalization than Gradient Descent (GD) in Neural Network training?
no code implementations • ICLR 2020 • Abhishek Panigrahi, Abhishek Shetty, Navin Goyal
In the present paper, we provide theoretical results about the effect of the activation function on the training of highly overparameterized 2-layer neural networks.
no code implementations • 17 May 2019 • Raghav Somani, Navin Goyal, Prateek Jain, Praneeth Netrapalli
This paper proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross-entropy, mean squared error, 0/1 error, etc.)
no code implementations • 16 May 2019 • Karthik Abinav Sankararaman, Anand Louis, Navin Goyal
First, we prove that under a sufficient condition, for a certain sub-class of LSEMs that are bow-free (Brito and Pearl, 2002), the parameter recovery is stable.
no code implementations • 13 Jul 2018 • Navin Goyal, Abhishek Shetty
NGCA (non-Gaussian component analysis) is also related to dimension reduction and to other data analysis problems such as ICA.
no code implementations • ICLR 2018 • Amit Deshpande, Navin Goyal, Sushrut Karmalkar
We show a similar separation between the expressive power of depth-2 and depth-3 sigmoidal neural networks over a large class of input distributions, as long as the weights are polynomially bounded.
no code implementations • ICLR 2018 • Rahul Anand Sharma, Navin Goyal, Monojit Choudhury, Praneeth Netrapalli
This paper explores the simplicity of learned neural networks under various settings: learned on real vs. random data, with varying size/architecture, and trained with large vs. small minibatch sizes.
no code implementations • 22 Feb 2017 • Joseph Anderson, Navin Goyal, Anupama Nandi, Luis Rademacher
Like the current state of the art, the new algorithm is based on the centroid body (a first-moment analogue of the covariance matrix).
no code implementations • 2 Sep 2015 • Joseph Anderson, Navin Goyal, Anupama Nandi, Luis Rademacher
Independent component analysis (ICA) is the problem of efficiently recovering a matrix $A \in \mathbb{R}^{n\times n}$ from i.i.d. samples of $x = As$, where the coordinates of $s$ are independent.
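For readers unfamiliar with the model, a minimal illustration of the ICA setup, using scikit-learn's FastICA as a stand-in estimator; this paper's algorithm, which is designed for heavy-tailed sources, is different.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n, samples = 3, 20000
S = rng.laplace(size=(samples, n))     # independent non-Gaussian sources s
A = rng.normal(size=(n, n))            # unknown mixing matrix
X = S @ A.T                            # observed i.i.d. samples of x = A s

ica = FastICA(n_components=n, random_state=0)
ica.fit(X)
A_hat = ica.mixing_                    # estimate of A (up to permutation and scaling)
print(A_hat.shape)
```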
no code implementations • 12 Nov 2013 • Joseph Anderson, Mikhail Belkin, Navin Goyal, Luis Rademacher, James Voss
The problem of learning this map can be efficiently solved using some recent results on tensor decompositions and Independent Component Analysis (ICA), thus giving an algorithm for recovering the mixture.
1 code implementation • 25 Jun 2013 • Navin Goyal, Santosh Vempala, Ying Xiao
Fourier PCA is Principal Component Analysis of a matrix obtained from higher-order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-$1$ decompositions.
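The second-derivative (matrix) case can be estimated numerically. The sketch below is my own illustration, not the paper's algorithm (which works with higher-order derivative tensors and the general, non-orthogonal case): it forms the Hessian of the log of the empirical characteristic function at a random point and diagonalizes it for orthogonally mixed data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 2, 100000
S = rng.uniform(-1, 1, size=(samples, n))    # independent sources
Q, _ = np.linalg.qr(rng.normal(size=(n, n))) # orthogonal mixing (whitened case)
X = S @ Q.T

u = rng.normal(size=n)                       # point at which the transform is evaluated
e = np.exp(1j * X @ u)                       # exp(i <u, x>) per sample
phi = e.mean()                               # empirical characteristic function phi(u)
d1 = (1j * X * e[:, None]).mean(axis=0)      # gradient of phi at u
d2 = (-(X[:, :, None] * X[:, None, :]) * e[:, None, None]).mean(axis=0)  # Hessian of phi
H = d2 / phi - np.outer(d1, d1) / phi**2     # Hessian of log phi at u

eigvals, eigvecs = np.linalg.eig(H)          # eigenvectors should roughly align
print(np.round(np.abs(eigvecs.T @ Q), 2))    # with the mixing directions (columns of Q)
```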
no code implementations • 9 Nov 2012 • Joseph Anderson, Navin Goyal, Luis Rademacher
We also show a direct connection between the problem of learning a simplex and ICA: a simple randomized reduction from the problem of learning a simplex to ICA.
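One standard way to make such a connection (sketched here as an illustration; it is not necessarily the paper's exact reduction) uses the fact that a uniform sample from the simplex with vertex matrix A is A @ d with d ~ Dirichlet(1,...,1), and rescaling by an independent Gamma(n, 1) radius makes the coordinates i.i.d. Exponential(1), i.e. an ICA instance x = A s.

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 4, 10000
A = rng.normal(size=(n, n))                      # simplex vertices as columns of A

D = rng.dirichlet(np.ones(n), size=samples)      # uniform barycentric coordinates
P = D @ A.T                                      # uniform samples from the simplex
r = rng.gamma(shape=n, scale=1.0, size=samples)  # independent Gamma(n, 1) radii
X = P * r[:, None]                               # now X[i] = A @ s_i with s_i i.i.d. Exp(1)
print(X.shape)                                   # an ICA instance built from simplex samples
```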
2 code implementations • 15 Sep 2012 • Shipra Agrawal, Navin Goyal
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
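A minimal sketch of Thompson Sampling for Bernoulli bandits with Beta priors, the classical setting analyzed in this line of work; the arm means, horizon, and priors below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]               # unknown to the algorithm
successes = np.ones(len(true_means))       # Beta(1, 1) uniform priors
failures = np.ones(len(true_means))

for t in range(10000):
    theta = rng.beta(successes, failures)  # sample a plausible mean for each arm
    arm = int(np.argmax(theta))            # play the arm that looks best under the sample
    reward = rng.random() < true_means[arm]  # Bernoulli reward
    successes[arm] += reward               # Bayesian posterior update
    failures[arm] += 1 - reward

print(successes / (successes + failures))  # posterior mean estimates per arm
```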