no code implementations • 18 Oct 2023 • Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau
Additionally, our analysis uncovers the semantic predispositions of LLMs and reveals the impact of recency bias on information presented in long contexts.
no code implementations • 4 Oct 2023 • Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade
In this work, we take a step towards answering these questions by demonstrating the following: (a) On a test-bed with a variety of Boolean function classes, we find that Transformers can nearly match the optimal learning algorithm for 'simpler' tasks, while their performance deteriorates on more 'complex' tasks.
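To make the contrast between 'simpler' and more 'complex' Boolean function classes concrete, here is a minimal sketch of how such classes can be instantiated as in-context learning prompts. The helper names, prompt format, and choice of conjunctions vs. sparse parities are illustrative assumptions, not the paper's actual test-bed.

```python
import random

def random_inputs(n_vars, n_examples):
    """Sample Boolean input vectors uniformly at random."""
    return [[random.randint(0, 1) for _ in range(n_vars)] for _ in range(n_examples)]

def conjunction(relevant):
    """A 'simpler' class: output 1 iff all relevant bits are 1."""
    return lambda x: int(all(x[i] == 1 for i in relevant))

def sparse_parity(relevant):
    """A more 'complex' class: output the XOR of the relevant bits."""
    return lambda x: sum(x[i] for i in relevant) % 2

def make_prompt(f, inputs):
    """Serialize (input, label) pairs into an in-context learning prompt."""
    return "\n".join(f"{''.join(map(str, x))} -> {f(x)}" for x in inputs)

if __name__ == "__main__":
    n_vars, k = 10, 3
    relevant = random.sample(range(n_vars), k)
    xs = random_inputs(n_vars, 20)
    print(make_prompt(conjunction(relevant), xs))    # 'simpler' target
    print(make_prompt(sparse_parity(relevant), xs))  # 'complex' target
```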
no code implementations • 31 Jul 2023 • Kyle Duffy, Satwik Bhattamishra, Phil Blunsom
Large-scale pre-training has driven progress in many areas of natural language processing, yet little is understood about the design of pre-training datasets.
no code implementations • 20 Jun 2023 • Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
As Deep Learning (DL) training workloads grow in both compute resources and training time, the likelihood of encountering in-training failures rises substantially, leading to lost work and wasted resources.
1 code implementation • 22 Nov 2022 • Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom
(ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity.
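For context, the sensitivity of a Boolean function at an input is the number of single-bit flips that change its output, and average sensitivity is the mean of this quantity over all inputs. The sketch below (illustrative only; the example functions are standard textbook choices, not the paper's specific tasks) computes average sensitivity by brute force and shows why parity is maximally sensitive while majority has low sensitivity.

```python
from itertools import product

def average_sensitivity(f, n):
    """Average, over all 2^n inputs, of the number of bit flips that change f."""
    total = 0
    for x in product([0, 1], repeat=n):
        x = list(x)
        for i in range(n):
            flipped = x.copy()
            flipped[i] ^= 1
            total += f(x) != f(flipped)
    return total / 2 ** n

parity = lambda x: sum(x) % 2                  # every flip changes the output
majority = lambda x: int(sum(x) > len(x) / 2)  # only flips near the threshold matter

n = 7
print(average_sensitivity(parity, n))    # n = 7
print(average_sensitivity(majority, n))  # ~2.19, much smaller than n
```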
1 code implementation • ACL 2022 • Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal
Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences.
3 code implementations • NAACL 2021 • Arkil Patel, Satwik Bhattamishra, Navin Goyal
Since existing solvers achieve high performance on benchmark datasets of elementary-level MWPs with one-unknown arithmetic word problems, such problems are often considered "solved", and the bulk of research attention has moved to more complex MWPs.
Ranked #1 on Math Word Problem Solving on MAWPS
1 code implementation • COLING 2020 • Satwik Bhattamishra, Kabir Ahuja, Navin Goyal
We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer.
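A minimal sketch of the kind of length-split data such an evaluation relies on, using the Dyck-1 (balanced brackets) language as an example; the generator, length ranges, and split names below are assumptions for illustration, not the paper's exact setup.

```python
import random

def random_dyck1(min_len, max_len):
    """Sample a balanced-bracket (Dyck-1) string with an even length in [min_len, max_len]."""
    length = 2 * random.randint(min_len // 2, max_len // 2)
    s, depth = [], 0
    for remaining in range(length, 0, -1):
        if depth == 0:                # cannot close at depth 0, must open
            s.append("("); depth += 1
        elif depth == remaining:      # must spend the remaining budget on closing brackets
            s.append(")"); depth -= 1
        elif random.random() < 0.5:
            s.append("("); depth += 1
        else:
            s.append(")"); depth -= 1
    return "".join(s)

# train and test in-distribution on short strings; probe generalization on longer ones
train       = [random_dyck1(2, 50)    for _ in range(2000)]
test_same   = [random_dyck1(2, 50)    for _ in range(500)]
test_longer = [random_dyck1(52, 100)  for _ in range(500)]
```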
1 code implementation • EMNLP 2020 • Satwik Bhattamishra, Kabir Ahuja, Navin Goyal
Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors and into the influence of positional encoding schemes on the learning and generalization abilities of the model.
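As background for the positional encoding discussion, here is a minimal sketch of the standard fixed sinusoidal scheme (Vaswani et al., 2017), one common scheme that analyses of this kind compare against learned or absent positional encodings; this is a generic illustration, not the paper's code.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Fixed sinusoidal encodings: sin on even dimensions, cos on odd dimensions."""
    positions = np.arange(max_len)[:, None]            # shape (max_len, 1)
    dims = np.arange(d_model)[None, :]                 # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```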
1 code implementation • CONLL 2020 • Satwik Bhattamishra, Arkil Patel, Navin Goyal
Transformers are being used extensively across several sequence modeling tasks.
no code implementations • ICON 2019 • Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali
In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities.
1 code implementation • NAACL 2019 • Ashutosh Kumar, Satwik Bhattamishra, Manik Bhandari, Partha Talukdar
Inducing diversity in the task of paraphrasing is an important problem in NLP with applications in data augmentation and conversational agents.