no code implementations • 12 Apr 2024 • Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations.
no code implementations • 16 Nov 2023 • Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik R Narasimhan, Ameet Deshpande
We facilitate systematic evaluation in this new paradigm by introducing GEO-bench, a benchmark of diverse user queries across multiple domains, coupled with sources required to answer these queries.
1 code implementation • 8 Nov 2023 • Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot
Our experiments with ChatGPT-3.5 show that this bias is ubiquitous - 80% of our personas demonstrate bias; it is significant - some datasets show performance drops of 70%+; and can be especially harmful for certain groups - some personas suffer statistically significant drops on 80%+ of the datasets.
1 code implementation • 6 Nov 2023 • Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan
In this work, we address the shortcomings of quantitative metrics by proposing QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
no code implementations • 9 Oct 2023 • Avijit Thawani, Jay Pujara, Ashwin Kalyan
Despite recent successes in language models, their ability to represent numbers is insufficient.
no code implementations • 31 Aug 2023 • Atharvan Dogra, Deeksha Varshney, Ashwin Kalyan, Ameet Deshpande, Neeraj Kumar
The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA).
no code implementations • 7 Aug 2023 • Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions.
1 code implementation • 24 May 2023 • Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan
Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding.
no code implementations • 24 May 2023 • Ameet Deshpande, Tanmay Rajpurohit, Karthik Narasimhan, Ashwin Kalyan
With widespread adoption of AI systems, and the push from stakeholders to make them human-like through alignment techniques, human voice, and pictorial avatars, the tendency for users to anthropomorphize them increases significantly.
1 code implementation • 15 May 2023 • Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan, Peter Clark, Derry Wijaya, Niket Tandon
Despite their unprecedented success, even the largest language models make mistakes.
no code implementations • 13 May 2023 • Kaushik Roy, Manas Gaur, Misagh Soltani, Vipula Rawte, Ashwin Kalyan, Amit Sheth
LMs augmented with ProKnow guided method generated 89% safer questions in the depression and anxiety domain.
no code implementations • 11 Apr 2023 • Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan
Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service.
1 code implementation • 29 Nov 2022 • Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, Avirup Sil
Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger.
1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.
Ranked #1 on Mathematical Reasoning on Lila (OOD)
2 code implementations • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan
However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #5 on Science Question Answering on ScienceQA
no code implementations • ACL 2022 • Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems.
1 code implementation • EMNLP 2021 • Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark
FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans.
1 code implementation • 26 Jun 2021 • Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan
This work shows that value-aware model learning, known for its numerous theoretical benefits, is also practically viable for solving challenging continuous control tasks in prevalent model-based reinforcement learning algorithms.
3 code implementations • 10 Jun 2021 • Tal Schuster, Ashwin Kalyan, Oleksandr Polozov, Adam Tauman Kalai
The dataset is comprehensive in that it spans problems of a range of difficulties and domains, ranging from trivial string manipulation problems, to classic programming puzzles (e.g., Tower of Hanoi), to interview/competitive-programming problems (e.g., dynamic programming), to longstanding open problems in algorithms and mathematics (e.g., factoring).
no code implementations • ICML Workshop LifelongML 2020 • Nirbhay Modhe, Harish K Kamath, Dhruv Batra, Ashwin Kalyan
Despite the breakthroughs achieved by Reinforcement Learning (RL) in recent years, RL agents often fail to perform well in unseen environments.
no code implementations • 25 Sep 2019 • Ashwin Kalyan, Oleksandr Polozov, Adam Tauman Kalai
Puzzles are objective in that one can easily test the correctness of a given solution x by seeing whether it satisfies f, unlike the most common representations for program synthesis: given input-output pairs or an English problem description, the correctness of a given solution is not determined and is debatable.
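As a rough illustration of the idea in the abstract above, a puzzle can be written as a function f that returns True exactly when its argument is a valid solution, so checking a candidate x only requires running f(x). The sketch below is not taken from the paper: the puzzle `f` and the helper `is_solution` are hypothetical names chosen for this example.

```python
from typing import Callable


def f(x: str) -> bool:
    """Hypothetical puzzle: find a string whose reverse is 'olleh'."""
    return x[::-1] == "olleh"


def is_solution(puzzle: Callable[[str], bool], candidate: str) -> bool:
    """Check a candidate by running the puzzle on it.

    No reference answer or input-output pairs are needed: the puzzle
    itself decides correctness. A candidate that makes the puzzle raise
    is treated as incorrect.
    """
    try:
        return puzzle(candidate) is True
    except Exception:
        return False


print(is_solution(f, "hello"))  # True
print(is_solution(f, "world"))  # False
```

Because the check is just a function call, any solver (a human, a search procedure, or a language model) can be scored the same way without a judgment call about whether the answer is acceptable.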
no code implementations • ICML 2018 • Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra
Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input - e.g., there are many ways of describing an image, multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g., all English sentences).
no code implementations • ICLR 2018 • Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani
In this work, we propose Neural Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the best of both symbolic logic techniques and statistical models.