no code implementations • FEVER (ACL) 2022 • James Ferguson, Hannaneh Hajishirzi, Pradeep Dasigi, Tushar Khot
Training retrieval models to fetch contexts for Question Answering (QA) over large corpora requires labeling relevant passages in those corpora.
no code implementations • 2 Jul 2024 • Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
Chat-based language models are designed to be helpful, yet they should not comply with every user request.
3 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
2 code implementations • 17 Nov 2023 • Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Since the release of T\"ULU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.
no code implementations • 16 Nov 2023 • Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability.
1 code implementation • 5 Oct 2023 • Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng
We propose Trust Region Aware Minimization (TRAM), a SAM algorithm fine-tuning for low parameter sharpness and smooth, informative representations preserving pre-trained structure.
2 code implementations • NeurIPS 2023 • Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.
1 code implementation • 19 May 2023 • Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi
Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e. g., K=100), which are then reranked by a more powerful cross-encoder model.
1 code implementation • 30 Jan 2023 • Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo
Motivated by our survey, we present LongEval, a set of guidelines for human evaluation of faithfulness in long-form summaries that addresses the following challenges: (1) How can we achieve high inter-annotator agreement on faithfulness scores?
1 code implementation • 2 Dec 2022 • Bhargavi Paranjape, Pradeep Dasigi, Vivek Srikumar, Luke Zettlemoyer, Hannaneh Hajishirzi
We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them.
1 code implementation • 1 Dec 2022 • Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
Obtaining labeled data to train a model for a task of interest is often expensive.
1 code implementation • ACL 2022 • Yuxiang Wu, Matt Gardner, Pontus Stenetorp, Pradeep Dasigi
We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model, by simply replacing its training data.
Ranked #1 on Natural Language Inference on HANS
1 code implementation • NAACL 2021 • Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner
Readers of academic research papers often read with the goal of answering specific questions.
Ranked #1 on Evidence Selection on QASPER
no code implementations • EMNLP 2021 • Dheeru Dua, Pradeep Dasigi, Sameer Singh, Matt Gardner
When training most modern reading comprehension models, all the questions associated with a context are treated as being independent from each other.
1 code implementation • EMNLP 2021 • Ansong Ni, Matt Gardner, Pradeep Dasigi
We also show that retrieval marginalization results in 4. 1 QA F1 improvement over a non-marginalized baseline on HotpotQA in the fullwiki setting.
no code implementations • EMNLP 2020 • James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi
However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information.
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • EMNLP 2020 • Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovi{\'c}, Zhen Nie
High-quality and large-scale data are key to success for AI systems.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
1 code implementation • IJCNLP 2019 • Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
Machine comprehension of texts longer than a single sentence often requires coreference resolution.
no code implementations • NAACL 2019 • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy
Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical forms that coincidentally evaluate to the correct answer.
3 code implementations • NAACL 2019 • Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner
We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs.
Ranked #14 on Question Answering on DROP Test
no code implementations • ACL 2018 • Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer
Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.
1 code implementation • WS 2018 • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding.
1 code implementation • EMNLP 2017 • Jayant Krishnamurthy, Pradeep Dasigi, Matt Gardner
We present a new semantic parsing model for answering compositional questions on semi-structured Wikipedia tables.
1 code implementation • ACL 2017 • Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy
Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language.
2 code implementations • 17 Feb 2017 • Pradeep Dasigi, Gully A. P. C. Burns, Eduard Hovy, Anita de Waard
We propose a deep learning model for identifying structure within experiment narratives in scientific literature.
no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Esk, Ramy er
Multiple levels of quality checks are performed on the output of each step in the creation process.