Search Results for author: Daniel Khashabi

Found 69 papers, 39 papers with code

Reframing Instructional Prompts to GPTk’s Language

no code implementations • Findings (ACL) 2022 • Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, Hannaneh Hajishirzi

Our experiments compare the zero-shot and few-shot performance of LMs prompted with reframed instructions on 12 NLP tasks across 6 categories.


Findings of the 2021 Conference on Machine Translation (WMT21)

no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri

This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Translation


Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

no code implementations • 5 Apr 2024 • Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi

To address these limitations, we tackle the verifiability goal with a different philosophy: we trivialize the verification process by developing models that quote verbatim statements from trusted sources in pre-training data.

Philosophy


SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses

no code implementations • 4 Apr 2024 • Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, Daniel Khashabi

Can LLMs continually improve their previous outputs for better results?


Dated Data: Tracing Knowledge Cutoffs in Large Language Models

no code implementations • 19 Mar 2024 • Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

Using this analysis, we find that effective cutoffs often differ from reported cutoffs.


Tur[k]ingBench: A Challenge Benchmark for Web Agents

no code implementations • 18 Mar 2024 • Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi

We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark.


RORA: Robust Free-Text Rationale Evaluation

no code implementations • 28 Feb 2024 • Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu

This is achieved by assessing the conditional V-information (Hewitt et al., 2021) with a predictive family robust against leaky features that can be exploited by a small model.

Decision Making


AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies

1 code implementation • 19 Feb 2024 • Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng Shen, Vijay Tiyyala, Nicholas Andrews, Daniel Khashabi

Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios.

Benchmarking


k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text

1 code implementation • 17 Feb 2024 • Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He

Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection.

Text Detection Text Generation


The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts

no code implementations • 23 Jan 2024 • Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng, Philipp Koehn, Daniel Khashabi

As the influence of large language models (LLMs) spans across global communities, their safety challenges in multilingual settings become paramount for alignment research.


Revisiting the Hypothesis: Do pretrained Transformers Learn In-Context by Gradient Descent?

no code implementations • 12 Oct 2023 • Lingfeng Shen, Aayush Mishra, Daniel Khashabi

We observe that ICL and GD have different sensitivity to the order in which they observe demonstrations.

In-Context Learning


SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

1 code implementation • 6 Oct 2023 • Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov

Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design.

Sentence Text Generation


Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

no code implementations • 2 Oct 2023 • Tianjian Li, Haoran Xu, Philipp Koehn, Daniel Khashabi, Kenton Murray

Text generation models are notoriously vulnerable to errors in the training data.

Language Modelling Machine Translation +3


The Trickle-down Impact of Reward (In-)consistency on RLHF

1 code implementation • 28 Sep 2023 • Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu

We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM.

Benchmarking


GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution

1 code implementation • 17 Jul 2023 • Yining Lu, Haoping Yu, Daniel Khashabi

GEAR achieves better efficiency by delegating tool grounding and execution to small language models (SLM) and LLM, respectively; while leveraging semantic and pattern-based evaluation at both question and answer levels for generalizable tool grounding.


"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data

no code implementations • 22 May 2023 • Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data.


Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency

1 code implementation • 18 May 2023 • Lingfeng Shen, Weiting Tan, Boyuan Zheng, Daniel Khashabi

We provide theoretical foundations for this metric and its relationship with other prompt selection metrics, providing a comprehensive understanding of existing methods.


When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

1 code implementation • 20 Dec 2022 • Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi

Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge.

Knowledge Probing Memorization +2

143


Self-Instruct: Aligning Language Models with Self-Generated Instructions

16 code implementations • 20 Dec 2022 • Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi

Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.

Instruction Following Language Modelling

28,867


Generating Sequences by Learning to Self-Correct

no code implementations • 31 Oct 2022 • Sean Welleck, Ximing Lu, Peter West, Faeze Brahman, Tianxiao Shen, Daniel Khashabi, Yejin Choi

Sequence generation applications require satisfying semantic constraints, such as ensuring that programs are correct, using certain keywords, or avoiding undesirable content.

Language Modelling Program Synthesis


The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

1 code implementation • 18 Oct 2022 • Nikil Roashan Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, Kai-Wei Chang

How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model?

Language Modelling


Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. 
Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. 
Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. 
Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

2,669


ProsocialDialog: A Prosocial Backbone for Conversational Agents

1 code implementation • 25 May 2022 • Hyunwoo Kim, Youngjae Yu, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap

With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost.

Ranked #1 on Dialogue Safety Prediction on ProsocialDialog

Dialogue Generation Dialogue Safety Prediction +2


Representation Projection Invariance Mitigates Representation Collapse

no code implementations • 23 May 2022 • Anastasia Razdaibiedina, Ashish Khetan, Zohar Karnin, Daniel Khashabi, Vishaal Kapoor, Vivek Madan

In this paper, we propose Representation Projection Invariance (REPINA), a novel regularization method to maintain the information content of representation and reduce representation collapse during fine-tuning by discouraging undesirable changes in the representations.


Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

7 code implementations • 16 Apr 2022 • Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi

This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones.

Benchmarking Instruction Following

903


COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics

2 code implementations • 23 Feb 2022 • Lianhui Qin, Sean Welleck, Daniel Khashabi, Yejin Choi

Many applications of text generation require incorporating different constraints to control the semantics or style of generated text.

counterfactual Counterfactual Reasoning +1


UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training

1 code implementation • 23 Feb 2022 • Daniel Khashabi, Yeganeh Kordi, Hannaneh Hajishirzi

We present UnifiedQA-v2, a QA model built with the same process as UnifiedQA, except that it utilizes more supervision -- roughly 3x the number of datasets used for UnifiedQA.

Question Answering

427


NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

1 code implementation • NAACL 2022 • Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi

To enable constrained generation, we build on NeuroLogic decoding (Lu et al., 2021), combining its flexibility in incorporating logical constraints with A*esque estimates of future constraint satisfaction.

Ranked #1 on Text Generation on ROCStories

Machine Translation Table-to-Text Generation


Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

1 code implementation • NAACL 2022 • Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi

Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning.


Time Waits for No One! Analysis and Challenges of Temporal Misalignment

1 code implementation • NAACL 2022 • Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith

When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance.


Hey AI, Can You Solve Complex Tasks by Talking to Agents?

1 code implementation • Findings (ACL) 2022 • Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal

To help develop models that can leverage existing systems, we propose a new challenge: Learning to solve complex tasks by communicating with existing agents (or models) in natural language.


Reframing Instructional Prompts to GPTk's Language

no code implementations • 16 Sep 2021 • Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, Hannaneh Hajishirzi

Our experiments compare the zero-shot and few-shot performance of LMs prompted with reframed instructions on 12 NLP tasks across 6 categories.

Few-Shot Learning Question Generation +1


Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

1 code implementation • Findings (ACL) 2021 • Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang

We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes.

Ethics Few-Shot Learning +2


GooAQ: Open Question Answering with Diverse Answer Types

1 code implementation • Findings (EMNLP) 2021 • Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hannaneh Hajishirzi, Chris Callison-Burch

GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results.

Open-Ended Question Answering

122


Cross-Task Generalization via Natural Language Crowdsourcing Instructions

3 code implementations • ACL 2022 • Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi

Using this meta-dataset, we measure cross-task generalization by training models on seen tasks and measuring generalization to the remaining unseen ones.

Question Answering

903


Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

no code implementations • 5 Feb 2021 • Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark

We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset.

Multiple-choice Natural Questions +2


GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation

2 code implementations • 17 Jan 2021 • Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld

While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research.

Machine Translation Reading Comprehension +2


Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

1 code implementation • 6 Jan 2021 • Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant

A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly.

Question Answering StrategyQA


ParsiNLU: A Suite of Language Understanding Challenges for Persian

1 code implementation • 11 Dec 2020 • Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English.

Machine Translation Natural Language Inference +4

130


UnQovering Stereotyping Biases via Underspecified Questions

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tao Li, Tushar Khot, Daniel Khashabi, Ashish Sabharwal, Vivek Srikumar

Our broad study reveals that (1) all these models, with and without fine-tuning, have notable stereotyping biases in these classes; (2) larger models often have higher bias; and (3) the effect of fine-tuning on bias varies strongly with the dataset and the model size.

Question Answering


Evaluating NLP Models via Contrast Sets

no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis


Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

1 code implementation • NAACL 2021 • Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal

We propose a general framework called Text Modular Networks (TMNs) for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.

Question Answering


Temporal Common Sense Acquisition with Minimal Supervision

no code implementations • ACL 2020 • Ben Zhou, Qiang Ning, Daniel Khashabi, Dan Roth

Temporal common sense (e.g., duration and frequency of events) is crucial for understanding natural language.

Common Sense Reasoning Language Modelling


UnifiedQA: Crossing Format Boundaries With a Single QA System

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi

As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.

Ranked #5 on Common Sense Reasoning on WinoGrande

Common Sense Reasoning Language Modelling +3

427


TransOMCS: From Linguistic Graphs to Commonsense Knowledge

1 code implementation • 1 May 2020 • Hongming Zhang, Daniel Khashabi, Yangqiu Song, Dan Roth

Commonsense knowledge acquisition is a key problem for artificial intelligence.


More Bang for Your Buck: Natural Perturbation for Robust Question Answering

no code implementations • EMNLP 2020 • Daniel Khashabi, Tushar Khot, Ashish Sabharwal

While recent models have achieved human-level scores on many NLP datasets, we observe that they are considerably sensitive to small changes in input.

Question Answering


Evaluating Models' Local Decision Boundaries via Contrast Sets

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou

Reading Comprehension Sentiment Analysis


Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses

1 code implementation • ACL 2020 • Erfan Sadeqi Azer, Daniel Khashabi, Ashish Sabharwal, Dan Roth

Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues.

Bayesian Inference Misconceptions


"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding

no code implementations • IJCNLP 2019 • Ben Zhou, Daniel Khashabi, Qiang Ning, Dan Roth

Understanding time is crucial for understanding events expressed in natural language.


"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding

1 code implementation • 6 Sep 2019 • Ben Zhou, Daniel Khashabi, Qiang Ning, Dan Roth

Understanding time is crucial for understanding events expressed in natural language.


From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

no code implementations • 4 Sep 2019 • Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz

This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions.

Multiple-choice Question Answering


Reasoning-Driven Question-Answering for Natural Language Understanding

no code implementations • 14 Aug 2019 • Daniel Khashabi

In the second part, we propose two new challenge datasets.

Common Sense Reasoning Natural Language Inference +3


Solving Hard Coreference Problems

no code implementations • HLT 2015 • Haoruo Peng, Daniel Khashabi, Dan Roth

Coreference resolution is a key problem in natural language understanding that still escapes reliable solutions.

coreference-resolution Decision Making +1


Zero-Shot Open Entity Typing as Type-Compatible Grounding

1 code implementation • EMNLP 2018 • Ben Zhou, Daniel Khashabi, Chen-Tse Tsai, Dan Roth

We evaluate our system on a broad range of datasets, including standard fine-grained and coarse-grained entity typing datasets, and also a dataset in the biological domain.

Entity Typing NER +1


PerspectroScope: A Window to the World of Diverse Perspectives

1 code implementation • ACL 2019 • Sihao Chen, Daniel Khashabi, Chris Callison-Burch, Dan Roth

This work presents PerspectroScope, a web-based system which lets users query a discussion-worthy natural language claim, and extract and visualize various perspectives in support or against the claim, along with evidence supporting each perspective.

Natural Language Inference Natural Language Understanding +1


Question Answering as Global Reasoning over Semantic Abstractions

1 code implementation • 9 Jun 2019 • Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Dan Roth

We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions.

Information Retrieval Multiple-choice +2


Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims

1 code implementation • 8 Jun 2019 • Sihao Chen, Daniel Khashabi, Wenpeng Yin, Chris Callison-Burch, Dan Roth

Inherently, this is a natural language understanding task, and we propose to address it as such.

8k Fact Checking +1

Paper
Code

Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims

1 code implementation • NAACL 2019 • Sihao Chen, Daniel Khashabi, Wenpeng Yin, Chris Callison-Burch, Dan Roth

Inherently, this is a natural language understanding task, and we propose to address it as such.

8k Fact Checking +1

Paper
Code

On the Possibilities and Limitations of Multi-hop Reasoning Under Linguistic Imperfections

no code implementations • 8 Jan 2019 • Daniel Khashabi, Erfan Sadeqi Azer, Tushar Khot, Ashish Sabharwal, Dan Roth

The idea is to consider two interrelated spaces: a conceptual meaning space that is unambiguous and complete but hidden, and a linguistic space that captures a noisy grounding of the meaning space in the words of a language (the level at which all systems, whether neural or symbolic, operate).

Paper
Add Code

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

no code implementations • NAACL 2018 • Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, Dan Roth

We present a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences.

Natural Language Inference Question Answering +2

Paper
Add Code

CogCompNLP: Your Swiss Army Knife for NLP

1 code implementation • LREC 2018 • Daniel Khashabi, Mark Sammons, Ben Zhou, Tom Redman, Christos Christodoulopoulos, Vivek Srikumar, Nicholas Rizzolo, Lev Ratinov, Guanheng Luo, Quang Do, Chen-Tse Tsai, Subhro Roy, Stephen Mayhew, Zhili Feng, John Wieting, Xiaodong Yu, Yangqiu Song, Shashank Gupta, Shyam Upadhyay, Naveen Arivazhagan, Qiang Ning, Shaoshi Ling, Dan Roth

Semantic Role Labeling

Paper
Code

Learning What is Essential in Questions

1 code implementation • CONLL 2017 • Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Dan Roth

Question answering (QA) systems are easily distracted by irrelevant or redundant words in questions, especially when faced with long or multi-sentence questions in difficult domains.

Information Retrieval Question Answering +2

Paper
Code

Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks

no code implementations • 25 Jul 2017 • Parisa Kordjamshidi, Sameer Singh, Daniel Khashabi, Christos Christodoulopoulos, Mark Sammons, Saurabh Sinha, Dan Roth

In particular, we provide an initial prototype for a relational and graph traversal query language where queries are directly used as relational features for structured machine learning models.

BIG-bench Machine Learning Knowledge Graphs +1

Paper
Add Code

Better call Saul: Flexible Programming for Learning and Inference in NLP

1 code implementation • COLING 2016 • Parisa Kordjamshidi, Daniel Khashabi, Christos Christodoulopoulos, Bhargav Mangipudi, Sameer Singh, Dan Roth

We present a novel way for designing complex joint inference and learning models using Saul (Kordjamshidi et al., 2015), a recently-introduced declarative learning-based programming language (DeLBP).

Part-Of-Speech Tagging Probabilistic Programming +1

Paper
Code

Adversarial Delays in Online Strongly-Convex Optimization

no code implementations • 20 May 2016 • Daniel Khashabi, Kent Quanrud, Amirhossein Taghvaei

We consider the problem of strongly-convex online optimization in the presence of adversarial delays; in a T-iteration online game, the feedback of the player's query at time t is arbitrarily delayed by an adversary for d_t rounds and delivered before the game ends, at iteration t+d_t-1.

Paper
Add Code

EDISON: Feature Extraction for NLP, Simplified

no code implementations • LREC 2016 • Mark Sammons, Christos Christodoulopoulos, Parisa Kordjamshidi, Daniel Khashabi, Vivek Srikumar, Dan Roth

We present EDISON, a Java library of feature generation functions used in a suite of state-of-the-art NLP tools, based on a set of generic NLP data structures.

Paper
Add Code

Question Answering via Integer Programming over Semi-Structured Knowledge

no code implementations • 20 Apr 2016 • Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, Dan Roth

We propose a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts.

Information Retrieval Question Answering +1

Paper
Add Code

Online Learning with Adversarial Delays

no code implementations • NeurIPS 2015 • Kent Quanrud, Daniel Khashabi

We study the performance of standard online learning algorithms when the feedback is delayed by an adversary.

Paper
Add Code

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

no code implementations • 25 Aug 2015 • Daniel Khashabi, John Wieting, Jeffrey Yufei Liu, Feng Liang

Empirical studies have been carried out to compare our work with many constrained clustering algorithms from the literature on both a variety of data sets and under a variety of conditions such as using noisy side information and erroneous k values.

Constrained Clustering

Paper
Add Code
