no code implementations • Findings (EMNLP) 2021 • Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi
Scripts – prototypical event sequences describing everyday activities – have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.
1 code implementation • 26 Oct 2023 • Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui
Factual probing is a method that uses prompts to test if a language model "knows" certain world knowledge facts.
1 code implementation • 31 May 2023 • Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi
In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation.
1 code implementation • 31 Mar 2023 • Jungo Kasai, Yuhei Kasai, Keisuke Sakaguchi, Yutaro Yamada, Dragomir Radev
In this work, we evaluate LLM APIs (ChatGPT, GPT-3, and GPT-4) on the Japanese national medical licensing examinations from the past five years, including the current year.
1 code implementation • 27 Mar 2023 • Michael Regan, Jena D. Hwang, Keisuke Sakaguchi, James Pustejovsky
In this work, we investigate how to apply schema induction models to the task of knowledge discovery for enhanced search of English-language news texts.
no code implementations • 25 Mar 2023 • Steven Coyne, Keisuke Sakaguchi, Diana Galvan-Sosa, Michael Zock, Kentaro Inui
GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks.
1 code implementation • 16 Feb 2023 • Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa, Keisuke Sakaguchi, Kentaro Inui
Neural reasoning accuracy improves when generating intermediate reasoning steps.
1 code implementation • 15 Feb 2023 • Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa, Keisuke Sakaguchi, Kentaro Inui
Compositionality is a pivotal property of symbolic reasoning.
no code implementations • 19 Dec 2022 • Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi
Here, we investigate an alternative that a priori seems impossible: can smaller language models (e. g., GPT-2) win over models that are orders of magnitude larger and better (e. g., GPT-3), if powered with novel commonsense distillation algorithms?
1 code implementation • 23 May 2022 • Masato Mita, Keisuke Sakaguchi, Masato Hagiwara, Tomoya Mizumoto, Jun Suzuki, Kentaro Inui
Natural language processing technology has rapidly improved automated grammatical error correction tasks, and the community begins to explore document-level revision as one of the next challenges.
1 code implementation • 19 May 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir Radev, Yejin Choi, Noah A. Smith
Our extensive evaluations on machine translation and scientific paper summarization demonstrate that Twist decoding substantially outperforms each model decoded in isolation over various scenarios, including cases where domain-specific and general-purpose models are both available.
1 code implementation • 1 May 2022 • Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, Amir Zeldes
We present ELQA, a corpus of questions and answers in and about the English language.
1 code implementation • 11 Apr 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, Noah A. Smith
Text generation with beam search has proven successful in a wide range of applications.
1 code implementation • 15 Dec 2021 • Niket Tandon, Aman Madaan, Peter Clark, Keisuke Sakaguchi, Yiming Yang
We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks.
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
We therefore propose a generalization of leaderboards, bidimensional leaderboards (Billboards), that simultaneously tracks progress in language generation models and metrics for their evaluation.
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith
We establish THumB, a rubric-based human evaluation protocol for image captioning models.
1 code implementation • 14 Oct 2021 • Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Jon Borchardt, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi
As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality or a lack thereof.
no code implementations • 18 Apr 2021 • Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Yiming Yang, Peter Clark, Keisuke Sakaguchi, Ed Hovy
A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors?
no code implementations • 16 Apr 2021 • Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi
Scripts - standardized event sequences describing typical everyday activities - have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.
1 code implementation • 7 Apr 2021 • Masato Hagiwara, Joshua Tanner, Keisuke Sakaguchi
We present GrammarTagger, an open-source grammar profiler which, given an input text, identifies grammatical features useful for language education.
no code implementations • EMNLP 2020 • Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy
Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples(entity, at-tribute, before-state, after-state)for each step, where the entity, attribute, and state values must be predicted from an open vocabulary.
3 code implementations • 12 Oct 2020 • Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
Next, we show that ATOMIC 2020 is better suited for training knowledge models that can generate accurate, representative knowledge for new, unseen entities and events.
no code implementations • IJCNLP 2019 • T, Niket on, Bhavana Dalvi, Keisuke Sakaguchi, Peter Clark, Antoine Bosselut
We introduce WIQA, the first large-scale dataset of {``}What if...{''} questions over procedural text.
1 code implementation • LREC 2020 • Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme
We present the Universal Decompositional Semantics (UDS) dataset (v1. 0), which is bundled with the Decomp toolkit (v0. 1).
1 code implementation • 10 Sep 2019 • Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text.
no code implementations • ACL 2020 • Tongfei Chen, Zhengping Jiang, Adam Poliak, Keisuke Sakaguchi, Benjamin Van Durme
We introduce Uncertain Natural Language Inference (UNLI), a refinement of Natural Language Inference (NLI) that shifts away from categorical labels, targeting instead the direct prediction of subjective probability assessments.
2 code implementations • ICLR 2020 • Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
Abductive reasoning is inference to the most plausible explanation.
3 code implementations • 24 Jul 2019 • Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations.
no code implementations • ACL 2018 • Keisuke Sakaguchi, Benjamin Van Durme
We describe a novel method for efficiently eliciting scalar annotations for dataset construction and system quality estimation by human judgments.
no code implementations • WS 2017 • Keisuke Sakaguchi, Courtney Napoles, Joel Tetreault
The field of grammatical error correction (GEC) has made tremendous bounds in the last ten years, but new questions and obstacles are revealing themselves.
no code implementations • IJCNLP 2017 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
We propose a neural encoder-decoder model with reinforcement learning (NRL) for grammatical error correction (GEC).
1 code implementation • ACL 2017 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
We propose a new dependency parsing scheme which jointly parses a sentence and repairs grammatical errors by extending the non-directional transition-based formalism of Goldberg and Elhadad (2010) with three additional actions: SUBSTITUTE, DELETE, INSERT.
1 code implementation • EACL 2017 • Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault
We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC).
1 code implementation • EMNLP 2016 • Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault
We show that reference-less grammaticality metrics correlate very strongly with human judgments and are competitive with the leading reference-based evaluation metrics.
1 code implementation • 7 Aug 2016 • Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme
Inspired by the findings from the Cmabrigde Uinervtisy effect, we propose a word recognition model based on a semi-character level recurrent neural network (scRNN).
1 code implementation • 9 May 2016 • Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault
The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015).
1 code implementation • TACL 2016 • Keisuke Sakaguchi, Courtney Napoles, Matt Post, Joel Tetreault
The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics.