no code implementations • ACL 2022 • Iz Beltagy, Arman Cohan, Robert Logan IV, Sewon Min, Sameer Singh
The ability to efficiently learn from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult.
2 code implementations • 3 Sep 2024 • Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE).
1 code implementation • 9 Jul 2024 • Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh
Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities.
1 code implementation • 9 Jul 2024 • Rulin Shao, Jacqueline He, Akari Asai, Weijia Shi, Tim Dettmers, Sewon Min, Luke Zettlemoyer, Pang Wei Koh
Specifically, we find that increasing the size of the datastore used by a retrieval-based LM monotonically improves language modeling and several downstream tasks without obvious saturation, such that a smaller model augmented with a large datastore outperforms a larger LM-only model on knowledge-intensive tasks.
1 code implementation • 12 Feb 2024 • Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
1 code implementation • 30 Jan 2024 • Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
Second, existing $n$-gram LMs use a small $n$, which hinders their performance; we instead allow $n$ to be arbitrarily large by introducing a new $\infty$-gram LM with backoff.
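As a rough illustration of the backoff idea in this entry, the sketch below backs off to the longest context suffix that still occurs in a toy corpus and estimates the next-token distribution from its continuations. The naive linear scan is for illustration only; a practical engine would need an indexed structure such as a suffix array.

```python
# Illustrative "longest-suffix backoff": try the full context first, and back off to
# shorter suffixes until one occurs in the corpus; the next-token distribution is then
# estimated from the continuations of that suffix.
from collections import Counter

def infty_gram_next_token_probs(corpus_tokens, context):
    """corpus_tokens: list of tokens; context: list of tokens (the prompt)."""
    for start in range(len(context)):          # longest suffix first
        suffix = context[start:]
        n = len(suffix)
        counts = Counter()
        for i in range(len(corpus_tokens) - n):
            if corpus_tokens[i:i + n] == suffix:
                counts[corpus_tokens[i + n]] += 1
        if counts:                             # found a match: stop backing off
            total = sum(counts.values())
            return {tok: c / total for tok, c in counts.items()}
    return {}                                  # empty context or no match anywhere

corpus = "the cat sat on the mat and the cat sat on the sofa".split()
print(infty_gram_next_token_probs(corpus, "the cat sat on the".split()))
# {'mat': 0.5, 'sofa': 0.5}
```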
1 code implementation • 16 Oct 2023 • Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion.
1 code implementation • 2 Oct 2023 • Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks.
1 code implementation • 8 Aug 2023 • Sewon Min, Suchin Gururangan, Eric Wallace, Weijia Shi, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text, and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference.
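One way to picture a datastore that is only queried during inference is a kNN-style interpolation at the output layer, sketched below with toy numpy arrays; the interpolation weight, distance function, and shapes are assumptions for illustration, not SILO's actual configuration.

```python
# Hedged sketch of inference-time datastore augmentation in the style of a kNN-LM:
# the parametric LM's next-token distribution is interpolated with a distribution
# induced by nearest neighbors retrieved from a swappable datastore.
import numpy as np

def knn_augmented_probs(lm_probs, query, datastore_keys, datastore_next_tokens,
                        vocab_size, k=4, lam=0.3, temperature=1.0):
    # Retrieve the k datastore entries closest to the current hidden-state query.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()

    # Turn the neighbors' recorded next tokens into a distribution over the vocabulary.
    knn_probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        knn_probs[datastore_next_tokens[idx]] += w

    # Interpolate: lam = 0 recovers the parametric LM, lam = 1 is purely nonparametric.
    return (1 - lam) * lm_probs + lam * knn_probs
```

Swapping the datastore (e.g., removing a source) changes predictions without retraining the parametric model, which is the property the entry highlights.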
4 code implementations • 23 May 2023 • Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly.
2 code implementations • 30 Jan 2023 • Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
Ranked #15 on Question Answering on Natural Questions
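Because the LM is treated as a black box, one natural way to combine it with retrieval is to prepend each retrieved document separately and ensemble the resulting output distributions, as sketched below. `black_box_lm` and `retriever` are hypothetical callables, and the weighting scheme is only an illustration of the general idea.

```python
# Sketch of retrieval-score-weighted ensembling: one LM call per retrieved document,
# with output distributions averaged using softmax-normalized retrieval scores.
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieval_ensembled_probs(black_box_lm, retriever, prompt, k=5):
    docs, scores = retriever(prompt, k)              # top-k documents + relevance scores
    weights = softmax(scores)                        # normalize scores into ensemble weights
    mixed = None
    for w, doc in zip(weights, docs):
        probs = black_box_lm(doc + "\n\n" + prompt)  # black-box call per retrieved document
        mixed = w * probs if mixed is None else mixed + w * probs
    return mixed
```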
2 code implementations • 20 Dec 2022 • Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).
2 code implementations • 19 Dec 2022 • Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi
Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available.
1 code implementation • 2 Dec 2022 • Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases.
1 code implementation • 30 Nov 2022 • Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer, Hannaneh Hajishirzi
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
no code implementations • 22 Oct 2022 • Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt
We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering.
1 code implementation • 7 Oct 2022 • Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems.
Ranked #2 on Question Answering on FEVER
1 code implementation • 2 Jul 2022 • Zeqiu Wu, Ryu Parish, Hao Cheng, Sewon Min, Prithviraj Ammanabrolu, Mari Ostendorf, Hannaneh Hajishirzi
In an information-seeking conversation, a user may ask questions that are under-specified or unanswerable.
1 code implementation • 25 May 2022 • Chenglei Si, Chen Zhao, Sewon Min, Jordan Boyd-Graber
Building on those observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions.
2 code implementations • 25 Feb 2022 • Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
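The sketch below shows the prompting setup this sentence describes: a few input-label demonstrations are concatenated in front of the test input, and the model's completion is read off as the prediction. The prompt template and the `generate` callable are placeholders.

```python
# Minimal in-context learning loop: no parameter updates, only prompt construction.
def in_context_predict(generate, demonstrations, test_input):
    prompt = ""
    for x, y in demonstrations:               # condition on a few labeled examples
        prompt += f"Input: {x}\nLabel: {y}\n\n"
    prompt += f"Input: {test_input}\nLabel:"  # the model completes the label
    return generate(prompt).strip()

demos = [("the movie was wonderful", "positive"),
         ("a complete waste of time", "negative")]
# in_context_predict(my_lm.generate, demos, "a beautiful, moving film")
```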
1 code implementation • NAACL 2022 • Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi
Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning.
2 code implementations • NAACL 2022 • Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
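A meta-training episode of this kind can be pictured as sampling k+1 examples from one training task, formatting k of them as demonstrations, and supervising the model on the remaining example's output. The sketch below is a hedged illustration of that episode construction, not the released MetaICL code; the formatting is an assumption.

```python
# Build one meta-training episode from a single task's examples: k demonstrations in
# context, the (k+1)-th example's output as the supervision target.
import random

def build_meta_training_episode(task_examples, k=4):
    sampled = random.sample(task_examples, k + 1)
    demos, (query_x, query_y) = sampled[:k], sampled[k]
    context = "".join(f"{x}\n{y}\n\n" for x, y in demos) + query_x + "\n"
    return context, query_y   # train the LM to generate query_y given context

task = [("Review: great acting. Sentiment?", "positive"),
        ("Review: dull plot. Sentiment?", "negative"),
        ("Review: loved it. Sentiment?", "positive"),
        ("Review: too long. Sentiment?", "negative"),
        ("Review: a gem. Sentiment?", "positive")]
context, target = build_meta_training_episode(task, k=4)
```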
1 code implementation • ACL 2022 • Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
We introduce a noisy channel approach for language model prompting in few-shot text classification.
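In channel terms, each label is scored by how likely the input text is when the LM is conditioned on that label, optionally combined with a label prior, rather than by scoring the label given the input directly. The sketch below assumes a hypothetical `lm_log_prob(conditioning_text, continuation)` helper and toy verbalizers.

```python
# Channel-style classification: pick the label whose verbalization best "generates"
# the input under the LM, i.e. argmax over log P(input | label) (+ log P(label)).
import math

def channel_classify(lm_log_prob, text, verbalized_labels, prior=None):
    best_label, best_score = None, -math.inf
    for label, verbalizer in verbalized_labels.items():
        score = lm_log_prob(verbalizer, text)          # log P(input | label)
        if prior is not None:
            score += math.log(prior[label])            # + log P(label)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

labels = {"positive": "This review is positive:", "negative": "This review is negative:"}
# channel_classify(my_lm_log_prob, "An unforgettable film.", labels)
```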
2 code implementations • ACL 2022 • Jungsoo Park, Sewon Min, Jaewoo Kang, Luke Zettlemoyer, Hannaneh Hajishirzi
Claims in FAVIQ are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
1 code implementation • NAACL 2021 • Iz Beltagy, Arman Cohan, Hannaneh Hajishirzi, Sewon Min, Matthew E. Peters
In this tutorial, we aim at bringing interested NLP researchers up to speed about the recent and ongoing techniques for document-level representation learning.
no code implementations • NAACL 2021 • Srinivasan Iyer, Sewon Min, Yashar Mehdad, Wen-tau Yih
State-of-the-art Machine Reading Comprehension (MRC) models for Open-domain Question Answering (QA) are typically trained for span selection using distantly supervised positive examples and heuristically retrieved negative examples.
no code implementations • EMNLP 2021 • Sewon Min, Kenton Lee, Ming-Wei Chang, Kristina Toutanova, Hannaneh Hajishirzi
We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a given question.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
1 code implementation • 21 Oct 2020 • Srinivasan Iyer, Sewon Min, Yashar Mehdad, Wen-tau Yih
State-of-the-art Machine Reading Comprehension (MRC) models for Open-domain Question Answering (QA) are typically trained for span selection using distantly supervised positive examples and heuristically retrieved negative examples.
3 code implementations • EMNLP 2020 • Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih
We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass.
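A minimal picture of mention detection and linking in one pass with a biencoder: pooled span representations get a mention score, and spans above a threshold are linked to the nearest precomputed entity embedding. Everything below (pooling, scoring head, dimensions) is a toy stand-in rather than ELQ's architecture.

```python
# Toy single-pass mention detection + entity linking over dense embeddings.
import numpy as np

def detect_and_link(token_embs, entity_embs, mention_w, max_span_len=3, threshold=0.0):
    n, results = token_embs.shape[0], []
    for i in range(n):
        for j in range(i, min(i + max_span_len, n)):
            span = token_embs[i:j + 1].mean(axis=0)          # pooled span representation
            mention_score = float(span @ mention_w)          # "is this span a mention?"
            if mention_score > threshold:
                entity = int(np.argmax(entity_embs @ span))  # link to nearest entity embedding
                results.append((i, j, entity, mention_score))
    return results

rng = np.random.default_rng(0)
spans = detect_and_link(rng.normal(size=(6, 16)), rng.normal(size=(100, 16)), rng.normal(size=16))
```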
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi
As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.
Ranked #5 on Common Sense Reasoning on WinoGrande
2 code implementations • EMNLP 2020 • Sewon Min, Julian Michael, Hannaneh Hajishirzi, Luke Zettlemoyer
Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer.
18 code implementations • EMNLP 2020 • Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method.
Ranked #1 on Question Answering on NaturalQA
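Dense retrieval of this kind boils down to embedding questions and passages into a shared vector space with two encoders and ranking passages by inner product, as in the sketch below; the encoder and index are placeholders for whatever trained components are in use.

```python
# Maximum inner product search over precomputed passage embeddings.
import numpy as np

def dense_retrieve(encode_question, passage_index, passage_texts, question, k=5):
    q = encode_question(question)          # d-dimensional question embedding
    scores = passage_index @ q             # inner product against every passage embedding
    top = np.argsort(-scores)[:k]
    return [(passage_texts[i], float(scores[i])) for i in top]

# passage_index: (num_passages, d) matrix of precomputed passage embeddings
# dense_retrieve(question_encoder, passage_index, passages, "who wrote hamlet?")
```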
7 code implementations • 10 Nov 2019 • Sewon Min, Danqi Chen, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships that are derived from an external knowledge base or co-occurrence in the same article.
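The passage graph described here can be sketched as follows: add an edge between two passages when they come from the same article or when an external knowledge base relates their source entities. The toy data layout (dicts with 'id', 'article', 'entity' and a set of entity pairs) is an assumption made only for illustration.

```python
# Build an undirected passage graph from article co-occurrence and KB relations.
from collections import defaultdict
from itertools import combinations

def build_passage_graph(passages, kb_links):
    """passages: dicts with 'id', 'article', 'entity'; kb_links: set of (entity, entity) pairs."""
    graph = defaultdict(set)
    for p, q in combinations(passages, 2):
        same_article = p["article"] == q["article"]
        kb_linked = (p["entity"], q["entity"]) in kb_links or (q["entity"], p["entity"]) in kb_links
        if same_article or kb_linked:
            graph[p["id"]].add(q["id"])
            graph[q["id"]].add(p["id"])
    return graph

passages = [{"id": "p1", "article": "A", "entity": "Seattle"},
            {"id": "p2", "article": "A", "entity": "Portland"},
            {"id": "p3", "article": "B", "entity": "Portland"}]
graph = build_passage_graph(passages, kb_links={("Seattle", "Portland")})
```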
no code implementations • WS 2019 • Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
In this work, we justify a question answering approach to reading comprehension and describe the various kinds of questions one might use to more fully test a system's comprehension of a passage, moving beyond questions that only probe local predicate-argument structures.
no code implementations • 25 Sep 2019 • Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
In this opinion piece, we argue that question answering should be considered a format which is sometimes useful for studying particular phenomena, not a phenomenon or task in itself.
1 code implementation • IJCNLP 2019 • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer
Many question answering (QA) tasks only provide weak supervision for how the answer should be computed.
Ranked #2 on Question Answering on NarrativeQA
1 code implementation • ACL 2019 • Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer
Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs.
2 code implementations • ACL 2019 • Sewon Min, Victor Zhong, Luke Zettlemoyer, Hannaneh Hajishirzi
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation across several paragraphs.
Ranked #66 on Question Answering on HotpotQA
1 code implementation • ACL 2018 • Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong
Neural models for question answering (QA) over documents have achieved significant performance improvements.
Ranked #11 on Question Answering on NewsQA (using extra training data)
1 code implementation • ICLR 2018 • Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi
Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens.
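A toy rendering of the skimming idea: at each step, either run the full RNN update or a much cheaper one that refreshes only a small slice of the hidden state. In the paper this decision is learned; the hard `is_important` flag, the cell definitions, and the dimensions below are illustrative assumptions.

```python
# Toy skim step: full update for important tokens, partial update otherwise.
import numpy as np

def make_cell(dim, in_dim, rng):
    W, U = rng.normal(size=(dim, dim)) * 0.1, rng.normal(size=(dim, in_dim)) * 0.1
    return lambda h, x: np.tanh(W @ h + U @ x)

def skim_rnn_step(h, x, big_cell, small_cell, is_important, small_dim):
    if is_important:                 # full-state update for important tokens
        return big_cell(h, x)
    h_new = h.copy()                 # cheap update of only a small slice for unimportant tokens
    h_new[:small_dim] = small_cell(h[:small_dim], x)
    return h_new

rng = np.random.default_rng(0)
hidden, small, in_dim = 64, 8, 16
big_cell, small_cell = make_cell(hidden, in_dim, rng), make_cell(small, in_dim, rng)
h = np.zeros(hidden)
h = skim_rnn_step(h, rng.normal(size=in_dim), big_cell, small_cell, is_important=False, small_dim=small)
```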
1 code implementation • ACL 2017 • Sewon Min, Minjoon Seo, Hannaneh Hajishirzi
We show that the task of question answering (QA) can significantly benefit from the transfer learning of models trained on a different large, fine-grained QA dataset.
2 code implementations • 14 Jun 2016 • Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi
In this paper, we study the problem of question answering when reasoning over multiple facts is required.
Ranked #2 on Question Answering on bAbi