no code implementations • Findings (EMNLP) 2021 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
no code implementations • EMNLP 2020 • Tal August, Lauren Kim, Katharina Reinecke, Noah A. Smith
We collect a corpus of 128k science writing documents in English and annotate a subset of this corpus.
no code implementations • LREC 2022 • Daniel Cheng, Kyle Yan, Phillip Keung, Noah A. Smith
Social media platforms play an increasingly important role as forums for public discourse.
no code implementations • LREC 2022 • Daniel Edmiston, Phillip Keung, Noah A. Smith
Cross-lingual transfer learning without labeled target language data or parallel text has been surprisingly effective in zero-shot cross-lingual classification, question answering, unsupervised machine translation, etc.
2 code implementations • 31 Dec 2024 • Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William Merrill, Lester James V. Miranda, Jacob Morrison, Tyler Murray, Crystal Nam, Valentina Pyatkin, Aman Rangapur, Michael Schmitz, Sam Skjonsberg, David Wadden, Christopher Wilhelm, Michael Wilson, Luke Zettlemoyer, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi
Our modified model architecture and training recipe achieve both better training stability and improved per-token efficiency.
no code implementations • 5 Dec 2024 • Akshita Bhagia, Jiacheng Liu, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting.
1 code implementation • 22 Nov 2024 • Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, Hannaneh Hajishirzi
Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones.
1 code implementation • 24 Oct 2024 • Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
We analyze features from the routing model to identify characteristics of instances that can benefit from human feedback, e. g., prompts with a moderate safety concern or moderate intent complexity.
no code implementations • 21 Oct 2024 • Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi
Conventional algorithms for training language models (LMs) with human feedback rely on preferences that are assumed to account for an "average" user, disregarding subjectivity and finer-grained variations.
no code implementations • 21 Oct 2024 • Nikita Haduong, Noah A. Smith
Many domains now employ AI-based decision-making aids, and although the potential for AI systems to assist with decision making is much discussed, human-AI collaboration often underperforms due to factors such as (mis)trust in the AI system and beliefs about AI being incapable of completing subjective tasks.
no code implementations • 16 Oct 2024 • Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Pang Wei Koh, Jesse Dodge, Pradeep Dasigi
Adapting general-purpose language models to new skills is currently an expensive process that must be repeated as new instruction datasets targeting new skills are created, or can cause the models to forget older skills.
1 code implementation • 25 Sep 2024 • Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
Today's most advanced vision-language models (VLMs) remain proprietary.
2 code implementations • 3 Sep 2024 • Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE).
1 code implementation • 31 Aug 2024 • Guang Yang, Muru Zhang, Lin Qiu, Yanming Wan, Noah A. Smith
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image (object detection) and then assembles them into a music notation (notation assembly).
no code implementations • 16 Aug 2024 • Nikita Haduong, Alice Gao, Noah A. Smith
As NLP systems are increasingly deployed at scale, concerns about their potential negative impacts have attracted the attention of the research community, yet discussions of risk have mostly been at an abstract level and focused on generic AI or NLP applications.
1 code implementation • 12 Aug 2024 • Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood.
1 code implementation • 23 Jul 2024 • Jonathan Hayase, Alisa Liu, Yejin Choi, Sewoong Oh, Noah A. Smith
Our key insight is that the ordered list of merge rules learned by a BPE tokenizer naturally reveals information about the token frequencies in its training data.
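As a toy illustration of that insight (not the paper's actual inference procedure), the sketch below checks how often a tokenizer's earliest merges coincide with the most frequent symbol pair in a candidate corpus sample; higher agreement suggests the sample resembles the tokenizer's training data. All function names here are invented.

```python
from collections import Counter

def pair_counts(words):
    """Count adjacent symbol pairs across all words (each word is a tuple of symbols)."""
    counts = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    return counts

def apply_merge(words, pair):
    """Apply one BPE merge: fuse every adjacent occurrence of `pair` inside each word."""
    merged = []
    for word in words:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(tuple(out))
    return merged

def merge_agreement(merges, corpus_sample, k=50):
    """Fraction of the tokenizer's first k merges that are also the most frequent
    pair in the candidate corpus at the corresponding step."""
    words = [tuple(w) for w in corpus_sample.split()]
    k = min(k, len(merges))
    hits = 0
    for pair in merges[:k]:
        counts = pair_counts(words)
        if counts and pair == max(counts, key=counts.get):
            hits += 1
        words = apply_merge(words, pair)
    return hits / max(k, 1)
```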
no code implementations • 11 Jul 2024 • Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith
In multilingual settings, non-Latin scripts and low-resource languages are usually disadvantaged in terms of language models' utility, efficiency, and cost.
1 code implementation • 8 Jul 2024 • Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
Data owners may request the removal of their data from a trained model due to privacy or copyright concerns.
1 code implementation • 2 Jul 2024 • Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
Chat-based language models are designed to be helpful, yet they should not comply with every user request.
1 code implementation • 27 Jun 2024 • Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du
Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models.
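As a rough illustration of experimenting with preference weightings at decoding time (not MOD's exact closed-form combination rule), one can mix the per-model next-token distributions at every step; the weighting scheme below is an assumption for illustration only.

```python
import torch

def mix_next_token_logprobs(per_model_logprobs, weights):
    """Combine next-token log-probabilities from several models with preference
    weights: a weighted sum in log space (a weighted geometric mean of the
    distributions), renormalized over the vocabulary."""
    w = torch.tensor(weights, dtype=torch.float32)
    w = w / w.sum()
    stacked = torch.stack(per_model_logprobs)     # [num_models, vocab_size]
    mixed = (w[:, None] * stacked).sum(dim=0)     # [vocab_size]
    return torch.log_softmax(mixed, dim=-1)

# Usage: next_token = torch.argmax(mix_next_token_logprobs([lp_a, lp_b], [0.7, 0.3]))
```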
1 code implementation • 27 Jun 2024 • Orevaoghene Ahia, Anuoluwapo Aremu, Diana Abagyan, Hila Gonen, David Ifeoluwa Adelani, Daud Abolade, Noah A. Smith, Yulia Tsvetkov
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects, resulting in disparities for dialects and varieties for which there are little to no resources or tools.
no code implementations • 26 Jun 2024 • Boyi Wei, Weijia Shi, Yangsibo Huang, Noah A. Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, Peter Henderson
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
1 code implementation • 18 Jun 2024 • William Merrill, Noah A. Smith, Yanai Elazar
In this work, we investigate the extent to which modern LMs generate $n$-grams from their training data, evaluating both (i) the probability LMs assign to complete training $n$-grams and (ii) $n$-novelty, the proportion of $n$-grams generated by an LM that did not appear in the training data (for arbitrarily large $n$).
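The n-novelty statistic itself is simple to compute once training n-grams can be membership-tested (the paper builds an efficient index for this; the set-based version below is only a sketch).

```python
def n_novelty(generated_tokens, training_ngrams, n):
    """Proportion of n-grams in the generated text that never appear in the
    training data, represented here as a set of n-gram tuples."""
    ngrams = [tuple(generated_tokens[i:i + n])
              for i in range(len(generated_tokens) - n + 1)]
    if not ngrams:
        return 0.0
    novel = sum(1 for g in ngrams if g not in training_ngrams)
    return novel / len(ngrams)
```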
2 code implementations • 13 Jun 2024 • Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
High-quality preference data leads to improvements of up to 8% in instruction following and truthfulness.
1 code implementation • 10 May 2024 • Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych
We hope that our work will help set the agenda for research in machine-assisted scientific quality control in the age of AI, within the NLP community and beyond.
1 code implementation • 25 Apr 2024 • Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias.
no code implementations • 18 Apr 2024 • Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.
no code implementations • 21 Mar 2024 • Margaret Y. Li, Alisa Liu, Zhaofeng Wu, Noah A. Smith
Ambiguity is a critical component of language that allows for more effective communication between speakers, but is often ignored in NLP.
2 code implementations • 20 Mar 2024 • Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
To enhance scientific understanding of reward models, we present RewardBench, a benchmark dataset and code-base for evaluation.
1 code implementation • 19 Mar 2024 • Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf
Transformer-based NLP models are powerful but have high computational costs that limit deployment.
1 code implementation • 19 Mar 2024 • Rahul Nadkarni, Yizhong Wang, Noah A. Smith
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks, demonstrating the capability of adapting to a broad variety of instructions.
1 code implementation • 26 Feb 2024 • Bowen Zhao, Zander Brumbaugh, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith
We then develop several methods, from prompting to finetuning, to align LMs to use their most recent knowledge when answering questions, and investigate various factors in this alignment.
3 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations.
1 code implementation • 19 Jan 2024 • Terra Blevins, Tomasz Limisiewicz, Suchin Gururangan, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer
Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters.
2 code implementations • 16 Jan 2024 • Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.
1 code implementation • 20 Dec 2023 • Kai Nylund, Suchin Gururangan, Noah A. Smith
We present time vectors, a simple tool to customize language models to new time periods.
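The construction is task-vector-style weight arithmetic; the sketch below assumes PyTorch-like state dicts and is only a minimal illustration of the idea.

```python
def time_vector(finetuned_state, base_state):
    """A time vector: the element-wise difference between a model finetuned on
    text from one time period and its pretrained base model."""
    return {k: finetuned_state[k] - base_state[k] for k in base_state}

def interpolate_periods(base_state, vec_a, vec_b, alpha=0.5):
    """Interpolating between two time vectors yields a model aimed at an
    intervening time period (per the paper's findings)."""
    return {k: base_state[k] + (1 - alpha) * vec_a[k] + alpha * vec_b[k]
            for k in base_state}
```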
no code implementations • 16 Dec 2023 • Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge
Evaluations of language models (LMs) commonly report perplexity on monolithic data held out from training.
no code implementations • 29 Nov 2023 • Sofia Serrano, Zander Brumbaugh, Noah A. Smith
Given the growing importance of AI literacy, we decided to write this tutorial to help narrow the gap between the discourse among those who study language models -- the core technology underlying ChatGPT and similar products -- and those who are intrigued and want to learn more about them.
3 code implementations • 17 Nov 2023 • Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.
no code implementations • 16 Nov 2023 • Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith
Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the hypothesis in NLI) and the label; consequently, models trained only on those outperform chance.
1 code implementation • 14 Nov 2023 • Haoxin Li, Daniel Cheng, Phillip Keung, Jungo Kasai, Noah A. Smith
Generative retrieval (Wang et al., 2022; Tay et al., 2022) is a popular approach for end-to-end document retrieval that directly generates document identifiers given an input query.
1 code implementation • 31 Oct 2023 • Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge
We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them.
1 code implementation • 23 Oct 2023 • Jaechan Lee, Alisa Liu, Orevaoghene Ahia, Hila Gonen, Noah A. Smith
In experiments, we compare MT-specific models and language models for (i) their preference when given an ambiguous subsentence, (ii) their sensitivity to disambiguating context, and (iii) the performance disparity between figurative and literal source sentences.
1 code implementation • 16 Oct 2023 • Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion.
1 code implementation • 8 Aug 2023 • Sewon Min, Suchin Gururangan, Eric Wallace, Weijia Shi, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference.
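The nonparametric side can be pictured as a kNN-LM-style datastore: retrieved neighbors vote on the next token and that distribution is interpolated with the parametric LM. The sketch below is a generic version of that recipe, not SILO's exact configuration.

```python
import numpy as np

def knn_next_token_probs(distances, next_tokens, vocab_size, temperature=1.0):
    """Turn retrieved datastore neighbors (distance, stored next-token id) into a
    distribution over the vocabulary via a softmax over negative distances."""
    weights = np.exp(-np.asarray(distances, dtype=float) / temperature)
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, tok in zip(weights, next_tokens):
        probs[tok] += w
    return probs

def interpolate_with_datastore(lm_probs, knn_probs, lam=0.25):
    """Final next-token distribution: a mixture of the parametric LM and the
    datastore distribution; the datastore can be edited or removed at any time."""
    return (1 - lam) * np.asarray(lm_probs) + lam * np.asarray(knn_probs)
```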
no code implementations • 19 Jul 2023 • Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency.
no code implementations • 13 Jul 2023 • Bo-Ru Lu, Nikita Haduong, Chia-Hsuan Lee, Zeqiu Wu, Hao Cheng, Paul Koester, Jean Utke, Tao Yu, Noah A. Smith, Mari Ostendorf
The capabilities of pretrained language models have opened opportunities to explore new application areas, but applications involving human-human interaction are limited by the fact that most data is protected from public release for privacy reasons.
2 code implementations • 24 Jun 2023 • Yanai Elazar, Jiayao Zhang, David Wadden, Bo Zhang, Noah A. Smith
However, since quality is a challenging construct to estimate, we use the negative outcome control method, using paper citation count as a control variable to debias the quality confounding effect.
no code implementations • 16 Jun 2023 • Ian Magnusson, Noah A. Smith, Jesse Dodge
Scientific progress in NLP rests on the reproducibility of researchers' claims.
1 code implementation • 9 Jun 2023 • Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks.
4 code implementations • NeurIPS 2023 • Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.
no code implementations • 3 Jun 2023 • Sofia Serrano, Jesse Dodge, Noah A. Smith
Using a new statistical method, we examine whether such spurious patterns in data appear in models trained on the data.
1 code implementation • NeurIPS 2023 • Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness).
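A minimal picture of what "fine-grained" means here, with invented reward-model callables: every generated segment receives its own reward, computed from several feedback-specific reward models rather than a single sequence-level score.

```python
def fine_grained_rewards(segments, reward_models, weights):
    """Return one reward per generated segment (e.g., per sentence), combining
    multiple reward models such as factuality, relevance, and completeness.
    `reward_models` are hypothetical callables mapping a segment to a scalar."""
    return [sum(w * rm(seg) for rm, w in zip(reward_models, weights))
            for seg in segments]

# These per-segment rewards would then feed an RL algorithm such as PPO
# in place of a single end-of-sequence reward.
```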
no code implementations • 23 May 2023 • Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov
Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products.
1 code implementation • 22 May 2023 • Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements.
1 code implementation • 5 May 2023 • Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures.
1 code implementation • 27 Apr 2023 • Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.
1 code implementation • 24 Mar 2023 • Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
Large language models are typically trained densely: all parameters are updated with respect to all inputs.
1 code implementation • 11 Jan 2023 • Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith
We propose NarrowBERT, a modified transformer encoder that increases the throughput for masked language model pretraining by more than $2\times$.
no code implementations • ICCV 2023 • Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).
19 code implementations • 20 Dec 2022 • Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.
4 code implementations • 19 Dec 2022 • Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
no code implementations • 8 Dec 2022 • Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer
Language models can be prompted to perform a wide variety of zero- and few-shot learning problems.
1 code implementation • 1 Dec 2022 • Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
Obtaining labeled data to train a model for a task of interest is often expensive.
no code implementations • 30 Nov 2022 • Daniel Edmiston, Phillip Keung, Noah A. Smith
Cross-lingual transfer learning without labeled target language data or parallel text has been surprisingly effective in zero-shot cross-lingual classification, question answering, unsupervised machine translation, etc.
1 code implementation • 7 Nov 2022 • Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilization of this mechanism in the Transformer architecture.
1 code implementation • 16 Oct 2022 • Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith
Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations.
1 code implementation • 14 Oct 2022 • Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith
Many current NLP systems are built from language models trained to optimize unsupervised objectives on large amounts of raw text.
1 code implementation • 7 Oct 2022 • Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems.
Ranked #2 on Question Answering on FEVER
4 code implementations • 6 Oct 2022 • Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations.
Ranked #5 on Table-based Fact Verification on TabFact
1 code implementation • 5 Sep 2022 • Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time.
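A rough sketch of the two steps with invented helpers: a diversity-driven pool selection stands in for the paper's selective-annotation method (which uses a graph-based vote-k criterion), followed by similarity-based retrieval of in-context exemplars at test time.

```python
import numpy as np

def select_annotation_pool(embeddings, budget):
    """Greedy farthest-point selection of a diverse pool of unlabeled examples to
    annotate (a simple stand-in for the paper's selective-annotation step)."""
    chosen = [0]
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(chosen) < budget:
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen

def retrieve_exemplars(test_emb, pool_embs, k=4):
    """At test time, retrieve the k most similar annotated examples (cosine
    similarity) to serve as in-context exemplars for the prompt."""
    sims = pool_embs @ test_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(test_emb) + 1e-9)
    return np.argsort(-sims)[:k].tolist()
```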
1 code implementation • 2 Sep 2022 • Wenya Wang, Vivek Srikumar, Hanna Hajishirzi, Noah A. Smith
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge that helps improve performance.
2 code implementations • 5 Aug 2022 • Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
New ELMs are learned by branching from (mixtures of) ELMs in the current set, further training the parameters on data for the new domain, and then merging the resulting model back into the set for future use.
1 code implementation • NeurIPS 2023 • Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui
We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version).
no code implementations • 10 Jun 2022 • Jesse Dodge, Taylor Prewitt, Remi Tachet des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan
By providing unprecedented access to computational resources, cloud computing has enabled rapid growth in technologies such as machine learning, the computational demands of which incur a high energy cost and a commensurate carbon footprint.
1 code implementation • 24 May 2022 • Bo-Ru Lu, Yushi Hu, Hao Cheng, Noah A. Smith, Mari Ostendorf
Human conversations can evolve in many different ways, creating challenges for automatic understanding and summarization.
1 code implementation • 19 May 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir Radev, Yejin Choi, Noah A. Smith
Our extensive evaluations on machine translation and scientific paper summarization demonstrate that Twist decoding substantially outperforms each model decoded in isolation over various scenarios, including cases where domain-specific and general-purpose models are both available.
10 code implementations • 16 Apr 2022 • Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi
This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones.
1 code implementation • 11 Apr 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, Noah A. Smith
Based on this finding, we introduce a patience factor, a simple modification to this beam decoding implementation, that generalizes the stopping criterion and provides flexibility to the depth of search.
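The modification is tiny: standard beam-search implementations stop once `beam_size` finished hypotheses have been collected, and the patience factor simply scales that threshold, as in the sketch below.

```python
def should_stop_beam_search(num_finished_hypotheses, beam_size, patience=1.0):
    """Generalized stopping criterion for beam decoding: patience = 1.0 recovers
    the common first-come-first-served rule; patience > 1.0 searches deeper,
    patience < 1.0 stops earlier (a sketch of the paper's patience factor)."""
    return num_finished_hypotheses >= patience * beam_size
```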
1 code implementation • 16 Mar 2022 • Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf
In this work, we propose an in-context learning (ICL) framework for zero-shot and few-shot learning DST, where a large pre-trained language model (LM) takes a test instance and a few exemplars as input, and directly decodes the dialogue state without any parameter updates.
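In such a framework, the prompt can be as simple as concatenating a few (dialogue, state) exemplars followed by the test instance; the serialization below is invented for illustration and differs from the paper's actual prompt format.

```python
def build_dst_prompt(exemplars, test_dialogue):
    """Concatenate a handful of (dialogue context, dialogue state) exemplars and
    the test dialogue so a frozen LM can decode the state with no parameter
    updates. The exact formatting here is a placeholder."""
    lines = []
    for context, state in exemplars:
        lines.append(f"Dialogue: {context}")
        lines.append(f"State: {state}")
        lines.append("")
    lines.append(f"Dialogue: {test_dialogue}")
    lines.append("State:")
    return "\n".join(lines)
```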
no code implementations • 25 Jan 2022 • Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith
Language models increasingly rely on massive web dumps for diverse text data.
1 code implementation • 16 Jan 2022 • Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Starting with an existing dataset, MultiNLI for natural language inference (NLI), our approach uses dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instructs GPT-3 to compose new examples with similar patterns.
1 code implementation • 16 Jan 2022 • Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
Ranked #1 on Task-Oriented Dialogue Systems on KVRET
no code implementations • 7 Jan 2022 • Maarten Sap, Anna Jafarpour, Yejin Choi, Noah A. Smith, James W. Pennebaker, Eric Horvitz
We quantify the differences between autobiographical and imagined stories by introducing sequentiality, a measure of narrative flow of events, drawing probabilistic inferences from a cutting-edge large language model (GPT-3).
1 code implementation • NAACL 2022 • Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi
To enable constrained generation, we build on NeuroLogic decoding (Lu et al., 2021), combining its flexibility in incorporating logical constraints with A*esque estimates of future constraint satisfaction.
Ranked #1 on Text Generation on ROCStories
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
We therefore propose a generalization of leaderboards, bidimensional leaderboards (Billboards), that simultaneously tracks progress in language generation models and metrics for their evaluation.
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith
We establish THumB, a rubric-based human evaluation protocol for image captioning models.
no code implementations • NAACL 2022 • Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith
The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases.
1 code implementation • NAACL 2022 • Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith
When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance.
no code implementations • ACL 2022 • Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
One way to improve the efficiency is to bound the memory size.
no code implementations • 1 Oct 2021 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
1 code implementation • EMNLP 2021 • Ivan Montero, Nikolaos Pappas, Noah A. Smith
Representation learning for text via pretraining a language model on a large corpus has become a standard starting point for building NLP systems.
10 code implementations • ICLR 2022 • Ofir Press, Noah A. Smith, Mike Lewis
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training?
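The paper's answer (ALiBi) replaces position embeddings with head-specific linear penalties on attention logits, proportional to query-key distance; the sketch below shows the bias computation under the usual geometric slope schedule and assumes a causal mask is applied separately.

```python
import torch

def alibi_bias(num_heads, seq_len):
    """ALiBi-style bias tensor of shape [num_heads, seq_len, seq_len]: head h gets
    slope 2**(-8*(h+1)/num_heads), and attention logits for past positions are
    penalized in proportion to their distance from the query. Future positions
    get bias 0 here; causal masking handles them."""
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    rel = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]  # j - i
    rel = rel.clamp(max=0)                  # 0 for future keys, -(i - j) for past keys
    return slopes[:, None, None] * rel      # added to attention logits before softmax
```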
2 code implementations • NAACL 2022 • Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text.
no code implementations • ACL 2021 • Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text?
no code implementations • ACL 2022 • Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi
To support the broad range of real machine errors that can be identified by laypeople, the ten error categories of Scarecrow -- such as redundancy, commonsense errors, and incoherence -- are identified through several rounds of crowd annotation experiments without a predefined ontology.
no code implementations • 30 Jun 2021 • William Merrill, Ashish Sabharwal, Noah A. Smith
Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages.
no code implementations • 30 Jun 2021 • Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text?
1 code implementation • AKBC 2021 • Rahul Nadkarni, David Wadden, Iz Beltagy, Noah A. Smith, Hannaneh Hajishirzi, Tom Hope
Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes.
1 code implementation • EMNLP (MRL) 2021 • Ethan C. Chau, Noah A. Smith
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations.
no code implementations • NAACL 2021 • Elizabeth Clark, Noah A. Smith
Story generation is an open-ended and subjective task, which poses a challenge for evaluating story generation models.
1 code implementation • ACL 2021 • Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi
Despite recent advances in natural language generation, it remains challenging to control attributes of generated text.
1 code implementation • NAACL 2021 • Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner
Readers of academic research papers often read with the goal of answering specific questions.
Ranked #1 on Evidence Selection on QASPER
no code implementations • 22 Apr 2021 • William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
1 code implementation • 18 Apr 2021 • Rik Koncel-Kedziorski, Noah A. Smith
This method can improve perplexity of pretrained LMs with no updates to the LM's own parameters.
no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.
1 code implementation • Findings (EMNLP) 2021 • Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith
Models of language trained on very large corpora have been demonstrated useful for NLP.
2 code implementations • EMNLP 2021 • Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune.
Ranked #2 on Machine Translation on WMT2017 Chinese-English
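For intuition, the generic linear-complexity attention below (an elu(x)+1 feature map, non-causal) is the kind of recurrent-equivalent alternative that softmax attention can be swapped for before finetuning; the paper's approach (T2R) learns its feature map, so treat this only as a sketch.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Non-causal linear-complexity attention with an elu(x)+1 feature map:
    keys and values are summarized once instead of compared pairwise, so time
    and memory grow linearly in sequence length."""
    phi = lambda x: F.elu(x) + 1.0
    q, k = phi(q), phi(k)                        # [batch, length, dim]
    kv = torch.einsum("bld,ble->bde", k, v)      # sum over positions of k_l v_l^T
    norm = torch.einsum("bld,bd->bl", q, k.sum(dim=1)) + 1e-6
    return torch.einsum("bld,bde->ble", q, kv) / norm[..., None]
```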
no code implementations • ICLR 2021 • Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism.
Ranked #28 on Machine Translation on IWSLT2014 German-English
2 code implementations • EACL 2021 • Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.
2 code implementations • 17 Jan 2021 • Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld
While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research.
1 code implementation • ACL 2021 • Ofir Press, Noah A. Smith, Mike Lewis
Increasing the input length has been a driver of progress in language modeling with transformers.
Ranked #26 on Language Modelling on WikiText-103
1 code implementation • 10 Dec 2020 • Zhaofeng Wu, Hao Peng, Noah A. Smith
For natural language processing systems, two kinds of evidence support the use of text representations from neural language models "pretrained" on large unannotated corpora: performance on application-inspired benchmarks (Peters et al., 2018, inter alia), and the emergence of syntactic abstractions in those representations (Tenney et al., 2019, inter alia).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Rachel Rudinger, Vered Shwartz, Jena D. Hwang, Chandra Bhagavatula, Maxwell Forbes, Ronan Le Bras, Noah A. Smith, Yejin Choi
Defeasible inference is a mode of reasoning in which an inference (X is a bird, therefore X flies) may be weakened or overturned in light of new evidence (X is a penguin).
1 code implementation • EMNLP 2021 • Sarah Wiegreffe, Ana Marasović, Noah A. Smith
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
no code implementations • 15 Oct 2020 • Phillip Keung, Julian Salazar, Yichao Lu, Noah A. Smith
We then improve an XLM-based unsupervised neural MT system pre-trained on Wikipedia by supplementing it with pseudo-parallel text mined from the same corpus, boosting unsupervised translation performance by up to 3.5 BLEU on the WMT'14 French-English and WMT'16 German-English tasks and outperforming the previous state-of-the-art.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights.
1 code implementation • EMNLP 2020 • Florian Mai, Nikolaos Pappas, Ivan Montero, Noah A. Smith, James Henderson
Text autoencoders are commonly used for conditional generation tasks such as style transfer.
1 code implementation • EMNLP 2020 • Phillip Keung, Yichao Lu, György Szarvas, Noah A. Smith
We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification.
1 code implementation • EMNLP 2020 • Xuhui Zhou, Nikolaos Pappas, Noah A. Smith
Text alignment finds application in tasks such as citation recommendation and plagiarism detection.
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Ethan C. Chau, Lucy H. Lin, Noah A. Smith
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties.
1 code implementation • EMNLP 2020 • Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
3 code implementations • Findings of the Association for Computational Linguistics 2020 • Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
6 code implementations • EMNLP 2020 • Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
Experiments across four datasets show that these model-dependent measures reveal three distinct regions in the data map, each with pronounced characteristics.
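The "model-dependent measures" are easy to compute from training dynamics: for each example, track the probability assigned to its gold label across epochs and summarize with a mean (confidence) and standard deviation (variability), as sketched below.

```python
import numpy as np

def data_map_coordinates(gold_label_probs):
    """Data-map coordinates per training example: `confidence` is the mean
    probability the model assigns to the gold label across training epochs and
    `variability` is the standard deviation of that probability.
    Input shape: [num_epochs, num_examples]."""
    probs = np.asarray(gold_label_probs)
    return probs.mean(axis=0), probs.std(axis=0)
```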
no code implementations • ACL 2020 • Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, James Pennebaker
We introduce a measure of narrative flow and use this to examine the narratives for imagined and recalled events.
no code implementations • ACL 2020 • Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks.
no code implementations • WS 2020 • Tal August, Maarten Sap, Elizabeth Clark, Katharina Reinecke, Noah A. Smith
We analyze the effect of author and reader characteristics and story writing setup on the quality of stories in a short storytelling task.
2 code implementations • ICLR 2021 • Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation.
no code implementations • CL 2020 • Marta R. Costa-jussà, Cristina España-Bonet, Pascale Fung, Noah A. Smith
We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing.
no code implementations • 13 May 2020 • Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks.
6 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
no code implementations • ACL 2020 • William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav
While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
1 code implementation • ACL 2020 • Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith
Our method presents a favorable speed/accuracy tradeoff in almost all cases, producing models which are up to five times faster than the state of the art, while preserving their accuracy.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • 2 Mar 2020 • Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith
Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified.
1 code implementation • ACL 2021 • Kelvin Luu, Xinyi Wu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, Noah A. Smith
We address the task of explaining relationships between two scientific documents using natural language text.
no code implementations • 2 Jan 2020 • Dallas Card, Noah A. Smith
In this paper we provide a consequentialist critique of common definitions of fairness within machine learning, as well as a machine learning perspective on consequentialism.
no code implementations • ACL 2020 • Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi
We introduce Social Bias Frames, a new conceptual formalism that aims to model the pragmatic frames in which people project social biases and stereotypes onto others.
2 code implementations • ACL 2020 • Ofir Press, Noah A. Smith, Omer Levy
Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers.
Ranked #7 on Language Modelling on enwik8
no code implementations • ICLR 2020 • Lucy H. Lin, Noah A. Smith
As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role.
no code implementations • CONLL 2019 • Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
Despite advances in dependency parsing, languages with small treebanks still present challenges.
1 code implementation • 18 Sep 2019 • Deric Pang, Lucy H. Lin, Noah A. Smith
We introduce a novel approach to incorporate syntax into natural language inference (NLI) models.
1 code implementation • IJCNLP 2019 • Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith
Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities.
Ranked #9 on Relation Classification on TACRED
1 code implementation • IJCNLP 2019 • Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith
Our method also highlights the interpretable properties of rational RNNs.
4 code implementations • IJCNLP 2019 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results.
1 code implementation • IJCNLP 2019 • Hao Peng, Roy Schwartz, Noah A. Smith
We present PaLM, a hybrid parser and neural language model.
1 code implementation • IJCNLP 2019 • Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov
Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well.
no code implementations • 29 Aug 2019 • Swabha Swayamdipta, Matthew Peters, Brendan Roof, Chris Dyer, Noah A. Smith
Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain.
1 code implementation • IJCNLP 2019 • Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
Machine comprehension of texts longer than a single sentence often requires coreference resolution.
2 code implementations • 22 Jul 2019 • Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni
Moreover, the financial cost of the computations can make it difficult for academics, students, and researchers, in particular those from emerging economies, to engage in deep learning research.
no code implementations • ACL 2019 • Elizabeth Clark, Asli Celikyilmaz, Noah A. Smith
For evaluating machine-generated texts, automatic methods hold the promise of avoiding collection of human judgments, which can be expensive and time-consuming.
no code implementations • ACL 2019 • Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, Noah A. Smith
We investigate how annotators' insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations.
1 code implementation • ACL 2019 • Sofia Serrano, Noah A. Smith
Attention mechanisms have recently boosted performance on a range of NLP tasks.
1 code implementation • ACL 2019 • Suchin Gururangan, Tam Dang, Dallas Card, Noah A. Smith
We accompany this paper with code to pretrain and use VAMPIRE embeddings in downstream tasks.
2 code implementations • ACL 2019 • Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer
We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT).
no code implementations • NAACL 2019 • Nelson F. Liu, Roy Schwartz, Noah A. Smith
Several datasets have recently been constructed to expose brittleness in models trained on existing benchmarks.
no code implementations • NAACL 2019 • Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith
Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language.
no code implementations • WS 2019 • Matthew E. Peters, Sebastian Ruder, Noah A. Smith
While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task.
no code implementations • TACL 2019 • Kelvin Luu, Chenhao Tan, Noah A. Smith
We build on a widely used model of skill in two-player games and augment it with linguistic features of a debater's content.
1 code implementation • NAACL 2019 • Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
We introduce Rosita, a method to produce multilingual contextual word representations by training a single language model on text from multiple languages.
2 code implementations • 15 Feb 2019 • Noah A. Smith
This introduction aims to tell the story of how we put words into computers.
2 code implementations • 6 Nov 2018 • Dallas Card, Michael Zhang, Noah A. Smith
Recent advances in deep learning have achieved impressive gains in classification accuracy on a variety of types of data, including images and text.
1 code implementation • 31 Oct 2018 • Ofir Press, Noah A. Smith
In NMT, how far can we get without attention and without separate encoding and decoding?
2 code implementations • 31 Oct 2018 • Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, Yejin Choi
We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge.
1 code implementation • EMNLP 2018 • Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith
We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks.
1 code implementation • EMNLP 2018 • Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell
To improve robustness to word order differences, we propose to use self-attention, which allows for a degree of flexibility with respect to word order.
1 code implementation • EMNLP 2018 • Hao Peng, Roy Schwartz, Sam Thomson, Noah A. Smith
We characterize this connection formally, defining rational recurrences to be recurrent hidden state update functions that can be written as the Forward calculation of a finite set of WFSAs.
no code implementations • 28 Aug 2018 • Lucy H. Lin, Scott Miles, Noah A. Smith
We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection.
no code implementations • ACL 2018 • Roy Schwartz, Sam Thomson, Noah A. Smith
Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural language utterances.
no code implementations • WS 2018 • Nelson F. Liu, Gina-Anne Levow, Noah A. Smith
We introduce a simple method for extracting non-arbitrary form-meaning representations from a collection of semantic vectors.
no code implementations • NAACL 2018 • Dallas Card, Noah A. Smith
Estimating label proportions in a target corpus is a type of measurement that is useful for answering certain types of social-scientific questions.
no code implementations • NAACL 2018 • Elizabeth Clark, Yangfeng Ji, Noah A. Smith
We introduce an approach to neural text generation that explicitly represents entities mentioned in the text.
no code implementations • WS 2018 • Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, Noah A. Smith
While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data.
1 code implementation • HLT 2015 • Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, Noah A. Smith
We present a novel abstractive summarization framework that draws on the recent development of a treebank for the Abstract Meaning Representation (AMR).
no code implementations • ACL 2018 • Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, Yejin Choi
We investigate a new commonsense inference task: given an event described in a short free-form text ("X drinks coffee in the morning"), a system reasons about the likely intents ("X wants to stay awake") and reactions ("X feels alert") of the event's participants.
Ranked #1 on Common Sense Reasoning on Event2Mind test
2 code implementations • 15 May 2018 • Roy Schwartz, Sam Thomson, Noah A. Smith
Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural language utterances.
1 code implementation • ACL 2018 • Hao Peng, Sam Thomson, Noah A. Smith
We introduce the structured projection of intermediate gradients optimization technique (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers.
no code implementations • NAACL 2018 • Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, Mari Ostendorf
We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa Prize.
1 code implementation • NAACL 2018 • Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith
Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD.
Ranked #2 on Dependency Parsing on Tweebank
2 code implementations • NAACL 2018 • Hao Peng, Sam Thomson, Swabha Swayamdipta, Noah A. Smith
We present a new approach to learning semantic parsers from multiple datasets, even when the target semantic formalisms are drastically different, and the underlying corpora do not overlap.
no code implementations • NAACL 2018 • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith
Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to.
no code implementations • 23 Feb 2018 • Chenhao Tan, Hao Peng, Noah A. Smith
We first examine the effect of wording and propose a binary classification framework that controls for both the speaker and the debate situation.
2 code implementations • EMNLP 2017 • Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, Noah A. Smith
Understanding a long document requires tracking how entities are introduced and evolve over time.
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
10 code implementations • 29 Jun 2017 • Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith
We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates.
no code implementations • ICLR 2018 • Jesse Dodge, Kevin Jamieson, Noah A. Smith
Driven by the need for parallelizable hyperparameter optimization methods, this paper studies \emph{open loop} search methods: sequences that are predetermined and can be generated before a single configuration is evaluated.
no code implementations • CL 2017 • Miguel Ballesteros, Chris Dyer, Yoav Goldberg, Noah A. Smith
During training, dynamic oracles alternate between sampling parser states from the training data and from the model as it is being learned, making the model more robust to the kinds of errors that will be made at test time.
3 code implementations • ACL 2018 • Dallas Card, Chenhao Tan, Noah A. Smith
Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly-used approaches to modeling text corpora ignore this information.
1 code implementation • ACL 2017 • Chenhao Tan, Dallas Card, Noah A. Smith
Combining two statistics, cooccurrence within documents and prevalence correlation over time, our approach reveals a number of different ways in which ideas can cooperate and compete.
1 code implementation • ACL 2017 • Hao Peng, Sam Thomson, Noah A. Smith
We present a deep neural architecture that parses sentences into three semantic dependency graph formalisms.
no code implementations • WS 2017 • Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, Noah A. Smith
This paper describes University of Washington NLP's submission for the Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem 2017) shared task: the Story Cloze Task.
no code implementations • 21 Feb 2017 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models.
1 code implementation • CONLL 2017 • Roy Schwartz, Maarten Sap, Ioannis Konstas, Li Zilles, Yejin Choi, Noah A. Smith
A writer's style depends not just on personal traits but also on her intent and mental state.
1 code implementation • EACL 2017 • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith
We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection.
Ranked #22 on Constituency Parsing on Penn Treebank