no code implementations • 1 Jun 2009 • Shay Cohen, Noah A. Smith
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution.
Ranked #4 on Unsupervised Dependency Parsing on Penn Treebank
Dependency Grammar Induction • Unsupervised Dependency Parsing
no code implementations • NeurIPS 2010 • Noah A. Smith, Shay B. Cohen
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures.
no code implementations • 18 Oct 2012 • Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing
Computer-mediated communication is driving fundamental changes in the nature of written language.
no code implementations • 6 May 2013 • David Bamman, Noah A. Smith
We consider the unsupervised alignment of the full text of a book with a human-written summary.
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.
no code implementations • 9 Oct 2013 • Dani Yogatama, Bryan R. Routledge, Noah A. Smith
We consider the scenario where the parameters of a probabilistic model are expected to vary over time.
no code implementations • 25 Oct 2013 • Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah A. Smith
We study the relationship between social media output and National Football League (NFL) games, using a dataset containing messages from Twitter and NFL game statistics.
no code implementations • TACL 2014 • Dani Yogatama, Chong Wang, Bryan R. Routledge, Noah A. Smith, Eric P. Xing
We present a probabilistic language model that captures temporal dynamics and conditions on arbitrary non-linguistic context features.
no code implementations • TACL 2014 • David Bamman, Noah A. Smith
We present a method for discovering abstract event classes in biographies, based on a probabilistic latent-variable model.
no code implementations • TACL 2014 • Nathan Schneider, Emily Danchik, Chris Dyer, Noah A. Smith
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation.
no code implementations • 16 Apr 2014 • Lingpeng Kong, Noah A. Smith
Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems.
no code implementations • LREC 2014 • Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, Noah A. Smith
Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class.
no code implementations • 8 Jun 2014 • Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah A. Smith
We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings.
no code implementations • 29 Sep 2014 • Yanchuan Sim, Bryan Routledge, Noah A. Smith
We explore the idea that authoring a piece of text is an act of maximizing one's expected utility.
1 code implementation • NeurIPS 2014 • Waleed Ammar, Chris Dyer, Noah A. Smith
We introduce a framework for unsupervised learning of structured predictors with overlapping, global features.
2 code implementations • HLT 2015 • Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, Noah A. Smith
Vector space word representations are learned from distributional information of words in large corpora.
no code implementations • EMNLP 2015 • Dani Yogatama, Noah A. Smith
When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts.
7 code implementations • IJCNLP 2015 • Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith
We propose a technique for learning representations of parser states in transition-based dependency parsers.
1 code implementation • EMNLP 2015 • Miguel Ballesteros, Chris Dyer, Noah A. Smith
We present extensions to a continuous-state dependency parsing method that make it applicable to morphologically rich languages.
2 code implementations • 18 Nov 2015 • Lingpeng Kong, Chris Dyer, Noah A. Smith
Representations of the input segments (i.e., contiguous subsequences of the input) are computed by encoding their constituent tokens using bidirectional recurrent neural nets, and these "segment embeddings" are used to define compatibility scores with output labels.
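A minimal NumPy sketch of the segment-embedding idea (toy dimensions, random illustrative weights, and made-up names, not the authors' code): encode a segment's tokens with a forward and a backward RNN, concatenate the two final states as the segment embedding, and take dot products with label vectors as compatibility scores.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n_labels = 4, 3, 2

# Toy parameters for a forward and a backward vanilla RNN (illustrative only).
Wf, Uf = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wb, Ub = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
label_vecs = rng.normal(size=(n_labels, 2 * d_h))  # one vector per output label

def run_rnn(W, U, xs):
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(W @ x + U @ h)
    return h

def segment_embedding(tokens):
    """Encode a contiguous subsequence in both directions and concatenate."""
    fwd = run_rnn(Wf, Uf, tokens)
    bwd = run_rnn(Wb, Ub, tokens[::-1])
    return np.concatenate([fwd, bwd])

def compatibility_scores(tokens):
    """Dot-product scores between the segment embedding and each label."""
    return label_vecs @ segment_embedding(tokens)

segment = [rng.normal(size=d_in) for _ in range(3)]  # a 3-token segment
scores = compatibility_scores(segment)
print(scores.shape)  # one score per label
```

In the paper's setting these scores feed a segmental (semi-Markov) model over all candidate segmentations; the sketch only shows how a single segment is scored.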
no code implementations • 2 Dec 2015 • Philip Massey, Patrick Xia, David Bamman, Noah A. Smith
We present a dataset of manually annotated relationships between characters in literary texts, in order to support the training and evaluation of automatic methods for relation type prediction in this domain (Makazhanov et al., 2014; Kokkinakis, 2013) and the broader computational analysis of literary character (Elson et al., 2010; Bamman et al., 2014; Vala et al., 2015; Flekova and Gurevych, 2015).
1 code implementation • TACL 2016 • Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith
We train one multilingual model for dependency parsing and use it to parse sentences in several languages.
1 code implementation • 5 Feb 2016 • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space.
6 code implementations • NAACL 2016 • Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith
We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure.
Ranked #25 on Constituency Parsing on Penn Treebank
no code implementations • 1 Mar 2016 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.
Ranked #16 on Speech Recognition on TIMIT
no code implementations • 11 Mar 2016 • Miguel Ballesteros, Yoav Goldberg, Chris Dyer, Noah A. Smith
We adapt the greedy Stack-LSTM dependency parser of Dyer et al. (2015) to support a training-with-exploration procedure using dynamic oracles (Goldberg and Nivre, 2013) instead of cross-entropy minimization.
Ranked #2 on Chinese Dependency Parsing on Chinese Pennbank
1 code implementation • CONLL 2016 • Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, Noah A. Smith
We present a transition-based parser that jointly produces syntactic and semantic dependencies.
1 code implementation • WS 2016 • Aaron Jaech, George Mulcaire, Shobhit Hathi, Mari Ostendorf, Noah A. Smith
Social media messages' brevity and unconventional spelling pose a challenge to language identification.
1 code implementation • EMNLP 2016 • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith
We introduce two first-order graph-based dependency parsers achieving a new state of the art.
Ranked #17 on Dependency Parsing on Penn Treebank
no code implementations • 28 Sep 2016 • Kazuya Kawakami, Chris Dyer, Bryan R. Routledge, Noah A. Smith
We present a neural network architecture to predict a point in color space from the sequence of characters in the color's name.
1 code implementation • EACL 2017 • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith
We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection.
Ranked #20 on Constituency Parsing on Penn Treebank
1 code implementation • CONLL 2017 • Roy Schwartz, Maarten Sap, Ioannis Konstas, Li Zilles, Yejin Choi, Noah A. Smith
A writer's style depends not just on personal traits but also on her intent and mental state.
no code implementations • 21 Feb 2017 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models.
no code implementations • WS 2017 • Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, Noah A. Smith
This paper describes University of Washington NLP's submission for the Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem 2017) shared task: the Story Cloze Task.
1 code implementation • ACL 2017 • Hao Peng, Sam Thomson, Noah A. Smith
We present a deep neural architecture that parses sentences into three semantic dependency graph formalisms.
1 code implementation • ACL 2017 • Chenhao Tan, Dallas Card, Noah A. Smith
Combining two statistics, cooccurrence within documents and prevalence correlation over time, our approach reveals a number of different ways in which ideas can cooperate and compete.
3 code implementations • ACL 2018 • Dallas Card, Chenhao Tan, Noah A. Smith
Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly-used approaches to modeling text corpora ignore this information.
no code implementations • CL 2017 • Miguel Ballesteros, Chris Dyer, Yoav Goldberg, Noah A. Smith
During training, dynamic oracles alternate between sampling parser states from the training data and from the model as it is being learned, making the model more robust to the kinds of errors that will be made at test time.
no code implementations • ICLR 2018 • Jesse Dodge, Kevin Jamieson, Noah A. Smith
Driven by the need for parallelizable hyperparameter optimization methods, this paper studies open-loop search methods: sequences that are predetermined and can be generated before a single configuration is evaluated.
10 code implementations • 29 Jun 2017 • Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith
We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates.
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
2 code implementations • EMNLP 2017 • Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, Noah A. Smith
Understanding a long document requires tracking how entities are introduced and evolve over time.
no code implementations • 23 Feb 2018 • Chenhao Tan, Hao Peng, Noah A. Smith
We first examine the effect of wording and propose a binary classification framework that controls for both the speaker and the debate situation.
no code implementations • NAACL 2018 • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith
Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to.
2 code implementations • NAACL 2018 • Hao Peng, Sam Thomson, Swabha Swayamdipta, Noah A. Smith
We present a new approach to learning semantic parsers from multiple datasets, even when the target semantic formalisms are drastically different, and the underlying corpora do not overlap.
1 code implementation • NAACL 2018 • Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith
Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD.
Ranked #2 on Dependency Parsing on Tweebank
no code implementations • NAACL 2018 • Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, Mari Ostendorf
We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa Prize.
1 code implementation • ACL 2018 • Hao Peng, Sam Thomson, Noah A. Smith
We introduce the structured projection of intermediate gradients optimization technique (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers.
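The core move can be sketched in a few lines (an illustrative toy, not the paper's implementation): take a gradient step away from the hard one-hot decision, project the result back onto a relaxation of the feasible set (here, simply the probability simplex), and pass the difference down as a surrogate gradient.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (the standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def spigot_backward(z_hat, grad, eta=1.0):
    """SPIGOT-style surrogate gradient for a hard argmax inside a network:
    step downhill from the one-hot decision, project back onto the relaxed
    polytope (here the simplex), and return the difference."""
    p = project_to_simplex(z_hat - eta * grad)
    return z_hat - p

z_hat = np.array([0.0, 1.0, 0.0])     # hard decision (one-hot argmax)
grad = np.array([-0.4, 0.3, 0.1])     # gradient arriving from layers above
surrogate = spigot_backward(z_hat, grad)
print(surrogate)
```

In the paper the projection is onto the structured polytope of the intermediate task (e.g., the arborescence polytope for parsing); the simplex stands in for it here to keep the sketch self-contained.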
2 code implementations • 15 May 2018 • Roy Schwartz, Sam Thomson, Noah A. Smith
Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural language utterances.
Explainable Artificial Intelligence • General Classification +3
no code implementations • ACL 2018 • Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, Yejin Choi
We investigate a new commonsense inference task: given an event described in a short free-form text ("X drinks coffee in the morning"), a system reasons about the likely intents ("X wants to stay awake") and reactions ("X feels alert") of the event's participants.
Ranked #1 on Common Sense Reasoning on Event2Mind test
1 code implementation • HLT 2015 • Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, Noah A. Smith
We present a novel abstractive summarization framework that draws on the recent development of a treebank for the Abstract Meaning Representation (AMR).
no code implementations • WS 2018 • Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, Noah A. Smith
While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data.
no code implementations • WS 2018 • Nelson F. Liu, Gina-Anne Levow, Noah A. Smith
We introduce a simple method for extracting non-arbitrary form-meaning representations from a collection of semantic vectors.
no code implementations • NAACL 2018 • Dallas Card, Noah A. Smith
Estimating label proportions in a target corpus is a type of measurement that is useful for answering certain types of social-scientific questions.
no code implementations • NAACL 2018 • Elizabeth Clark, Yangfeng Ji, Noah A. Smith
We introduce an approach to neural text generation that explicitly represents entities mentioned in the text.
no code implementations • ACL 2018 • Roy Schwartz, Sam Thomson, Noah A. Smith
Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural language utterances.
no code implementations • 28 Aug 2018 • Lucy H. Lin, Scott Miles, Noah A. Smith
We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection.
1 code implementation • EMNLP 2018 • Hao Peng, Roy Schwartz, Sam Thomson, Noah A. Smith
We characterize this connection formally, defining rational recurrences to be recurrent hidden state update functions that can be written as the Forward calculation of a finite set of WFSAs.
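A tiny example of the correspondence (toy automaton and weights, chosen for illustration): the Forward computation of a weighted finite-state automaton is itself a recurrent hidden-state update, where each input symbol multiplies the state vector by a transition matrix.

```python
import numpy as np

# A 2-state WFSA over the alphabet {a, b}: state 0 self-loops with weight 1,
# an 'a' also moves state 0 -> state 1 with weight 0.5, and state 1 absorbs
# with weight 1. Its Forward computation is a linear recurrence on the state
# vector, i.e., a (rational) recurrent update with no nonlinearity.
T = {
    "a": np.array([[1.0, 0.5],
                   [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0],
                   [0.0, 1.0]]),
}

def forward(symbols):
    c = np.array([1.0, 0.0])       # start in state 0 with weight 1
    for s in symbols:
        c = c @ T[s]               # the recurrent hidden-state update
    return c[1]                    # total weight of paths reaching state 1

print(forward("aba"))  # accumulates weight 0.5 for each 'a' seen
```

Rational RNNs generalize this by making the transition weights functions of learned token embeddings; the state update stays a product of such matrices.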
1 code implementation • EMNLP 2018 • Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell
To improve robustness to word order differences, we propose to use self-attention, which allows for a degree of flexibility with respect to word order.
1 code implementation • EMNLP 2018 • Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith
We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks.
2 code implementations • 31 Oct 2018 • Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, Yejin Choi
We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge.
1 code implementation • 31 Oct 2018 • Ofir Press, Noah A. Smith
In NMT, how far can we get without attention and without separate encoding and decoding?
2 code implementations • 6 Nov 2018 • Dallas Card, Michael Zhang, Noah A. Smith
Recent advances in deep learning have achieved impressive gains in classification accuracy on a variety of types of data, including images and text.
3 code implementations • 15 Feb 2019 • Noah A. Smith
This introduction aims to tell the story of how we put words into computers.
1 code implementation • NAACL 2019 • Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
We introduce Rosita, a method to produce multilingual contextual word representations by training a single language model on text from multiple languages.
no code implementations • TACL 2019 • Kelvin Luu, Chenhao Tan, Noah A. Smith
We build on a widely used model of skill in two-player games and augment it with linguistic features of a debater's content.
no code implementations • WS 2019 • Matthew E. Peters, Sebastian Ruder, Noah A. Smith
While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task.
no code implementations • NAACL 2019 • Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith
Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language.
no code implementations • NAACL 2019 • Nelson F. Liu, Roy Schwartz, Noah A. Smith
Several datasets have recently been constructed to expose brittleness in models trained on existing benchmarks.
1 code implementation • ACL 2019 • Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer
We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT).
1 code implementation • ACL 2019 • Suchin Gururangan, Tam Dang, Dallas Card, Noah A. Smith
We accompany this paper with code to pretrain and use VAMPIRE embeddings in downstream tasks.
1 code implementation • ACL 2019 • Sofia Serrano, Noah A. Smith
Attention mechanisms have recently boosted performance on a range of NLP tasks.
no code implementations • ACL 2019 • Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, Noah A. Smith
We investigate how annotators' insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations.
no code implementations • ACL 2019 • Elizabeth Clark, Asli Celikyilmaz, Noah A. Smith
For evaluating machine-generated texts, automatic methods hold the promise of avoiding collection of human judgments, which can be expensive and time-consuming.
2 code implementations • 22 Jul 2019 • Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni
Moreover, the financial cost of the computations can make it difficult for academics, students, and researchers, in particular those from emerging economies, to engage in deep learning research.
1 code implementation • IJCNLP 2019 • Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
Machine comprehension of texts longer than a single sentence often requires coreference resolution.
no code implementations • 29 Aug 2019 • Swabha Swayamdipta, Matthew Peters, Brendan Roof, Chris Dyer, Noah A. Smith
Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain.
1 code implementation • IJCNLP 2019 • Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov
Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well.
1 code implementation • IJCNLP 2019 • Hao Peng, Roy Schwartz, Noah A. Smith
We present PaLM, a hybrid parser and neural language model.
1 code implementation • IJCNLP 2019 • Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith
Our method also highlights the interpretable properties of rational RNNs.
4 code implementations • IJCNLP 2019 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results.
1 code implementation • IJCNLP 2019 • Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith
Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities.
Ranked #9 on Relation Classification on TACRED
1 code implementation • 18 Sep 2019 • Deric Pang, Lucy H. Lin, Noah A. Smith
We introduce a novel approach to incorporate syntax into natural language inference (NLI) models.
no code implementations • CONLL 2019 • Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
Despite advances in dependency parsing, languages with small treebanks still present challenges.
no code implementations • ICLR 2020 • Lucy H. Lin, Noah A. Smith
As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role.
no code implementations • ACL 2020 • Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi
We introduce Social Bias Frames, a new conceptual formalism that aims to model the pragmatic frames in which people project social biases and stereotypes onto others.
2 code implementations • ACL 2020 • Ofir Press, Noah A. Smith, Omer Levy
Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers.
Ranked #7 on Language Modelling on enwik8
no code implementations • 2 Jan 2020 • Dallas Card, Noah A. Smith
In this paper we provide a consequentialist critique of common definitions of fairness within machine learning, as well as a machine learning perspective on consequentialism.
1 code implementation • ACL 2021 • Kelvin Luu, Xinyi Wu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, Noah A. Smith
We address the task of explaining relationships between two scientific documents using natural language text.
no code implementations • 2 Mar 2020 • Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith
Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
1 code implementation • ACL 2020 • Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith
Our method presents a favorable speed/accuracy tradeoff in almost all cases, producing models which are up to five times faster than the state of the art, while preserving their accuracy.
no code implementations • ACL 2020 • William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav
While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
6 code implementations • ACL 2020 • Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
no code implementations • 13 May 2020 • Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks.
no code implementations • CL 2020 • Marta R. Costa-jussà, Cristina España-Bonet, Pascale Fung, Noah A. Smith
We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing.
2 code implementations • ICLR 2021 • Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation.
no code implementations • WS 2020 • Tal August, Maarten Sap, Elizabeth Clark, Katharina Reinecke, Noah A. Smith
We analyze the effect of author and reader characteristics and story writing setup on the quality of stories in a short storytelling task.
no code implementations • ACL 2020 • Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks.
no code implementations • ACL 2020 • Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, James Pennebaker
We introduce a measure of narrative flow and use this to examine the narratives for imagined and recalled events.
6 code implementations • EMNLP 2020 • Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
Experiments across four datasets show that these model-dependent measures reveal three distinct regions in the data map, each with pronounced characteristics.
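The two coordinates of such a data map can be computed in a few lines (toy numbers below, purely illustrative): for each training example, track the probability the model assigns to the gold label across epochs, then take the mean (confidence) and standard deviation (variability).

```python
import numpy as np

# Per-example probability assigned to the gold label at each training epoch
# (toy numbers; rows = examples, columns = epochs).
gold_probs = np.array([
    [0.9, 0.92, 0.95, 0.97],   # easy-to-learn: high mean, low spread
    [0.2, 0.5, 0.4, 0.7],      # ambiguous: high spread
    [0.1, 0.08, 0.12, 0.1],    # hard-to-learn: low mean, low spread
])

confidence = gold_probs.mean(axis=1)   # one axis of the data map
variability = gold_probs.std(axis=1)   # the other axis

for conf, var in zip(confidence, variability):
    print(f"confidence={conf:.2f}  variability={var:.2f}")
```

Plotting variability against confidence for a whole training set is what yields the three regions described in the abstract.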
1 code implementation • EMNLP 2020 • Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Ethan C. Chau, Lucy H. Lin, Noah A. Smith
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties.
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
1 code implementation • EMNLP 2020 • Xuhui Zhou, Nikolaos Pappas, Noah A. Smith
Text alignment finds application in tasks such as citation recommendation and plagiarism detection.
1 code implementation • EMNLP 2020 • Florian Mai, Nikolaos Pappas, Ivan Montero, Noah A. Smith, James Henderson
Text autoencoders are commonly used for conditional generation tasks such as style transfer.
1 code implementation • EMNLP 2020 • Phillip Keung, Yichao Lu, György Szarvas, Noah A. Smith
We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification.
no code implementations • 15 Oct 2020 • Phillip Keung, Julian Salazar, Yichao Lu, Noah A. Smith
We then improve an XLM-based unsupervised neural MT system pre-trained on Wikipedia by supplementing it with pseudo-parallel text mined from the same corpus, boosting unsupervised translation performance by up to 3.5 BLEU on the WMT'14 French-English and WMT'16 German-English tasks and outperforming the previous state-of-the-art.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights.
1 code implementation • EMNLP 2021 • Sarah Wiegreffe, Ana Marasović, Noah A. Smith
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Rachel Rudinger, Vered Shwartz, Jena D. Hwang, Chandra Bhagavatula, Maxwell Forbes, Ronan Le Bras, Noah A. Smith, Yejin Choi
Defeasible inference is a mode of reasoning in which an inference (X is a bird, therefore X flies) may be weakened or overturned in light of new evidence (X is a penguin).
1 code implementation • 10 Dec 2020 • Zhaofeng Wu, Hao Peng, Noah A. Smith
For natural language processing systems, two kinds of evidence support the use of text representations from neural language models "pretrained" on large unannotated corpora: performance on application-inspired benchmarks (Peters et al., 2018, inter alia), and the emergence of syntactic abstractions in those representations (Tenney et al., 2019, inter alia).
1 code implementation • ACL 2021 • Ofir Press, Noah A. Smith, Mike Lewis
Increasing the input length has been a driver of progress in language modeling with transformers.
Ranked #26 on Language Modelling on WikiText-103
2 code implementations • 17 Jan 2021 • Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, Daniel S. Weld
While often assumed to be a gold standard, effective human evaluation of text generation remains an important, open area for research.
2 code implementations • EACL 2021 • Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.
no code implementations • ICLR 2021 • Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism.
Ranked #27 on Machine Translation on IWSLT2014 German-English
1 code implementation • EMNLP 2021 • Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune.
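A generic sketch of the swap (toy dimensions and a standard kernel feature map, not the paper's exact conversion): softmax attention needs the full n × n score matrix, while a kernelized alternative reorders the computation so keys and values are summarized once, giving linear complexity in sequence length.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

def softmax_attention(Q, K, V):
    """Standard attention: materializes the full n x n score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention with feature map phi(x) = elu(x) + 1.
    phi(K)^T V and the phi(K) column sums can be accumulated left-to-right,
    which is what enables the recurrent, linear-complexity form."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)               # (n, d): never builds an n x n matrix
    den = Qp @ Kp.sum(axis=0) + eps     # per-query normalizer
    return num / den[:, None]

out = linear_attention(Q, K, V)
print(out.shape)  # same output shape as softmax attention
```

The two functions agree only approximately in general; the finetuning step in the abstract is what recovers the pretrained model's behavior after the swap.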
Ranked #2 on Machine Translation on WMT2017 Chinese-English
1 code implementation • Findings (EMNLP) 2021 • Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith
Models of language trained on very large corpora have been demonstrated to be useful for NLP.
no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.
1 code implementation • 18 Apr 2021 • Rik Koncel-Kedziorski, Noah A. Smith
This method can improve perplexity of pretrained LMs with no updates to the LM's own parameters.
no code implementations • 22 Apr 2021 • William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
1 code implementation • ACL 2021 • Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi
Despite recent advances in natural language generation, it remains challenging to control attributes of generated text.
1 code implementation • NAACL 2021 • Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner
Readers of academic research papers often read with the goal of answering specific questions.
Ranked #1 on Evidence Selection on QASPER
no code implementations • NAACL 2021 • Elizabeth Clark, Noah A. Smith
Story generation is an open-ended and subjective task, which poses a challenge for evaluating story generation models.
1 code implementation • EMNLP (MRL) 2021 • Ethan C. Chau, Noah A. Smith
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations.
1 code implementation • AKBC 2021 • Rahul Nadkarni, David Wadden, Iz Beltagy, Noah A. Smith, Hannaneh Hajishirzi, Tom Hope
Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes.
no code implementations • 30 Jun 2021 • William Merrill, Ashish Sabharwal, Noah A. Smith
Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages.
no code implementations • 30 Jun 2021 • Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text?
no code implementations • ACL 2022 • Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi
To support the broad range of real machine errors that can be identified by laypeople, the ten error categories of Scarecrow -- such as redundancy, commonsense errors, and incoherence -- are identified through several rounds of crowd annotation experiments without a predefined ontology.
2 code implementations • NAACL 2022 • Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text.
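The sketch below illustrates the routing idea behind a domain-expert layer: a separate feedforward "expert" per training domain, selected by the input's domain label. The class name, dimensions, and initialization are illustrative assumptions, not the DEMix implementation:

```python
import numpy as np

class DomainExpertFFN:
    """Hedged sketch of a DEMix-style layer: one feedforward expert per
    text domain; the domain label of the input routes computation."""

    def __init__(self, d_model, d_hidden, num_domains, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix pair per domain expert.
        self.W1 = 0.02 * rng.normal(size=(num_domains, d_model, d_hidden))
        self.W2 = 0.02 * rng.normal(size=(num_domains, d_hidden, d_model))

    def __call__(self, x, domain):
        # Route every token in x through the expert for `domain`.
        h = np.maximum(x @ self.W1[domain], 0.0)   # expert ReLU MLP
        return x + h @ self.W2[domain]             # residual connection
```

Conditioning on a discrete domain label this way also means experts for new domains can be added or swapped without retraining the shared parameters.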
7 code implementations • ICLR 2022 • Ofir Press, Noah A. Smith, Mike Lewis
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training?
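The length-extrapolation method this entry refers to (ALiBi) works by adding a fixed, distance-proportional penalty to attention scores instead of using position embeddings. A minimal sketch of the bias tensor, assuming a power-of-two head count for the paper's geometric slope sequence:

```python
import numpy as np

def alibi_bias(num_heads, seq_len):
    """Sketch of ALiBi's position bias: each head penalizes attention to a key
    in proportion to its distance from the query, with a per-head slope."""
    # Geometric slope sequence 2^-1, 2^-2, ... for power-of-two head counts.
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]   # j - i: 0 on the diagonal, negative for past keys
    # Future positions (j > i) get a positive value here, but causal masking
    # removes them before the softmax, so only the past-key penalty matters.
    return slopes[:, None, None] * rel  # shape (heads, query, key), added to scores
```

Because the bias depends only on relative distance, it is defined for any sequence length, which is what allows inference on sequences longer than those seen in training.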
1 code implementation • EMNLP 2021 • Ivan Montero, Nikolaos Pappas, Noah A. Smith
Representation learning for text via pretraining a language model on a large corpus has become a standard starting point for building NLP systems.
no code implementations • 1 Oct 2021 • Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
no code implementations • ACL 2022 • Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
One way to improve efficiency is to bound the memory size.
1 code implementation • NAACL 2022 • Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith
When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance.
no code implementations • NAACL 2022 • Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith
The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases.
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith
We establish THumB, a rubric-based human evaluation protocol for image captioning models.
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
We therefore propose a generalization of leaderboards, bidimensional leaderboards (Billboards), that simultaneously tracks progress in language generation models and metrics for their evaluation.
1 code implementation • NAACL 2022 • Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi
To enable constrained generation, we build on NeuroLogic decoding (Lu et al., 2021), combining its flexibility in incorporating logical constraints with A*esque estimates of future constraint satisfaction.
Ranked #1 on Text Generation on ROCStories
no code implementations • 7 Jan 2022 • Maarten Sap, Anna Jafarpour, Yejin Choi, Noah A. Smith, James W. Pennebaker, Eric Horvitz
We quantify the differences between autobiographical and imagined stories by introducing sequentiality, a measure of narrative flow of events, drawing probabilistic inferences from a cutting-edge large language model (GPT-3).
1 code implementation • 16 Jan 2022 • Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
Starting with an existing dataset, MultiNLI for natural language inference (NLI), our approach uses dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instructs GPT-3 to compose new examples with similar patterns.
1 code implementation • 16 Jan 2022 • Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
Ranked #1 on Task-Oriented Dialogue Systems on KVRET
no code implementations • 25 Jan 2022 • Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith
Language models increasingly rely on massive web dumps for diverse text data.
1 code implementation • 16 Mar 2022 • Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf
In this work, we propose an in-context learning (ICL) framework for zero-shot and few-shot learning DST, where a large pre-trained language model (LM) takes a test instance and a few exemplars as input, and directly decodes the dialogue state without any parameter updates.
1 code implementation • 11 Apr 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, Noah A. Smith
Based on this finding, we introduce a patience factor, a simple modification to this beam decoding implementation, that generalizes the stopping criterion and provides flexibility to the depth of search.
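The patience factor described above can be reduced to a one-line change in the beam-decoding stopping rule. A hedged sketch (the function name is illustrative; the widely used implementation stops once `beam_size` finished hypotheses are collected):

```python
def beam_should_stop(num_finished, beam_size, patience=1.0):
    """Generalized stopping criterion for beam decoding: the standard
    first-k rule is patience = 1.0; patience > 1 searches deeper before
    terminating, patience < 1 terminates earlier."""
    return num_finished >= patience * beam_size
```

With `patience=2.0` and a beam of 5, for example, decoding continues until 10 finished hypotheses have been collected rather than 5.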
7 code implementations • 16 Apr 2022 • Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi
This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones.