Search Results for author: Samuel R. Bowman

Found 76 papers, 37 papers with code

Crowdsourcing Beyond Annotation: Case Studies in Benchmark Data Collection

no code implementations EMNLP (ACL) 2021 Alane Suhr, Clara Vania, Nikita Nangia, Maarten Sap, Mark Yatskar, Samuel R. Bowman, Yoav Artzi

Even though it is such a fundamental tool in NLP, crowdsourcing use is largely guided by common practices and the personal experience of researchers.

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

1 code implementation 9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramón Risco Delgado, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Timothy Telleen-Lawton, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

1 code implementation 23 May 2022 Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries.

Document Summarization Multiple-choice

Instruction Induction: From Few Examples to Natural Language Task Descriptions

1 code implementation 22 May 2022 Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy

Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning.

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

no code implementations LNLS (ACL) 2022 Alicia Parrish, Harsh Trivedi, Ethan Perez, Angelica Chen, Nikita Nangia, Jason Phang, Samuel R. Bowman

We use long contexts -- humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers, and we test if those explanations allow humans who have not read the full context to more accurately determine the correct answer.

Multiple-choice Reading Comprehension

What Makes Reading Comprehension Questions Difficult?

1 code implementation ACL 2022 Saku Sugawara, Nikita Nangia, Alex Warstadt, Samuel R. Bowman

For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems.

Multiple-choice Natural Language Understanding +1

QuALITY: Question Answering with Long Input Texts, Yes!

2 code implementations 16 Dec 2021 Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman

To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process.

Multiple-choice Multiple Choice Question Answering (MCQA)

Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair

no code implementations 16 Nov 2021 Jason Phang, Angelica Chen, William Huang, Samuel R. Bowman

We find that AFLite indeed selects more challenging examples, lowering the performance of evaluated models more as stronger adversary models are used.

The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail

no code implementations 15 Oct 2021 Samuel R. Bowman

Researchers in NLP often frame and discuss research results in ways that serve to deemphasize the field's successes, often in response to the field's widespread hype.

Clean or Annotate: How to Spend a Limited Data Collection Budget

no code implementations 15 Oct 2021 Derek Chen, Zhou Yu, Samuel R. Bowman

Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite higher levels of inaccurate labeling compared to expert labeling.

Denoising Learning with noisy labels +1

BBQ: A Hand-Built Bias Benchmark for Question Answering

1 code implementation Findings (ACL) 2022 Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Samuel R. Bowman

It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA).

Question Answering

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers

no code implementations EMNLP (BlackboxNLP) 2021 Jason Phang, Haokun Liu, Samuel R. Bowman

Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how neural networks change after fine-tuning.

Natural Language Understanding

NOPE: A Corpus of Naturally-Occurring Presuppositions in English

1 code implementation CoNLL (EMNLP) 2021 Alicia Parrish, Sebastian Schuster, Alex Warstadt, Omar Agha, Soo-Hwan Lee, Zhuoye Zhao, Samuel R. Bowman, Tal Linzen

Understanding language requires grasping not only the overtly stated content, but also making inferences about things that were left unsaid.

Comparing Test Sets with Item Response Theory

no code implementations ACL 2021 Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman

Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.

Natural Language Understanding

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

1 code implementation ACL 2021 Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman

However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data.

Multiple-choice Natural Language Understanding +1

Does Putting a Linguist in the Loop Improve NLU Data Collection?

no code implementations Findings (EMNLP) 2021 Alicia Parrish, William Huang, Omar Agha, Soo-Hwan Lee, Nikita Nangia, Alex Warstadt, Karmanya Aggarwal, Emily Allaway, Tal Linzen, Samuel R. Bowman

We take natural language inference as a test case and ask whether it is beneficial to put a linguist 'in the loop' during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task.

Natural Language Inference

What Will it Take to Fix Benchmarking in Natural Language Understanding?

no code implementations NAACL 2021 Samuel R. Bowman, George E. Dahl

Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and biased systems score so highly on standard benchmarks that there is little room for researchers who develop better systems to demonstrate their improvements.

Natural Language Understanding

When Do You Need Billions of Words of Pretraining Data?

1 code implementation ACL 2021 Yian Zhang, Alex Warstadt, Haau-Sing Li, Samuel R. Bowman

We adopt four probing methods -- classifier probing, information-theoretic probing, unsupervised relative acceptability judgment, and fine-tuning on NLU tasks -- and draw learning curves that track the growth of these different measures of linguistic ability with respect to pretraining data volume using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M and 1B words.

Pretrained Language Models

Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options

1 code implementation Asian Chapter of the Association for Computational Linguistics 2020 Clara Vania, Ruijie Chen, Samuel R. Bowman

Using these protocols and a writing-based baseline, we collect several new English NLI datasets of over 3k examples each, each using a fixed amount of annotator time, but a varying number of examples to fit that time budget.

Natural Language Inference Transfer Learning

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

1 code implementation EMNLP 2020 Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding.

Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data

1 code implementation EMNLP (insights) 2020 William Huang, Haokun Liu, Samuel R. Bowman

A growing body of work shows that models exploit annotation artifacts to achieve state-of-the-art performance on standard crowdsourced benchmarks -- datasets collected from crowdworkers to create an evaluation task -- while still failing on out-of-domain examples for the same task.

Natural Language Inference Natural Language Understanding +1

Precise Task Formalization Matters in Winograd Schema Evaluations

1 code implementation EMNLP 2020 Haokun Liu, William Huang, Dhara A. Mungra, Samuel R. Bowman

Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability.

Language Modelling Multiple-choice

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

1 code implementation EMNLP 2020 Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

To measure some forms of social bias in language models against protected demographic groups in the US, we introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs).

Pretrained Language Models

Can neural networks acquire a structural bias from raw linguistic data?

no code implementations 14 Jul 2020 Alex Warstadt, Samuel R. Bowman

We argue that these results are the strongest evidence so far from artificial learners supporting the proposition that a structural bias can be acquired from raw data.

Inductive Bias Language Acquisition

Self-Training for Unsupervised Parsing with PRPN

no code implementations WS 2020 Anhad Mohananey, Katharina Kann, Samuel R. Bowman

To be able to use our model's predictions during training, we extend a recent neural UP architecture, the PRPN (Shen et al., 2018a) such that it can be trained in a semi-supervised fashion.

Language Modelling

English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

no code implementations Asian Chapter of the Association for Computational Linguistics 2020 Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

Intermediate-task training -- fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task -- often improves model performance substantially on language understanding tasks in monolingual English settings.

Question Answering Zero-Shot Cross-Lingual Transfer

Learning to Learn Morphological Inflection for Resource-Poor Languages

no code implementations 28 Apr 2020 Katharina Kann, Samuel R. Bowman, Kyunghyun Cho

We propose to cast the task of morphological inflection - mapping a lemma to an indicated inflected form - for resource-poor languages as a meta-learning problem.

Cross-Lingual Transfer Meta-Learning +1

New Protocols and Negative Results for Textual Entailment Data Collection

1 code implementation EMNLP 2020 Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler

Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding.

Natural Language Inference Transfer Learning

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

3 code implementations TACL 2020 Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman

We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English.

Do Attention Heads in BERT Track Syntactic Dependencies?

1 code implementation 27 Nov 2019 Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman

We investigate the extent to which individual attention heads in pretrained transformer language models, such as BERT and RoBERTa, implicitly capture syntactic dependency relations.

Neural Unsupervised Parsing Beyond English

no code implementations WS 2019 Katharina Kann, Anhad Mohananey, Samuel R. Bowman, Kyunghyun Cho

Recently, neural network models which automatically infer syntactic structure from raw text have started to achieve promising results.

Inducing Constituency Trees through Neural Machine Translation

no code implementations 22 Sep 2019 Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman

Latent tree learning (LTL) methods learn to parse sentences using only indirect supervision from a downstream task.

Language Modelling Machine Translation +1

Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

no code implementations IJCNLP 2019 Katharina Kann, Kyunghyun Cho, Samuel R. Bowman

Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages?

Natural Language Processing

Can Unconditional Language Models Recover Arbitrary Sentences?

no code implementations NeurIPS 2019 Nishant Subramani, Samuel R. Bowman, Kyunghyun Cho

We then investigate the conditions under which a language model can be made to generate a sentence through the identification of a point in such a space and find that it is possible to recover arbitrary sentences nearly perfectly with language models and representations of moderate size without modifying any model parameters.

Language Modelling Text Classification

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

no code implementations ACL 2019 Nikita Nangia, Samuel R. Bowman

The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state of the art at the time of writing (May 24, 2019).

Sentence Classification

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

3 code implementations NeurIPS 2019 Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks.

Transfer Learning

Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling

no code implementations ICLR 2019 Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen

Work on the problem of contextualized word representation—the development of reusable neural network components for sentence understanding—has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo (Peters et al., 2018).

Language Modelling

Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

no code implementations SEMEVAL 2019 Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

Our results show that pretraining on language modeling performs the best on average across our probing tasks, supporting its widespread use for pretraining state-of-the-art NLP models, and CCG supertagging and NLI pretraining perform comparably.

CCG Supertagging Language Modelling +1

Identifying and Reducing Gender Bias in Word-Level Language Models

no code implementations NAACL 2019 Shikha Bordia, Samuel R. Bowman

Many text corpora exhibit socially problematic biases, which can be propagated or amplified in the models trained on such data.

Language Modelling

On Measuring Social Biases in Sentence Encoders

1 code implementation NAACL 2019 Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger

The Word Embedding Association Test shows that GloVe and word2vec word embeddings exhibit human-like implicit biases based on gender, race, and other social constructs (Caliskan et al., 2017).

Word Embeddings

Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments

no code implementations 11 Jan 2019 Alex Warstadt, Samuel R. Bowman

We use this analysis set to investigate the grammatical knowledge of three pretrained encoders: BERT (Devlin et al., 2018), GPT (Radford et al., 2018), and the BiLSTM baseline from Warstadt et al. We find that these models have a strong command of complex or non-canonical argument structures like ditransitives (Sue gave Dan a book) and passives (The book was read).

General Classification Linguistic Acceptability

Verb Argument Structure Alternations in Word and Sentence Embeddings

no code implementations WS 2019 Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman

For converging evidence, we further construct LaVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings.

Sentence Embedding Sentence-Embedding +1

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

2 code implementations 2 Nov 2018 Jason Phang, Thibault Févry, Samuel R. Bowman

Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks.

Language Modelling Natural Language Inference +1

Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis

no code implementations 26 Sep 2018 Kelly W. Zhang, Samuel R. Bowman

We find that representations from language models consistently perform best on our syntactic auxiliary prediction tasks, even when trained on relatively small amounts of data.

Language Modelling Transfer Learning +1

Grammar Induction with Neural Language Models: An Unusual Replication

1 code implementation EMNLP (ACL) 2018 Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman

A substantial thread of recent work on latent tree learning has attempted to develop neural network models with parse-valued latent variables and train them on non-parsing tasks, in the hope of having them discover interpretable tree structure.

Constituency Parsing Language Modelling

Neural Network Acceptability Judgments

1 code implementation TACL 2019 Alex Warstadt, Amanpreet Singh, Samuel R. Bowman

This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence.

General Classification Language Acquisition +1

Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Task Analysis

no code implementations 24 May 2018 Kelly W. Zhang, Samuel R. Bowman

There is mounting evidence that pretraining can be valuable for neural network language understanding models, but we do not yet have a clear understanding of how the choice of pretraining objective affects the type of linguistic information that models learn.

Language Modelling Transfer Learning +1

A Stable and Effective Learning Strategy for Trainable Greedy Decoding

1 code implementation EMNLP 2018 Yun Chen, Victor O. K. Li, Kyunghyun Cho, Samuel R. Bowman

Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation.

Machine Translation Translation

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

9 code implementations WS 2018 Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset.

Natural Language Inference Natural Language Understanding +1

ListOps: A Diagnostic Dataset for Latent Tree Learning

2 code implementations NAACL 2018 Nikita Nangia, Samuel R. Bowman

In this paper we introduce ListOps, a toy dataset created to study the parsing ability of latent tree models.

Sentence Classification

Annotation Artifacts in Natural Language Inference Data

no code implementations NAACL 2018 Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to.

Natural Language Inference Text Categorization

The Lifted Matrix-Space Model for Semantic Composition

2 code implementations CONLL 2018 WooJin Chung, Sheng-Fu Wang, Samuel R. Bowman

Tree-structured neural network architectures for sentence encoding draw inspiration from the approach to semantic composition generally seen in formal linguistics, and have shown empirical improvements over comparable sequence models by doing so.

Semantic Composition Word Embeddings

Do latent tree learning models identify meaningful structure in sentences?

1 code implementation TACL 2018 Adina Williams, Andrew Drozdov, Samuel R. Bowman

Recent work on the problem of latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time.

Sentence Classification

The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations

no code implementations WS 2017 Nikita Nangia, Adina Williams, Angeliki Lazaridou, Samuel R. Bowman

This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017).

Natural Language Inference Representation Learning

Sequential Attention: A Context-Aware Alignment Function for Machine Reading

no code implementations WS 2017 Sebastian Brarda, Philip Yeres, Samuel R. Bowman

In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by assigning weights to words in an input sequence in a way that takes into account not just how well that word matches a query, but how well surrounding words match.

Reading Comprehension

Ruminating Reader: Reasoning with Gated Multi-Hop Attention

no code implementations WS 2018 Yichen Gong, Samuel R. Bowman

To answer the question in machine comprehension (MC) task, the models need to establish the interaction between the question and the context.

Question Answering Reading Comprehension

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning

no code implementations 23 Apr 2017 Yacine Jernite, Samuel R. Bowman, David Sontag

This work presents a novel objective function for the unsupervised training of neural network sentence encoders.

Representation Learning

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

2 code implementations NAACL 2018 Adina Williams, Nikita Nangia, Samuel R. Bowman

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding.

Domain Adaptation Natural Language Inference

Generating Sentences from a Continuous Space

13 code implementations CONLL 2016 Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation.

Language Modelling

A large annotated corpus for learning natural language inference

2 code implementations EMNLP 2015 Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations.

Image Captioning Natural Language Inference

Tree-structured composition in neural networks without tree-structured architectures

1 code implementation 16 Jun 2015 Samuel R. Bowman, Christopher D. Manning, Christopher Potts

We hypothesize that neural sequence models like LSTMs are in fact able to discover and implicitly use recursive compositional structure, at least for tasks with clear cues to that structure in the data.

Learning Distributed Word Representations for Natural Logic Reasoning

no code implementations 15 Oct 2014 Samuel R. Bowman, Christopher Potts, Christopher D. Manning

Natural logic offers a powerful relational conception of meaning that is a natural counterpart to distributed semantic representations, which have proven valuable in a wide range of sophisticated language tasks.

Tensor Networks

Recursive Neural Networks Can Learn Logical Semantics

no code implementations WS 2015 Samuel R. Bowman, Christopher Potts, Christopher D. Manning

Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logical deduction.

Relational Reasoning Tensor Networks

Can recursive neural tensor networks learn logical reasoning?

1 code implementation 21 Dec 2013 Samuel R. Bowman

Recursive neural network models and their accompanying vector representations for words have seen success in an array of increasingly semantically sophisticated tasks, but almost nothing is known about their ability to accurately capture the aspects of linguistic meaning that are necessary for interpretation or reasoning.

Tensor Networks
