1 code implementation • ACL 2022 • Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Alan Fries, Maged S. Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, Mike Tian-Jian Jiang, Alexander M. Rush
PromptSource is a system for creating, sharing, and using natural language prompts.
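For a concrete sense of the system, here is a minimal sketch using the released promptsource library; the dataset and template choice are illustrative, not prescribed by the paper:

```python
# Minimal sketch of applying a PromptSource template.
# The DatasetTemplates API is from the promptsource library; the dataset
# ("imdb") and the choice of template here are illustrative.
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

example = load_dataset("imdb", split="train")[0]
templates = DatasetTemplates("imdb")
template = templates[templates.all_template_names[0]]

# A template renders a raw example into an (input, target) pair of strings
# (some templates render only an input).
input_text, target_text = template.apply(example)
print(input_text, "->", target_text)
```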
1 code implementation • NeurIPS 2021 • Justin T. Chiu, Yuntian Deng, Alexander M. Rush
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
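The central device is a low-rank constraint on the model's large score matrices. A hedged sketch of why it helps (the exact parameterization in the paper differs): if a transition matrix factors as A = UVᵀ with rank R ≪ N, each step of the forward recursion drops from O(N²) to O(NR).

```python
# One forward-algorithm step with a rank-R transition matrix A = U @ V.T.
# The factorization shown is illustrative, not the paper's exact scheme.
import torch

N, R = 1024, 16
U, V = torch.rand(N, R), torch.rand(N, R)
emission = torch.rand(N)
alpha = torch.rand(N)  # forward probabilities from the previous step

# Dense step: O(N^2)
dense_step = (alpha @ (U @ V.T)) * emission

# Factored step: O(N * R), same result up to numerical error
factored_step = ((alpha @ U) @ V.T) * emission

assert torch.allclose(dense_step, factored_step, atol=1e-4)
```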
no code implementations • 19 Oct 2021 • Hendrik Strobelt, Jambay Kinley, Robert Krueger, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush
These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models.
3 code implementations • ICLR 2022 • Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020).
2 code implementations • EMNLP 2021 • Keyon Vafa, Yuntian Deng, David M. Blei, Alexander M. Rush
Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales.
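A hedged sketch of the greedy procedure: starting from an empty context, repeatedly add the input token that most increases the probability of the model's original prediction, stopping once the subset reproduces it. `predict` and `log_prob` are hypothetical helpers standing in for a model that behaves sensibly on subsets (e.g., fine-tuned with word dropout), as the approach requires.

```python
# Hedged sketch of greedy rationalization. `predict(tokens)` returns the
# model's prediction; `log_prob(tokens, target)` returns log p(target | tokens).
def greedy_rationale(tokens, predict, log_prob):
    target = predict(tokens)  # prediction with the full context
    rationale, remaining = [], list(range(len(tokens)))
    while predict([tokens[i] for i in sorted(rationale)]) != target:
        # Add the single token that most increases p(target | subset).
        best = max(
            remaining,
            key=lambda i: log_prob(
                [tokens[j] for j in sorted(rationale + [i])], target
            ),
        )
        rationale.append(best)
        remaining.remove(best)
    return sorted(rationale)
```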
1 code implementation • EMNLP 2021 • François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush
Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models.
1 code implementation • EMNLP (ACL) 2021 • Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, Stas Bekman, Pierric Cistac, Thibault Goehringer, Victor Mustar, François Lagunas, Alexander M. Rush, Thomas Wolf
The scale, variety, and quantity of publicly available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks.
1 code implementation • NAACL 2021 • Steven Cao, Victor Sanh, Alexander M. Rush
The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations.
1 code implementation • NAACL 2021 • Teven Le Scao, Alexander M. Rush
When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction.
1 code implementation • 25 Feb 2021 • David Chiang, Alexander M. Rush, Boaz Barak
We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers from the burden of keeping track of the order of axes and the purpose of each.
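The proposal is notation rather than a library, but PyTorch's (experimental) named-tensor API captures part of the idea: reductions and permutations refer to axes by name rather than position. A small sketch:

```python
# Sketch of the named-axes idea via PyTorch's experimental named tensors.
import torch

x = torch.randn(32, 8, 128, names=("batch", "heads", "seq"))

pooled = x.mean("heads")  # reduce over a named axis, no opaque dim=1
print(pooled.names)       # ('batch', 'seq')

# align_to reorders axes by name rather than by position.
y = x.align_to("seq", "batch", "heads")
print(y.shape)            # torch.Size([128, 32, 8])
```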
2 code implementations • ACL 2021 • Demi Guo, Alexander M. Rush, Yoon Kim
This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.
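A minimal sketch of that idea, with plain L1 standing in for the paper's relaxed-L0 sparsity penalty:

```python
# Diff pruning sketch: frozen pretrained weights plus a trainable,
# sparsity-penalized task-specific diff. L1 is a simplification of the
# paper's relaxed-L0 regularizer.
import torch

class DiffLinear(torch.nn.Module):
    def __init__(self, pretrained_weight):
        super().__init__()
        # Frozen and shared across tasks.
        self.register_buffer("w0", pretrained_weight)
        # Task-specific diff, initialized at zero.
        self.diff = torch.nn.Parameter(torch.zeros_like(pretrained_weight))

    def forward(self, x):
        return x @ (self.w0 + self.diff).T

    def sparsity_penalty(self):
        return self.diff.abs().sum()
```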
no code implementations • ICLR 2021 • Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task.
1 code implementation • NeurIPS 2020 • Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, Alexander M. Rush
Learning to control the structure of sentences is a challenging problem in text generation.
no code implementations • 28 Nov 2020 • Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
1 code implementation • EMNLP 2020 • Demi Guo, Yoon Kim, Alexander M. Rush
Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language.
1 code implementation • EMNLP 2020 • Justin T. Chiu, Alexander M. Rush
The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure.
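That separation is exactly what the classic forward algorithm exploits; a log-space sketch (unary start scores folded into `log_init`):

```python
# Log-space forward algorithm for an HMM.
import torch

def log_marginal(log_init, log_trans, log_emit, obs):
    """log p(obs). log_init: (N,), log_trans: (N, N),
    log_emit: (N, V), obs: (T,) integer observations."""
    alpha = log_init + log_emit[:, obs[0]]
    for t in range(1, len(obs)):
        # alpha'[j] = logsumexp_i(alpha[i] + log_trans[i, j]) + log_emit[j, o_t]
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_trans, dim=0) \
            + log_emit[:, obs[t]]
    return torch.logsumexp(alpha, dim=0)
```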
1 code implementation • EMNLP 2020 • Congzheng Song, Alexander M. Rush, Vitaly Shmatikov
We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models.
1 code implementation • 24 Oct 2020 • Sam Shleifer, Alexander M. Rush
A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning.
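A hedged sketch of SFT for one common shrinking choice, keeping every other layer:

```python
# 'Shrink and fine-tune' sketch: initialize a student by copying a subset of
# the teacher's layers (every other layer here is one common choice), then
# fine-tune as usual. No distillation loss is involved.
import copy
import torch

def shrink(teacher_layers: torch.nn.ModuleList, keep_every: int = 2):
    return torch.nn.ModuleList(
        copy.deepcopy(layer)
        for i, layer in enumerate(teacher_layers)
        if i % keep_every == 0
    )
```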
2 code implementations • EACL 2021 • Xinya Du, Alexander M. Rush, Claire Cardie
We revisit the classic problem of document-level role-filler entity extraction (REE) for template filling.
1 code implementation • NeurIPS 2020 • Yuntian Deng, Alexander M. Rush
The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
3 code implementations • NeurIPS 2020 • Victor Sanh, Thomas Wolf, Alexander M. Rush
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications.
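A sketch contrasting the two criteria. Movement pruning favors weights that move away from zero during fine-tuning; accumulating -weight × gradient over steps is a simplified stand-in for the paper's learned, straight-through scores, and 0 < sparsity < 1 is assumed.

```python
import torch

def magnitude_mask(weight, sparsity):
    # Keep the (1 - sparsity) fraction of weights with largest magnitude.
    threshold = torch.quantile(weight.abs().flatten(), sparsity)
    return weight.abs() > threshold

class MovementScore:
    def __init__(self, weight):
        self.score = torch.zeros_like(weight)

    def update(self, weight, grad):
        # Weights whose gradient pushes them away from zero gain score.
        self.score += -weight.detach() * grad

    def mask(self, sparsity):
        threshold = torch.quantile(self.score.flatten(), sparsity)
        return self.score > threshold
```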
2 code implementations • ACL 2020 • Xiang Lisa Li, Alexander M. Rush
In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.
1 code implementation • ACL 2020 • Noriyuki Kojima, Hadar Averbuch-Elor, Alexander M. Rush, Yoav Artzi
Visual features are a promising signal for bootstrapping textual models.
1 code implementation • 13 Mar 2020 • Jiawei Zhou, Zhiying Xu, Alexander M. Rush, Minlan Yu
Botnets are now a major source for many network attacks, such as DDoS attacks and spam.
1 code implementation • ACL 2020 • Alexander M. Rush
The literature on structured prediction for NLP describes a rich collection of distributions and algorithms over sequences, segmentations, alignments, and trees; however, these algorithms are difficult to utilize in deep learning frameworks.
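The flavor of dynamic program such a library packages as a single differentiable op, here the log-partition function of a linear-chain CRF (gradients of this quantity with respect to the potentials are exactly the edge marginals, which is why autograd-friendly form matters):

```python
# Log-partition function of a linear-chain CRF, written as a logsumexp
# recursion. Unary scores are assumed folded into the edge potentials.
import torch

def linear_chain_logZ(log_potentials):
    """log_potentials: (T-1, N, N) scores for adjacent label pairs."""
    alpha = torch.zeros(log_potentials.size(1))
    for edge in log_potentials:
        alpha = torch.logsumexp(alpha.unsqueeze(1) + edge, dim=0)
    return torch.logsumexp(alpha, dim=0)
```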
no code implementations • 13 Nov 2019 • Michael Lingzhi Li, Meng Dong, Jiawei Zhou, Alexander M. Rush
We derive theoretical results about the discriminative power and feature representation capabilities of each class.
7 code implementations • 9 Oct 2019 • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander M. Rush
Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks.
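For reference, the library's high-level entry point wraps tokenizer and model behind one call (the model download happens on first use; the default model is chosen by the library):

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Transformers made high-capacity pretrained models easy to use."))
# [{'label': 'POSITIVE', 'score': ...}]
```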
no code implementations • 8 Oct 2019 • Georgios A. Tritsaris, Yiqi Xie, Alexander M. Rush, Stephen Carr, Marios Mattheakis, Efthimios Kaxiras
Two-dimensional (2D) layered materials offer intriguing possibilities for novel physics and applications.
1 code implementation • IJCNLP 2019 • Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush
Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal.
1 code implementation • IJCNLP 2019 • Joshua Feldman, Joe Davison, Alexander M. Rush
Inferring commonsense knowledge is a key challenge in natural language processing, but due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data.
no code implementations • 23 Aug 2019 • Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, David Brooks
The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs.
1 code implementation • 19 Aug 2019 • Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush
Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks.
1 code implementation • ACL 2019 • Jiawei Zhou, Alexander M. Rush
We propose an unsupervised method for sentence summarization using only language modeling.
Ranked #35 on Text Summarization on GigaWord
1 code implementation • 24 Jul 2019 • Sebastian Gehrmann, Hendrik Strobelt, Robert Krüger, Hanspeter Pfister, Alexander M. Rush
Automation of tasks can have critical consequences when humans lose agency over decision processes.
1 code implementation • ACL 2019 • Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush
In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise.
1 code implementation • SEMEVAL 2019 • Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush
Popular Natural Language Inference (NLI) datasets have been shown to be tainted by hypothesis-only biases.
2 code implementations • ACL 2019 • Yoon Kim, Chris Dyer, Alexander M. Rush
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar.
Ranked #5 on Constituency Grammar Induction on PTB
6 code implementations • ACL 2019 • Sebastian Gehrmann, Hendrik Strobelt, Alexander M. Rush
The rapid improvement of language models has raised the specter of abuse of text generation systems.
1 code implementation • NAACL 2019 • Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis
On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.
Ranked #6 on Constituency Grammar Induction on PTB (Max F1 (WSJ) metric)
1 code implementation • 29 Jan 2019 • Zachary M. Ziegler, Alexander M. Rush
Normalizing flows are a powerful class of generative models for continuous random variables, showing both strong model flexibility and the potential for non-autoregressive generation.
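That flexibility rests on the change-of-variables identity: for an invertible map f with z = f(x),

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```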
no code implementations • 17 Dec 2018 • Yoon Kim, Sam Wiseman, Alexander M. Rush
There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.
1 code implementation • WS 2018 • Sebastian Gehrmann, Falcon Z. Dai, Henry Elder, Alexander M. Rush
Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG.
1 code implementation • EMNLP 2018 • Luong Hoang, Sam Wiseman, Alexander M. Rush
Reading comprehension tasks test the ability of models to process long-term context and remember salient information.
5 code implementations • EMNLP 2018 • Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush
We use this selector as a bottom-up attention step to constrain the model to likely phrases.
Ranked #4 on Multi-Document Summarization on Multi-News
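One simple way to realize the bottom-up step described above (a hedged sketch; the threshold value is illustrative): a content selector assigns each source token a selection probability, and copy attention is masked to the tokens above the threshold before renormalizing.

```python
import torch

def bottom_up_mask(attn_scores, select_prob, threshold=0.5):
    """attn_scores: (tgt_len, src_len) raw scores; select_prob: (src_len,)."""
    masked = attn_scores.masked_fill(select_prob < threshold, float("-inf"))
    return torch.softmax(masked, dim=-1)
```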
2 code implementations • EMNLP 2018 • Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
While neural encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation.
no code implementations • 12 Jul 2018 • Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei
VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.
1 code implementation • NeurIPS 2018 • Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush
This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.
Ranked #23 on Machine Translation on IWSLT2014 German-English
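Treating the attention alignment as a latent variable z yields the standard evidence lower bound that amortized variational inference optimizes:

```latex
\log p(y \mid x) \;\ge\;
  \mathbb{E}_{q(z \mid x, y)}\bigl[\log p(y \mid x, z)\bigr]
  - \mathrm{KL}\bigl(q(z \mid x, y)\,\|\,p(z \mid x)\bigr)
```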
8 code implementations • WS 2018 • Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush
OpenNMT is an open-source toolkit for neural machine translation (NMT).
1 code implementation • 25 Apr 2018 • Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush
In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process.
1 code implementation • ICML 2018 • Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network.
Ranked #2 on Text Generation on Yahoo Questions
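The semi-amortized scheme uses the inference network only to initialize the variational parameters, then refines them with a few stochastic variational inference steps per instance (a sketch of the update; α is a step size):

```latex
\lambda^{0} = \mathrm{enc}_{\phi}(x), \qquad
\lambda^{k+1} = \lambda^{k} + \alpha\,\nabla_{\lambda}\,\mathrm{ELBO}\bigl(\lambda^{k};\, \theta, x\bigr)
```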
2 code implementations • 13 Nov 2017 • Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks
This results in up to a 1.51x improvement over the state-of-the-art.
1 code implementation • 3 Oct 2017 • Ankit Gupta, Alexander M. Rush
We consider the task of detecting regulatory elements in the human genome directly from raw DNA.
no code implementations • 12 Sep 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush
We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.
1 code implementation • EMNLP 2017 • Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber
In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.
4 code implementations • EMNLP 2017 • Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records.
6 code implementations • 13 Jun 2017 • Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun
This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.
no code implementations • 3 Feb 2017 • Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.
4 code implementations • ACL 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. Rush
We describe an open-source toolkit for neural machine translation (NMT).
no code implementations • 9 Nov 2016 • Greg Yang, Alexander M. Rush
The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space.
12 code implementations • ICML 2017 • Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.
5 code implementations • EMNLP 2016 • Yoon Kim, Alexander M. Rush
We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).
Ranked #1 on Machine Translation on IWSLT2015 Thai-English
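A hedged sketch of the sequence-level variant: decode the training source side with the teacher's beam search, then train the student with ordinary cross-entropy on those outputs. The `generate` call mirrors the transformers API; treat the names as illustrative.

```python
# Sequence-level knowledge distillation sketch.
def distill_dataset(teacher, tokenizer, sources, num_beams=5):
    pseudo_targets = []
    for src in sources:
        inputs = tokenizer(src, return_tensors="pt")
        out = teacher.generate(**inputs, num_beams=num_beams)
        pseudo_targets.append(tokenizer.decode(out[0], skip_special_tokens=True))
    # The student is then trained on (source, pseudo_target) pairs exactly
    # as if the teacher's beam outputs were the reference targets.
    return list(zip(sources, pseudo_targets))
```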
1 code implementation • 23 Jun 2016 • Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush
In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics.
5 code implementations • EMNLP 2016 • Sam Wiseman, Alexander M. Rush
In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores.
Ranked #13 on Machine Translation on IWSLT2015 German-English
1 code implementation • EMNLP 2016 • Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber
Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence.
no code implementations • WS 2016 • Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber
We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.
1 code implementation • NAACL 2016 • Sam Wiseman, Alexander M. Rush, Stuart M. Shieber
There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters.
Ranked #15 on Coreference Resolution on OntoNotes
4 code implementations • EMNLP 2015 • Alexander M. Rush, Sumit Chopra, Jason Weston
Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build.
Ranked #1 on Extractive Text Summarization on DUC 2004 Task 1
16 code implementations • 26 Aug 2015 • Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush
We describe a simple neural language model that relies only on character-level inputs.
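A sketch of the character-level word encoder at its core: embed characters, apply a 1D convolution, and max-pool over time to produce a word representation for the word-level LSTM (sizes are illustrative; the paper's highway layers are omitted, and word length must be at least the filter width):

```python
import torch

class CharWordEncoder(torch.nn.Module):
    def __init__(self, n_chars=100, char_dim=15, n_filters=100, width=5):
        super().__init__()
        self.embed = torch.nn.Embedding(n_chars, char_dim)
        self.conv = torch.nn.Conv1d(char_dim, n_filters, kernel_size=width)

    def forward(self, char_ids):                    # (batch, word_len)
        x = self.embed(char_ids).transpose(1, 2)    # (batch, char_dim, word_len)
        x = torch.tanh(self.conv(x))                # (batch, n_filters, L')
        return x.max(dim=2).values                  # max over time
```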
19 code implementations • 19 Feb 2015 • Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov
One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.
no code implementations • 23 Jan 2014 • Alexander M. Rush, Michael Collins
Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP).