Search Results for author: Alexander M. Rush

Found 71 papers, 58 papers with code

Low-Rank Constraints for Fast Inference in Structured Models

1 code implementation NeurIPS 2021 Justin T. Chiu, Yuntian Deng, Alexander M. Rush

This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.

Language Modelling Music Modeling
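
The speedup here comes from never materializing the full transition matrix. A minimal sketch of the trick (plain NumPy, sizes hypothetical, not the authors' code): factoring an n-by-n matrix as A Bᵀ with rank k turns the O(n²) matrix-vector product inside a structured model's dynamic program into two O(nk) products.

```python
# Hedged sketch of the low-rank idea: replacing an n x n transition matrix
# with A @ B.T (rank k << n) makes each dynamic-programming step O(nk).
import numpy as np

n, k = 4096, 32                       # states and rank (hypothetical sizes)
rng = np.random.default_rng(0)
A, B = rng.random((n, k)), rng.random((n, k))
alpha = rng.random(n)                 # e.g. forward probabilities at step t

slow = (A @ B.T).T @ alpha            # materializes n x n: O(n^2)
fast = B @ (A.T @ alpha)              # two thin products:   O(nk)

assert np.allclose(slow, fast)
```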

GenNI: Human-AI Collaboration for Data-Backed Text Generation

no code implementations 19 Oct 2021 Hendrik Strobelt, Jambay Kinley, Robert Krueger, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush


These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models.

Text Generation

Rationales for Sequential Predictions

2 code implementations EMNLP 2021 Keyon Vafa, Yuntian Deng, David M. Blei, Alexander M. Rush

Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales.

Combinatorial Optimization Language Modelling +2
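
The combinatorial objective is to find the smallest subset of the input under which the model still makes the same prediction. A hedged sketch of the greedy procedure, with `dist_given` as a hypothetical model interface:

```python
# Hedged sketch of greedy rationalization, not the authors' implementation:
# greedily grow a context subset until the model, given only that subset,
# recovers its original prediction.
import numpy as np

def greedy_rationale(n_tokens, target, dist_given):
    """dist_given(indices) -> next-token distribution when the model sees
    only the context tokens at `indices` (hypothetical model interface)."""
    rationale, remaining = [], list(range(n_tokens))
    while remaining:
        # Add the token that most increases the target's probability.
        best = max(remaining, key=lambda i: dist_given(rationale + [i])[target])
        rationale.append(best)
        remaining.remove(best)
        if dist_given(rationale).argmax() == target:   # prediction recovered
            break
    return sorted(rationale)

rng = np.random.default_rng(0)
def toy_dist(indices):                 # random stand-in for a real model
    out = rng.random(50)
    out[7] += len(indices)             # token 7 benefits from more context
    return out / out.sum()

print(greedy_rationale(10, target=7, dist_given=toy_dist))
```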

Block Pruning For Faster Transformers

1 code implementation EMNLP 2021 François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush

Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models.

Machine Translation Question Answering

Low-Complexity Probing via Finding Subnetworks

1 code implementation NAACL 2021 Steven Cao, Victor Sanh, Alexander M. Rush

The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations.

How Many Data Points is a Prompt Worth?

1 code implementation NAACL 2021 Teven Le Scao, Alexander M. Rush

When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction.

Classification General Classification

Named Tensor Notation

1 code implementation 25 Feb 2021 David Chiang, Alexander M. Rush, Boaz Barak

We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers from the burden of keeping track of the order of axes and the purpose of each.
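
PyTorch's prototype named-tensor API captures part of this idea in code; a small taste, hedged since that API is experimental and is not the paper's notation itself:

```python
# Axes addressed by name, so code no longer depends on axis order.
# (PyTorch prototype feature; details may change across versions.)
import torch

x = torch.randn(2, 5, names=("batch", "feature"))
w = torch.randn(5, names=("feature",))

# Reductions name the axis they eliminate; no fragile dim=1 constants.
scores = (x * w).sum("feature")
print(scores.names)                           # ('batch',)

# Axes can be reordered by name, however x was constructed.
print(x.align_to("feature", "batch").names)   # ('feature', 'batch')
```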

Parameter-Efficient Transfer Learning with Diff Pruning

2 code implementations ACL 2021 Demi Guo, Alexander M. Rush, Yoon Kim

This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.

Transfer Learning
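
A minimal sketch of the idea on a generic parameter vector; the paper learns sparsity with a relaxed L0 penalty, and a plain L1 penalty is substituted here for brevity.

```python
# Hedged sketch of diff pruning: train only a sparse task-specific diff
# on top of a frozen pretrained parameter vector.
import torch

pretrained = torch.randn(1000)                 # frozen, shared across tasks
diff = torch.zeros(1000, requires_grad=True)   # per-task, trained, sparse
opt = torch.optim.Adam([diff], lr=1e-2)

def task_loss(params):                         # hypothetical loss that only
    return ((params[:10] - 1.0) ** 2).sum()    # depends on a few parameters

for _ in range(200):
    loss = task_loss(pretrained + diff) + 1e-2 * diff.abs().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Only the sparse diff vector needs to be stored for this task.
print("nonzero fraction:", (diff.abs() > 1e-3).float().mean().item())
```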

Learning from others' mistakes: Avoiding dataset biases without modeling them

no code implementations ICLR 2021 Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush

State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task.

Sequence-Level Mixed Sample Data Augmentation

1 code implementation EMNLP 2020 Demi Guo, Yoon Kim, Alexander M. Rush

Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language.

Data Augmentation Semantic Parsing +1

Scaling Hidden Markov Language Models

1 code implementation EMNLP 2020 Justin T. Chiu, Alexander M. Rush

The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure.

Language Modelling
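
The underlying computation is the HMM forward algorithm; a compact log-space sketch (shapes hypothetical) of the O(n²)-per-step recursion that the paper scales to very large state spaces:

```python
# Hedged sketch: the forward algorithm in log space. The logsumexp over
# previous states costs O(n_states^2) per step, the bottleneck that
# motivates the paper's scaling techniques.
import torch

n_states, vocab, T = 128, 50, 10
log_trans = torch.randn(n_states, n_states).log_softmax(-1)  # p(z' | z)
log_emit = torch.randn(n_states, vocab).log_softmax(-1)      # p(x | z)
log_init = torch.randn(n_states).log_softmax(-1)             # p(z_1)
xs = torch.randint(vocab, (T,))                              # observed tokens

alpha = log_init + log_emit[:, xs[0]]
for t in range(1, T):
    alpha = torch.logsumexp(alpha[:, None] + log_trans, dim=0) \
            + log_emit[:, xs[t]]

print(torch.logsumexp(alpha, dim=0).item())                  # log-likelihood
```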

Pre-trained Summarization Distillation

1 code implementation 24 Oct 2020 Sam Shleifer, Alexander M. Rush

A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning.

Knowledge Distillation Machine Translation +1
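
A hedged sketch of the SFT initialization, using generic layer stacks rather than the paper's actual summarization models: copy an evenly spaced subset of teacher layers into the student, then fine-tune as usual.

```python
# Hedged sketch of 'shrink and fine-tune' (SFT): a shallower student is
# initialized from a spread of teacher layers and then fine-tuned normally.
import copy
import torch.nn as nn

teacher_layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(12)])

def shrink(layers, n_student):
    # Evenly spaced teacher layers (e.g. 12 -> 4 keeps layers 0, 4, 7, 11).
    idx = [round(i * (len(layers) - 1) / (n_student - 1))
           for i in range(n_student)]
    return nn.ModuleList([copy.deepcopy(layers[i]) for i in idx])

student_layers = shrink(teacher_layers, 4)
print(len(student_layers))   # 4 -- now fine-tune on the original task data
```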

Cascaded Text Generation with Markov Transformers

1 code implementation NeurIPS 2020 Yuntian Deng, Alexander M. Rush

The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.

Machine Translation Text Generation +1

Movement Pruning: Adaptive Sparsity by Fine-Tuning

3 code implementations NeurIPS 2020 Victor Sanh, Thomas Wolf, Alexander M. Rush

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications.

Network Pruning Pretrained Language Models +1
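
Movement pruning instead keeps the weights that move away from zero during fine-tuning. A first-order sketch of that intuition (not the paper's exact straight-through formulation; the loss and sizes are stand-ins):

```python
# Hedged sketch: a weight's movement score grows when training pushes it
# away from zero (W * grad < 0) and shrinks when it drifts toward zero.
import torch

W = torch.randn(256, 256, requires_grad=True)
scores = torch.zeros_like(W)
opt = torch.optim.SGD([W], lr=0.1)

for _ in range(50):
    loss = ((W @ torch.randn(256)) ** 2).mean()   # hypothetical task loss
    opt.zero_grad()
    loss.backward()
    scores -= W.detach() * W.grad                 # first-order movement score
    opt.step()

# Keep the top 10% of weights by accumulated movement, zero the rest.
threshold = scores.flatten().kthvalue(int(0.9 * W.numel())).values
mask = (scores > threshold).float()
print(f"density: {mask.mean().item():.2f}")
```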

Posterior Control of Blackbox Generation

2 code implementations ACL 2020 Xiang Lisa Li, Alexander M. Rush

In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.

Text Generation

Automating Botnet Detection with Graph Neural Networks

1 code implementation 13 Mar 2020 Jiawei Zhou, Zhiying Xu, Alexander M. Rush, Minlan Yu

Botnets are now a major source for many network attacks, such as DDoS attacks and spam.

Graph Learning

Torch-Struct: Deep Structured Prediction Library

1 code implementation ACL 2020 Alexander M. Rush

The literature on structured prediction for NLP describes a rich collection of distributions and algorithms over sequences, segmentations, alignments, and trees; however, these algorithms are difficult to utilize in deep learning frameworks.

Structured Prediction
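
A hedged usage sketch of the library's distribution-style API (attribute names as recalled from the paper's release; check the torch-struct docs for your version): a linear-chain CRF is built from log-potentials, and inference results are read off the distribution object.

```python
# Hedged sketch of torch-struct usage; shapes and attributes may differ
# across library versions.
import torch
from torch_struct import LinearChainCRF

batch, N, C = 2, 6, 4                        # batch, sequence length, classes
log_potentials = torch.randn(batch, N - 1, C, C)

dist = LinearChainCRF(log_potentials)
print(dist.partition)         # log normalizer via dynamic programming
print(dist.marginals.shape)   # edge marginals, differentiable
print(dist.argmax.shape)      # Viterbi decoding, same API surface
```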

LAN -- A materials notation for 2D layered assemblies

no code implementations 8 Oct 2019 Georgios A. Tritsaris, Yiqi Xie, Alexander M. Rush, Stephen Carr, Marios Mattheakis, Efthimios Kaxiras

Two-dimensional (2D) layered materials offer intriguing possibilities for novel physics and applications.

Materials Science

Neural Linguistic Steganography

1 code implementation IJCNLP 2019 Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush

Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal.

Language Modelling
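
A toy, heavily simplified sketch of the core idea: the language model's next-token distribution defines a code, so secret bits can select among likely continuations. The paper uses proper arithmetic coding; here each step hides a fixed number of bits by indexing into the top-2^b tokens, and `next_dist` is a hypothetical LM interface.

```python
# Toy, hedged sketch of LM-based steganography (not the paper's arithmetic
# coder): each step hides `bits` secret bits by choosing among the language
# model's 2**bits most likely next tokens.
import numpy as np

def hide(secret_bits, next_dist, steps, bits=2):
    """next_dist(prefix) -> probability vector over the vocabulary."""
    cover, i = [], 0
    for _ in range(steps):
        top = np.argsort(next_dist(cover))[::-1][: 2 ** bits]  # top-2^b ids
        chunk = secret_bits[i : i + bits]
        i += bits
        cover.append(int(top[int("".join(map(str, chunk)), 2)]))
    return cover   # fluent-looking token ids that encode the secret

rng = np.random.default_rng(0)
dist = lambda prefix: rng.dirichlet(np.ones(100))   # stand-in for a real LM
print(hide([1, 0, 1, 1], dist, steps=2))
```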

Commonsense Knowledge Mining from Pretrained Models

1 code implementation IJCNLP 2019 Joshua Feldman, Joe Davison, Alexander M. Rush

Inferring commonsense knowledge is a key challenge in natural language processing, but due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data.

Language Modelling

MASR: A Modular Accelerator for Sparse RNNs

no code implementations 23 Aug 2019 Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, David Brooks

The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs.

Speech Recognition

Encoder-Agnostic Adaptation for Conditional Language Generation

1 code implementation 19 Aug 2019 Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush

Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks.

Conditional Text Generation Language Modelling +2

Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference

1 code implementation ACL 2019 Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush

In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise.

Natural Language Inference

Compound Probabilistic Context-Free Grammars for Grammar Induction

2 code implementations ACL 2019 Yoon Kim, Chris Dyer, Alexander M. Rush

We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar.

Constituency Grammar Induction Variational Inference

Unsupervised Recurrent Neural Network Grammars

1 code implementation NAACL 2019 Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis

On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.

Ranked #6 on Constituency Grammar Induction on PTB (Max F1 (WSJ) metric)

Constituency Grammar Induction Language Modelling +1

Latent Normalizing Flows for Discrete Sequences

1 code implementation 29 Jan 2019 Zachary M. Ziegler, Alexander M. Rush

Normalizing flows are a powerful class of generative models for continuous random variables, showing both strong model flexibility and the potential for non-autoregressive generation.

Language Modelling Music Generation

A Tutorial on Deep Latent Variable Models of Natural Language

no code implementations 17 Dec 2018 Yoon Kim, Sam Wiseman, Alexander M. Rush

There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.

Variational Inference

End-to-End Content and Plan Selection for Data-to-Text Generation

1 code implementation WS 2018 Sebastian Gehrmann, Falcon Z. Dai, Henry Elder, Alexander M. Rush

Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG.

Data-to-Text Generation

Entity Tracking Improves Cloze-style Reading Comprehension

1 code implementation EMNLP 2018 Luong Hoang, Sam Wiseman, Alexander M. Rush

Reading comprehension tasks test the ability of models to process long-term context and remember salient information.

Reading Comprehension

Learning Neural Templates for Text Generation

2 code implementations EMNLP 2018 Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation.

Text Generation

Avoiding Latent Variable Collapse With Generative Skip Models

no code implementations 12 Jul 2018 Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei

VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.

Latent Alignment and Variational Attention

1 code implementation NeurIPS 2018 Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush

This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.

Hard Attention Machine Translation +4

Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

1 code implementation 25 Apr 2018 Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush

In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process.

Translation

Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

1 code implementation 3 Oct 2017 Ankit Gupta, Alexander M. Rush

We consider the task of detecting regulatory elements in the human genome directly from raw DNA.
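
A hedged sketch of the architectural ingredient: 1-D convolutions whose dilation doubles per layer, so the receptive field over the DNA sequence grows exponentially with depth (hyperparameters illustrative, not the paper's).

```python
# Hedged sketch: stacked dilated 1-D convolutions over one-hot DNA input
# (A/C/G/T -> 4 channels), covering long-range context with few layers.
import torch
import torch.nn as nn

layers, channels = [], 4
for i in range(6):
    layers += [nn.Conv1d(channels, 32, kernel_size=3,
                         dilation=2 ** i, padding=2 ** i),  # doubling dilation
               nn.ReLU()]
    channels = 32
net = nn.Sequential(*layers)

x = torch.randn(1, 4, 1000)     # (batch, channels, sequence length)
print(net(x).shape)             # length preserved: (1, 32, 1000)
# Receptive field grows exponentially: 2 * (2^6 - 1) + 1 = 127 positions.
```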

OpenNMT: Open-source Toolkit for Neural Machine Translation

no code implementations 12 Sep 2017 Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush

We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.

Machine Translation Translation

Adapting Sequence Models for Sentence Correction

1 code implementation EMNLP 2017 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.

Machine Translation Translation

Challenges in Data-to-Document Generation

4 code implementations EMNLP 2017 Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records.

Data-to-Text Generation

Adversarially Regularized Autoencoders

6 code implementations 13 Jun 2017 Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun

This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.

Representation Learning Style Transfer

Structured Attention Networks

no code implementations 3 Feb 2017 Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush

Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.

Machine Translation Natural Language Inference +2

Lie-Access Neural Turing Machines

no code implementations 9 Nov 2016 Greg Yang, Alexander M. Rush

The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space.

Image-to-Markup Generation with Coarse-to-Fine Attention

12 code implementations ICML 2017 Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush

We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.

Optical Character Recognition

Sequence-Level Knowledge Distillation

5 code implementations EMNLP 2016 Yoon Kim, Alexander M. Rush

We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).

Knowledge Distillation Machine Translation +1
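
A sketch of the word-level baseline loss, with random logits standing in for real models: the student matches the teacher's per-position distribution. The paper's sequence-level variant instead runs beam search with the teacher and trains the student on that output as if it were the reference.

```python
# Hedged sketch of word-level knowledge distillation: minimize
# KL(teacher || student) at every output position.
import torch
import torch.nn.functional as F

T, vocab = 7, 1000
teacher_logits = torch.randn(T, vocab)                       # stand-in
student_logits = torch.randn(T, vocab, requires_grad=True)   # stand-in

word_kd = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.log_softmax(teacher_logits, dim=-1),
    log_target=True, reduction="batchmean",
)
word_kd.backward()
print(word_kd.item())
```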

LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

1 code implementation 23 Jun 2016 Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush

In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics.

Sequence-to-Sequence Learning as Beam-Search Optimization

5 code implementations EMNLP 2016 Sam Wiseman, Alexander M. Rush

In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores.

Language Modelling Machine Translation +2
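
A highly simplified sketch of the training signal: at each step, incur a margin loss whenever the gold prefix scores below the weakest hypothesis kept on the beam (scores are stand-ins; the real scheme adds scheduled search and backpropagation details).

```python
# Hedged sketch of the beam-search-optimization margin loss, not the
# authors' full training scheme.
import torch

def bso_step_loss(beam_scores, gold_score, margin=1.0):
    """beam_scores: scores of the k hypotheses kept at this step;
    gold_score: score of the gold prefix (both model-dependent)."""
    worst_kept = beam_scores.min()
    # Positive only when the gold prefix falls off (or barely stays on)
    # the beam.
    return torch.clamp(margin + worst_kept - gold_score, min=0.0)

beam_scores = torch.tensor([2.3, 1.9, 1.1])
print(bso_step_loss(beam_scores, gold_score=torch.tensor(0.4)))  # violated
print(bso_step_loss(beam_scores, gold_score=torch.tensor(3.0)))  # no loss
```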

Word Ordering Without Syntax

1 code implementation EMNLP 2016 Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber

Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence.

Language Modelling

Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction

no code implementations WS 2016 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.

Learning Global Features for Coreference Resolution

1 code implementation NAACL 2016 Sam Wiseman, Alexander M. Rush, Stuart M. Shieber

There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters.

Coreference Resolution

Character-Aware Neural Language Models

16 code implementations 26 Aug 2015 Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush

We describe a simple neural language model that relies only on character-level inputs.

Language Modelling
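
A hedged sketch of the character-CNN word encoder at the heart of the model (the paper uses multiple filter widths plus highway layers; sizes here are illustrative).

```python
# Hedged sketch: each word's vector comes from a CNN over its characters,
# with max-over-time pooling; this feeds the LSTM language model.
import torch
import torch.nn as nn

n_chars, char_dim, n_filters, max_word_len = 60, 15, 100, 12

char_emb = nn.Embedding(n_chars, char_dim)
conv = nn.Conv1d(char_dim, n_filters, kernel_size=3, padding=1)

def word_vector(char_ids):                     # (batch, max_word_len)
    x = char_emb(char_ids).transpose(1, 2)     # (batch, char_dim, word_len)
    h = torch.relu(conv(x))                    # convolve over characters
    return h.max(dim=2).values                 # max-over-time pooling

words = torch.randint(n_chars, (4, max_word_len))
print(word_vector(words).shape)                # (4, 100)
```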

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

19 code implementations 19 Feb 2015 Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov

One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.

Question Answering Reading Comprehension

A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing

no code implementations 23 Jan 2014 Alexander M. Rush, Michael Collins

Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP).

Combinatorial Optimization
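
A toy sketch of the mechanics: two tractable subproblems are solved independently, and Lagrange multipliers are nudged by the subgradient (the disagreement between their solutions) until the argmaxes coincide. The scores are random stand-ins for, e.g., parser and tagger scores over a shared variable.

```python
# Hedged toy sketch of dual decomposition with subgradient updates; not a
# full NLP inference problem, just the agreement mechanism.
import numpy as np

rng = np.random.default_rng(0)
f, g = rng.random(5), rng.random(5)   # each subproblem's label scores

u = np.zeros(5)                       # Lagrange multipliers
for t in range(1, 200):
    y = int(np.argmax(f + u))         # subproblem 1 maximizes f(y) + u[y]
    z = int(np.argmax(g - u))         # subproblem 2 maximizes g(z) - u[z]
    if y == z:
        break
    u[y] -= 1.0 / t                   # subgradient step: discourage y ...
    u[z] += 1.0 / t                   # ... and discourage z, until they meet
print("labels:", y, z)                # equal once the solutions agree
```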
