Search Results for author: Alexander M. Rush

Found 85 papers, 68 papers with code

OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

2 code implementations 21 Jun 2023 Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks.

Language Modelling

Scaling Data-Constrained Language Models

1 code implementation 25 May 2023 Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, Colin Raffel

We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data.

Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations

no code implementations 24 May 2023 Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush

Instead of using direct supervision, this work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.

HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision

no code implementations 23 May 2023 Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush

Explainable multi-hop question answering (QA) not only predicts answers but also identifies rationales, i.e., subsets of input sentences used to derive the answers.

Multi-hop Question Answering · Question Answering

Pretraining Without Attention

1 code implementation 20 Dec 2022 Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

3 code implementations 9 Nov 2022 BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Language Modelling · Multilingual NLP

Teal: Learning-Accelerated Optimization of WAN Traffic Engineering

1 code implementation 25 Oct 2022 Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu

The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL)

ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition

no code implementations 24 Oct 2022 Sanchit Gandhi, Patrick von Platen, Alexander M. Rush

To promote the development of multi-domain speech systems, we introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition (ASR) system across a broad set of speech datasets.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

Unsupervised Text Deidentification

1 code implementation 20 Oct 2022 John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush

We propose an unsupervised deidentification method that masks words that leak personally-identifying information.

Named Entity Recognition · Named Entity Recognition (NER)

Model Criticism for Long-Form Text Generation

1 code implementation 16 Oct 2022 Yuntian Deng, Volodymyr Kuleshov, Alexander M. Rush

Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression).

Text Generation

Markup-to-Image Diffusion Models with Scheduled Sampling

1 code implementation 11 Oct 2022 Yuntian Deng, Noriyuki Kojima, Alexander M. Rush

These experiments each verify the effectiveness of the diffusion process and the use of scheduled sampling to fix generation issues.

Denoising · Image Generation +1

Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models

no code implementations 16 Aug 2022 Hendrik Strobelt, Albert Webson, Victor Sanh, Benjamin Hoover, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush

State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training.

Prompt Engineering

Low-Rank Constraints for Fast Inference in Structured Models

1 code implementation NeurIPS 2021 Justin T. Chiu, Yuntian Deng, Alexander M. Rush

This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.

Language Modelling · Music Modeling

GenNI: Human-AI Collaboration for Data-Backed Text Generation

no code implementations 19 Oct 2021 Hendrik Strobelt, Jambay Kinley, Robert Krueger, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush

These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models.

Descriptive · Text Generation

Rationales for Sequential Predictions

2 code implementations EMNLP 2021 Keyon Vafa, Yuntian Deng, David M. Blei, Alexander M. Rush

Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales.

Combinatorial Optimization · Language Modelling +2

Block Pruning For Faster Transformers

1 code implementation EMNLP 2021 François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush

Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models.

Machine Translation · Question Answering

Low-Complexity Probing via Finding Subnetworks

1 code implementation NAACL 2021 Steven Cao, Victor Sanh, Alexander M. Rush

The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations.

How Many Data Points is a Prompt Worth?

1 code implementation NAACL 2021 Teven Le Scao, Alexander M. Rush

When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction.

Classification · General Classification

Named Tensor Notation

1 code implementation 25 Feb 2021 David Chiang, Alexander M. Rush, Boaz Barak

We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers of machine learning models from the burden of keeping track of the order of axes and the purpose of each.
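The core idea above, that axes carry names rather than positions, can be sketched in a few lines of pure Python. The helpers `named` and `contract` below are hypothetical illustrations, not the paper's reference implementation:

```python
import numpy as np

def named(array, *names):
    """Pair an ndarray with one name per axis."""
    assert array.ndim == len(names)
    return {"data": array, "names": list(names)}

def contract(x, y, axis):
    """Sum-product over a shared named axis, regardless of axis order."""
    i, j = x["names"].index(axis), y["names"].index(axis)
    data = np.tensordot(x["data"], y["data"], axes=(i, j))
    names = [n for n in x["names"] if n != axis] + [n for n in y["names"] if n != axis]
    return named(data, *names)

# A (batch, feature) input times a (feature, hidden) weight matrix:
x = named(np.ones((2, 3)), "batch", "feature")
w = named(np.ones((3, 4)), "feature", "hidden")
h = contract(x, w, "feature")   # no need to remember which axis was which
```

Because contraction is requested by name, the call would work identically if either operand stored its axes in the opposite order.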

Parameter-Efficient Transfer Learning with Diff Pruning

2 code implementations ACL 2021 Demi Guo, Alexander M. Rush, Yoon Kim

This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.

Transfer Learning

Learning from others' mistakes: Avoiding dataset biases without modeling them

no code implementations ICLR 2021 Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush

State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task.

Sequence-Level Mixed Sample Data Augmentation

1 code implementation EMNLP 2020 Demi Guo, Yoon Kim, Alexander M. Rush

Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language.

Data Augmentation · Semantic Parsing +1

Scaling Hidden Markov Language Models

1 code implementation EMNLP 2020 Justin T. Chiu, Alexander M. Rush

The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure.

Language Modelling

Pre-trained Summarization Distillation

1 code implementation 24 Oct 2020 Sam Shleifer, Alexander M. Rush

A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning.

Knowledge Distillation · Machine Translation +1
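The "shrink" half of shrink-and-fine-tune can be caricatured in a few lines. The evenly spaced layer selection below is an assumption for illustration, not necessarily the paper's exact copying scheme:

```python
def shrink(teacher_layers, student_depth):
    """Copy an evenly spaced subset of teacher layers into a smaller student."""
    n = len(teacher_layers)
    idx = [round(i * (n - 1) / (student_depth - 1)) for i in range(student_depth)]
    return [teacher_layers[i] for i in idx]

teacher = [f"layer{i}" for i in range(12)]   # a 12-layer teacher
student = shrink(teacher, 4)                 # picks layers 0, 4, 7, 11
# Ordinary fine-tuning of the student would then follow; no distillation loss needed.
```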

Cascaded Text Generation with Markov Transformers

1 code implementation NeurIPS 2020 Yuntian Deng, Alexander M. Rush

The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.

Machine Translation · Text Generation +1

Movement Pruning: Adaptive Sparsity by Fine-Tuning

4 code implementations NeurIPS 2020 Victor Sanh, Thomas Wolf, Alexander M. Rush

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications.

Network Pruning · Transfer Learning
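The contrast drawn above, magnitude versus movement criteria, can be sketched with a single fine-tuning step. This is a one-step caricature; the actual method accumulates importance scores over training with a straight-through estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=8)       # pretrained weights
grad = rng.normal(size=8)    # gradient from one fine-tuning step

def top_k_mask(scores, k):
    """Boolean mask keeping the k highest-scoring weights."""
    mask = np.zeros_like(scores, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

magnitude_mask = top_k_mask(np.abs(w), k=4)       # keep largest pretrained weights
movement_scores = -w * grad                        # positive when a weight is moving away from zero
movement_mask = top_k_mask(movement_scores, k=4)   # keep weights the task pushes away from zero
# The two criteria generally select different weights, which is the paper's point
# about magnitude pruning being a poor fit for transfer learning.
```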

Posterior Control of Blackbox Generation

2 code implementations ACL 2020 Xiang Lisa Li, Alexander M. Rush

In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.

Text Generation

Automating Botnet Detection with Graph Neural Networks

1 code implementation 13 Mar 2020 Jiawei Zhou, Zhiying Xu, Alexander M. Rush, Minlan Yu

Botnets are now a major source for many network attacks, such as DDoS attacks and spam.

Graph Learning

Torch-Struct: Deep Structured Prediction Library

1 code implementation ACL 2020 Alexander M. Rush

The literature on structured prediction for NLP describes a rich collection of distributions and algorithms over sequences, segmentations, alignments, and trees; however, these algorithms are difficult to utilize in deep learning frameworks.

Structured Prediction

LAN -- A materials notation for 2D layered assemblies

no code implementations 8 Oct 2019 Georgios A. Tritsaris, Yiqi Xie, Alexander M. Rush, Stephen Carr, Marios Mattheakis, Efthimios Kaxiras

Two-dimensional (2D) layered materials offer intriguing possibilities for novel physics and applications.

Materials Science

Neural Linguistic Steganography

1 code implementation IJCNLP 2019 Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush

Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal.

Language Modelling
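The encoding idea above can be illustrated with a toy scheme that hides one bit per token by choosing between a language model's two most likely continuations. The paper itself uses arithmetic coding over the full distribution; the tiny `top2` lookup table below is invented purely for illustration:

```python
def top2(prefix):
    """Toy 'language model': two most likely next tokens, most likely first."""
    table = {"the": ["cat", "dog"], "cat": ["sat", "ran"], "dog": ["ran", "sat"],
             "sat": ["down", "there"], "ran": ["away", "home"]}
    return table.get(prefix, ["the", "a"])

def encode(bits, start="the"):
    """Hide one bit per generated token via the rank of the chosen token."""
    text, tok = [start], start
    for b in bits:
        tok = top2(tok)[b]       # bit 0 -> most likely token, bit 1 -> runner-up
        text.append(tok)
    return text

def decode(text):
    """Recover the bits by re-ranking the model's candidates at each step."""
    return [top2(prev).index(tok) for prev, tok in zip(text, text[1:])]

msg = [1, 0, 1]
cover = encode(msg)              # ["the", "dog", "ran", "home"]
assert decode(cover) == msg
```

Because both parties share the model, the cover text alone determines the hidden bits, and the text remains fluent since every token is a high-probability continuation.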

Commonsense Knowledge Mining from Pretrained Models

1 code implementation IJCNLP 2019 Joshua Feldman, Joe Davison, Alexander M. Rush

Inferring commonsense knowledge is a key challenge in natural language processing, but due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data.

Language Modelling

MASR: A Modular Accelerator for Sparse RNNs

no code implementations 23 Aug 2019 Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, David Brooks

The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs.

Speech Recognition

Encoder-Agnostic Adaptation for Conditional Language Generation

1 code implementation 19 Aug 2019 Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush

Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks.

Conditional Text Generation · Language Modelling +2

Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference

1 code implementation ACL 2019 Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush

In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise.

Natural Language Inference

Latent Normalizing Flows for Discrete Sequences

1 code implementation 29 Jan 2019 Zachary M. Ziegler, Alexander M. Rush

Normalizing flows are a powerful class of generative models for continuous random variables, showing both strong model flexibility and the potential for non-autoregressive generation.

Language Modelling · Music Generation

A Tutorial on Deep Latent Variable Models of Natural Language

no code implementations 17 Dec 2018 Yoon Kim, Sam Wiseman, Alexander M. Rush

There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.

Variational Inference

End-to-End Content and Plan Selection for Data-to-Text Generation

1 code implementation WS 2018 Sebastian Gehrmann, Falcon Z. Dai, Henry Elder, Alexander M. Rush

Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG.

Data-to-Text Generation

Entity Tracking Improves Cloze-style Reading Comprehension

1 code implementation EMNLP 2018 Luong Hoang, Sam Wiseman, Alexander M. Rush

Reading comprehension tasks test the ability of models to process long-term context and remember salient information.

LAMBADA

Learning Neural Templates for Text Generation

2 code implementations EMNLP 2018 Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation.

Text Generation

Avoiding Latent Variable Collapse With Generative Skip Models

no code implementations 12 Jul 2018 Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei

VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.

Latent Alignment and Variational Attention

1 code implementation NeurIPS 2018 Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush

This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.

Hard Attention · Machine Translation +4

Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

1 code implementation 25 Apr 2018 Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush

In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process.

Translation

Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

1 code implementation 3 Oct 2017 Ankit Gupta, Alexander M. Rush

We consider the task of detecting regulatory elements in the human genome directly from raw DNA.

OpenNMT: Open-source Toolkit for Neural Machine Translation

no code implementations 12 Sep 2017 Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush

We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.

Machine Translation · NMT +1

Adapting Sequence Models for Sentence Correction

1 code implementation EMNLP 2017 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.

Machine Translation · Translation

Challenges in Data-to-Document Generation

4 code implementations EMNLP 2017 Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records.

Data-to-Text Generation · Descriptive

Adversarially Regularized Autoencoders

6 code implementations 13 Jun 2017 Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann Lecun

This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.

Representation Learning · Style Transfer

Structured Attention Networks

no code implementations 3 Feb 2017 Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush

Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.

Machine Translation · Natural Language Inference +2

Lie-Access Neural Turing Machines

no code implementations 9 Nov 2016 Greg Yang, Alexander M. Rush

The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space.

Image-to-Markup Generation with Coarse-to-Fine Attention

14 code implementations ICML 2017 Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush

We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.

Optical Character Recognition (OCR)

Sequence-Level Knowledge Distillation

6 code implementations EMNLP 2016 Yoon Kim, Alexander M. Rush

We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).

Knowledge Distillation · Machine Translation +2

LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

1 code implementation 23 Jun 2016 Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush

In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics.

Sequence-to-Sequence Learning as Beam-Search Optimization

6 code implementations EMNLP 2016 Sam Wiseman, Alexander M. Rush

In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores.

Language Modelling · Machine Translation +2

Word Ordering Without Syntax

1 code implementation EMNLP 2016 Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber

Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence.

Language Modelling

Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction

no code implementations WS 2016 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.

Learning Global Features for Coreference Resolution

1 code implementation NAACL 2016 Sam Wiseman, Alexander M. Rush, Stuart M. Shieber

There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters.

Coreference Resolution

Character-Aware Neural Language Models

14 code implementations 26 Aug 2015 Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush

We describe a simple neural language model that relies only on character-level inputs.

Language Modelling

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

19 code implementations 19 Feb 2015 Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov

One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.

Question Answering · Reading Comprehension

A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing

no code implementations 23 Jan 2014 Alexander M. Rush, Michael Collins

Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP).

Combinatorial Optimization
