no code implementations • 30 Dec 2024 • Jiawei Zhou, Woojeong Kim, Zhiying Xu, Alexander M. Rush, Minlan Yu
Our NetFlowGen framework goes beyond a proof of concept for network traffic pre-training and addresses specific challenges such as unifying network feature representations, learning from large volumes of unlabeled traffic data, and testing on a real downstream task, DDoS attack detection.
1 code implementation • 21 Oct 2024 • Junjie Oscar Yin, Alexander M. Rush
Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute.
no code implementations • 3 Oct 2024 • John X. Morris, Alexander M. Rush
In this work, we argue that these embeddings, while effective, are implicitly out-of-context for targeted use cases of retrieval, and that a contextualized document embedding should take into account both the document and neighboring documents in context, analogous to contextualized word embeddings.
1 code implementation • 18 Sep 2024 • Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush
Broad textual understanding and in-context learning require language models that utilize full document contexts.
2 code implementations • 27 Aug 2024 • Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao
The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks.
1 code implementation • 24 Jul 2024 • Wenting Zhao, Ge Gao, Claire Cardie, Alexander M. Rush
We curate CouldAsk, an evaluation benchmark composed of existing and new datasets for document-grounded question answering, specifically designed to study reformulating unanswerable questions.
no code implementations • 2 Apr 2024 • Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li
Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark.
Ranked #1 on Entity Linking on KORE50 (Micro-F1 strong metric)
no code implementations • 24 Jan 2024 • Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences.
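Token-free here means the model consumes raw UTF-8 bytes, so the vocabulary is fixed at 256 and no tokenizer is learned. A minimal sketch of byte-level inputs for autoregressive training (names and sizes are illustrative, not from the paper's code):

```python
import torch

def bytes_to_ids(text: str) -> torch.Tensor:
    # UTF-8 bytes give a fixed vocabulary of 256 symbols; no tokenizer is trained.
    return torch.tensor(list(text.encode("utf-8")), dtype=torch.long)

ids = bytes_to_ids("token-free modeling")
inputs, targets = ids[:-1], ids[1:]   # standard next-byte prediction shift
embed = torch.nn.Embedding(256, 64)   # embedding table always has 256 rows
x = embed(inputs)                     # (seq_len, d_model), ready for an SSM stack
```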
no code implementations • CVPR 2024 • Jing Nathan Yan, Jiatao Gu, Alexander M. Rush
In recent advancements in high-fidelity image generation, Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a key player.
2 code implementations • 22 Nov 2023 • John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush
We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text.
no code implementations • 14 Nov 2023 • Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky
Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning.
2 code implementations • 1 Nov 2023 • Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging.
1 code implementation • 26 Oct 2023 • Justin T. Chiu, Wenting Zhao, Derek Chen, Saujas Vaduguru, Alexander M. Rush, Daniel Fried
Large language models (LLMs) excel at processing and generating both text and code.
2 code implementations • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
Ranked #7 on Zero-Shot Learning on MedConceptsQA
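dDPO applies the direct preference optimization objective to preference pairs ranked by a teacher model rather than by humans. A minimal sketch of the underlying DPO loss, assuming summed response log-probabilities under the policy and a frozen reference model are precomputed (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a batch of preference pairs; inputs are summed log-probs."""
    chosen_margin = beta * (pi_chosen - ref_chosen)
    rejected_margin = beta * (pi_rejected - ref_rejected)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.9]),
                torch.tensor([-13.0]), torch.tensor([-14.8]))
```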
2 code implementations • 21 Oct 2023 • John X. Morris, Chandan Singh, Alexander M. Rush, Jianfeng Gao, Yuntian Deng
Prompting language models (LMs) is the main interface for applying them to new tasks.
1 code implementation • 10 Oct 2023 • John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush
How much private information do text embeddings reveal about the original text?
no code implementations • 25 Sep 2023 • Celine Lee, Abdulrahman Mahmoud, Michal Kurek, Simone Campanoni, David Brooks, Stephen Chong, Gu-Yeon Wei, Alexander M. Rush
In this work, we leverage the strengths of LMs and symbolic solvers in a neurosymbolic approach to learned transpilation for assembly code.
1 code implementation • NeurIPS 2023 • Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh
Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks.
Ranked #14 on MMR total on MRR-Benchmark (using extra training data)
2 code implementations • NeurIPS 2023 • Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, Colin Raffel
We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data.
no code implementations • 24 May 2023 • Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush
Instead of using direct supervision, this work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.
no code implementations • 23 May 2023 • Wenting Zhao, Justin T. Chiu, Claire Cardie, Alexander M. Rush
Explainable multi-hop question answering (QA) not only predicts answers but also identifies rationales, i.e., subsets of input sentences used to derive the answers.
1 code implementation • 20 Dec 2022 • Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
7 code implementations • 9 Nov 2022 • BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.
1 code implementation • 25 Oct 2022 • Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu
The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale.
2 code implementations • 24 Oct 2022 • Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
To promote the development of multi-domain speech systems, we introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition (ASR) system across a broad set of speech datasets.
Automatic Speech Recognition (ASR)
1 code implementation • 20 Oct 2022 • John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
1 code implementation • 16 Oct 2022 • Yuntian Deng, Volodymyr Kuleshov, Alexander M. Rush
Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression).
1 code implementation • 11 Oct 2022 • Yuntian Deng, Noriyuki Kojima, Alexander M. Rush
These experiments verify both the effectiveness of the diffusion process and the use of scheduled sampling to fix generation issues.
2 code implementations • 4 Oct 2022 • Chandan Singh, John X. Morris, Jyoti Aneja, Alexander M. Rush, Jianfeng Gao
Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks.
1 code implementation • 30 Sep 2022 • Leandro von Werra, Lewis Tunstall, Abhishek Thakur, Alexandra Sasha Luccioni, Tristan Thrush, Aleksandra Piktus, Felix Marty, Nazneen Rajani, Victor Mustar, Helen Ngo, Omar Sanseviero, Mario Šaško, Albert Villanova, Quentin Lhoest, Julien Chaumond, Margaret Mitchell, Alexander M. Rush, Thomas Wolf, Douwe Kiela
We introduce Evaluate and Evaluation on the Hub, a set of tools to facilitate the evaluation of models and datasets in ML.
no code implementations • 16 Aug 2022 • Hendrik Strobelt, Albert Webson, Victor Sanh, Benjamin Hoover, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush
State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training.
1 code implementation • ACL 2022 • Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Alan Fries, Maged S. Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, Mike Tian-Jian Jiang, Alexander M. Rush
PromptSource is a system for creating, sharing, and using natural language prompts.
1 code implementation • NeurIPS 2021 • Justin T. Chiu, Yuntian Deng, Alexander M. Rush
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
no code implementations • 19 Oct 2021 • Hendrik Strobelt, Jambay Kinley, Robert Krueger, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush
These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models.
8 code implementations • ICLR 2022 • Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020).
2 code implementations • EMNLP 2021 • Keyon Vafa, Yuntian Deng, David M. Blei, Alexander M. Rush
Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales.
1 code implementation • EMNLP 2021 • François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush
Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models.
1 code implementation • EMNLP (ACL) 2021 • Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, Stas Bekman, Pierric Cistac, Thibault Goehringer, Victor Mustar, François Lagunas, Alexander M. Rush, Thomas Wolf
The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks.
3 code implementations • NAACL 2021 • Steven Cao, Victor Sanh, Alexander M. Rush
The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations.
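The MLP-probe setup the abstract refers to trains a small classifier on frozen representations, so only the probe receives gradients. A minimal sketch, with dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 17  # e.g., a POS tag set; sizes illustrative

# A shallow MLP probe over frozen model representations -- the baseline
# setup the paper contrasts with.
probe = nn.Sequential(
    nn.Linear(hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, num_labels),
)

reps = torch.randn(32, hidden_size)           # frozen representations for 32 tokens
labels = torch.randint(0, num_labels, (32,))
loss = nn.functional.cross_entropy(probe(reps), labels)
loss.backward()  # only the probe's parameters receive gradients
```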
1 code implementation • NAACL 2021 • Teven Le Scao, Alexander M. Rush
When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction.
1 code implementation • 25 Feb 2021 • David Chiang, Alexander M. Rush, Boaz Barak
We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers of machine learning models from the burden of keeping track of the order of axes and the purpose of each.
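A toy illustration of the idea (not the paper's notation or any library API): each tensor carries a tuple of axis names, and contraction is specified by name rather than by positional order.

```python
import numpy as np

def contract(a, a_names, b, b_names, over):
    """Contract two arrays along the axis named `over`; other axes keep their names."""
    letters = {}
    for name in a_names + b_names:
        letters.setdefault(name, chr(ord("a") + len(letters)))
    out_names = []
    for name in a_names + b_names:
        if name != over and name not in out_names:
            out_names.append(name)
    spec = "{},{}->{}".format(
        "".join(letters[n] for n in a_names),
        "".join(letters[n] for n in b_names),
        "".join(letters[n] for n in out_names),
    )
    return np.einsum(spec, a, b), tuple(out_names)

x = np.random.randn(2, 5)  # axes ("batch", "feature")
w = np.random.randn(5, 3)  # axes ("feature", "hidden")
y, names = contract(x, ("batch", "feature"), w, ("feature", "hidden"), over="feature")
assert names == ("batch", "hidden") and y.shape == (2, 3)
```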
2 code implementations • ACL 2021 • Demi Guo, Alexander M. Rush, Yoon Kim
This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.
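A minimal sketch of this view of finetuning: the pretrained weights stay frozen and shared across tasks, and each task trains only an additive diff (the sparsity regularization that keeps the diff small is elided here).

```python
import torch
import torch.nn as nn

class DiffLinear(nn.Module):
    """Pretrained linear layer plus a trainable task-specific diff vector."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = pretrained.weight.detach()  # frozen, shared across tasks
        self.bias = pretrained.bias.detach()
        self.diff_w = nn.Parameter(torch.zeros_like(self.weight))  # task-specific
        self.diff_b = nn.Parameter(torch.zeros_like(self.bias))

    def forward(self, x):
        return x @ (self.weight + self.diff_w).T + (self.bias + self.diff_b)

layer = DiffLinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))  # only diff_w and diff_b are trainable
```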
no code implementations • ICLR 2021 • Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task.
1 code implementation • NeurIPS 2020 • Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, Alexander M. Rush
Learning to control the structure of sentences is a challenging problem in text generation.
no code implementations • 28 Nov 2020 • Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
1 code implementation • EMNLP 2020 • Demi Guo, Yoon Kim, Alexander M. Rush
Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language.
1 code implementation • EMNLP 2020 • Justin T. Chiu, Alexander M. Rush
The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure.
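For reference, the classic forward algorithm that computes an HMM's sequence likelihood, written in log space; a minimal sketch with dense transition and emission tables:

```python
import torch

def hmm_log_likelihood(log_init, log_trans, log_emit, obs):
    """Forward algorithm in log space.

    log_init:  (K,) initial state log-probabilities
    log_trans: (K, K) log p(state_t | state_{t-1})
    log_emit:  (K, V) log p(symbol | state)
    obs:       (T,) observed symbol ids
    """
    alpha = log_init + log_emit[:, obs[0]]
    for t in range(1, len(obs)):
        # logsumexp over previous states, then add the emission score
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_trans, dim=0) + log_emit[:, obs[t]]
    return torch.logsumexp(alpha, dim=0)

K, V = 3, 5
li = torch.log_softmax(torch.randn(K), dim=0)
lt = torch.log_softmax(torch.randn(K, K), dim=1)
le = torch.log_softmax(torch.randn(K, V), dim=1)
ll = hmm_log_likelihood(li, lt, le, torch.tensor([0, 2, 4, 1]))
```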
1 code implementation • EMNLP 2020 • Congzheng Song, Alexander M. Rush, Vitaly Shmatikov
We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models.
1 code implementation • 24 Oct 2020 • Sam Shleifer, Alexander M. Rush
A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning.
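A minimal sketch of the shrink step, assuming evenly spaced layer selection (which layers to keep is a design knob, not prescribed here); the selected layers initialize the student, which is then fine-tuned as usual:

```python
from torch import nn

def shrink(teacher_layers: nn.ModuleList, num_student_layers: int) -> nn.ModuleList:
    """Initialize a smaller student from a spread-out subset of teacher layers."""
    n = len(teacher_layers)
    step = n / num_student_layers
    return nn.ModuleList(teacher_layers[int(i * step)] for i in range(num_student_layers))

teacher = nn.ModuleList(nn.Linear(16, 16) for _ in range(12))
student = shrink(teacher, 4)  # keeps layers 0, 3, 6, 9; then fine-tune
```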
2 code implementations • EACL 2021 • Xinya Du, Alexander M. Rush, Claire Cardie
We revisit the classic problem of document-level role-filler entity extraction (REE) for template filling.
1 code implementation • NeurIPS 2020 • Yuntian Deng, Alexander M. Rush
The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
4 code implementations • NeurIPS 2020 • Victor Sanh, Thomas Wolf, Alexander M. Rush
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications.
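For concreteness, a minimal sketch of the magnitude-pruning baseline the abstract refers to, which keeps only the largest-magnitude weights; the paper's proposal instead scores weights by how they move during fine-tuning:

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-magnitude fraction of weights (the baseline criterion)."""
    k = int(sparsity * weight.numel())
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

w = torch.randn(256, 256)
mask = magnitude_mask(w, sparsity=0.9)
w_pruned = w * mask  # 90% of entries set to zero
```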
2 code implementations • ACL 2020 • Xiang Lisa Li, Alexander M. Rush
In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.
1 code implementation • ACL 2020 • Noriyuki Kojima, Hadar Averbuch-Elor, Alexander M. Rush, Yoav Artzi
Visual features are a promising signal for bootstrapping the learning of textual models.
2 code implementations • 13 Mar 2020 • Jiawei Zhou, Zhiying Xu, Alexander M. Rush, Minlan Yu
Botnets are now a major source for many network attacks, such as DDoS attacks and spam.
1 code implementation • ACL 2020 • Alexander M. Rush
The literature on structured prediction for NLP describes a rich collection of distributions and algorithms over sequences, segmentations, alignments, and trees; however, these algorithms are difficult to utilize in deep learning frameworks.
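The algorithms at issue are dynamic programs like the linear-chain forward recursion below. A library in this vein wraps such recursions as differentiable distribution objects; this sketch shows only the underlying computation, not the library's API:

```python
import torch

def linear_chain_logpartition(log_potentials: torch.Tensor) -> torch.Tensor:
    """Log-partition of a linear-chain CRF via the standard forward recursion.

    log_potentials: (T-1, K, K), score of moving from tag i to tag j at each step.
    """
    alpha = torch.logsumexp(log_potentials[0], dim=0)  # fold in the first transition
    for t in range(1, log_potentials.size(0)):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_potentials[t], dim=0)
    return torch.logsumexp(alpha, dim=0)

phi = torch.randn(5, 4, 4, requires_grad=True)
logZ = linear_chain_logpartition(phi)
logZ.backward()  # gradients of log Z with respect to phi are the edge marginals
```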
no code implementations • 13 Nov 2019 • Michael Lingzhi Li, Meng Dong, Jiawei Zhou, Alexander M. Rush
We derive theoretical results about the discriminative power and feature representation capabilities of each class.
9 code implementations • 9 Oct 2019 • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander M. Rush
Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks.
no code implementations • 8 Oct 2019 • Georgios A. Tritsaris, Yiqi Xie, Alexander M. Rush, Stephen Carr, Marios Mattheakis, Efthimios Kaxiras
Two-dimensional (2D) layered materials offer intriguing possibilities for novel physics and applications.
1 code implementation • IJCNLP 2019 • Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush
Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal.
1 code implementation • IJCNLP 2019 • Joshua Feldman, Joe Davison, Alexander M. Rush
Inferring commonsense knowledge is a key challenge in natural language processing, but due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data.
no code implementations • 23 Aug 2019 • Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, David Brooks
The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs.
1 code implementation • 19 Aug 2019 • Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush
Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks.
1 code implementation • ACL 2019 • Jiawei Zhou, Alexander M. Rush
We propose an unsupervised method for sentence summarization using only language modeling.
Ranked #39 on Text Summarization on GigaWord
1 code implementation • 24 Jul 2019 • Sebastian Gehrmann, Hendrik Strobelt, Robert Krüger, Hanspeter Pfister, Alexander M. Rush
Automation of tasks can have critical consequences when humans lose agency over decision processes.
1 code implementation • ACL 2019 • Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush
In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise.
1 code implementation • SEMEVAL 2019 • Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush
Popular Natural Language Inference (NLI) datasets have been shown to be tainted by hypothesis-only biases.
2 code implementations • ACL 2019 • Yoon Kim, Chris Dyer, Alexander M. Rush
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar.
6 code implementations • ACL 2019 • Sebastian Gehrmann, Hendrik Strobelt, Alexander M. Rush
The rapid improvement of language models has raised the specter of abuse of text generation systems.
1 code implementation • NAACL 2019 • Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis
On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.
Ranked #11 on Constituency Grammar Induction on Penn Treebank (Max F1 (WSJ) metric)
1 code implementation • 29 Jan 2019 • Zachary M. Ziegler, Alexander M. Rush
Normalizing flows are a powerful class of generative models for continuous random variables, showing both strong model flexibility and the potential for non-autoregressive generation.
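The density a flow assigns follows from the change-of-variables formula; in its standard form (not specific to this paper's setting), with bijection $f$ mapping data $x$ to base variable $z$:

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```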
no code implementations • 17 Dec 2018 • Yoon Kim, Sam Wiseman, Alexander M. Rush
There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.
1 code implementation • WS 2018 • Sebastian Gehrmann, Falcon Z. Dai, Henry Elder, Alexander M. Rush
Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG.
1 code implementation • EMNLP 2018 • Luong Hoang, Sam Wiseman, Alexander M. Rush
Reading comprehension tasks test the ability of models to process long-term context and remember salient information.
5 code implementations • EMNLP 2018 • Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush
We use this selector as a bottom-up attention step to constrain the model to likely phrases.
Ranked #4 on Multi-Document Summarization on Multi-News
2 code implementations • EMNLP 2018 • Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation.
no code implementations • 12 Jul 2018 • Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei
VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.
1 code implementation • NeurIPS 2018 • Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush
This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.
Ranked #29 on Machine Translation on IWSLT2014 German-English
9 code implementations • WS 2018 • Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush
OpenNMT is an open-source toolkit for neural machine translation (NMT).
1 code implementation • 25 Apr 2018 • Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush
In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process.
1 code implementation • ICML 2018 • Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network.
Ranked #2 on Text Generation on Yahoo Questions
2 code implementations • 13 Nov 2017 • Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks
This results in up to a 1.51x improvement over the state-of-the-art.
1 code implementation • 3 Oct 2017 • Ankit Gupta, Alexander M. Rush
We consider the task of detecting regulatory elements in the human genome directly from raw DNA.
no code implementations • 12 Sep 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush
We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.
1 code implementation • EMNLP 2017 • Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber
In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.
4 code implementations • EMNLP 2017 • Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records.
6 code implementations • 13 Jun 2017 • Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun
This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.
no code implementations • 3 Feb 2017 • Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.
4 code implementations • ACL 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. Rush
We describe an open-source toolkit for neural machine translation (NMT).
no code implementations • 9 Nov 2016 • Greg Yang, Alexander M. Rush
The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space.
14 code implementations • ICML 2017 • Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.
6 code implementations • EMNLP 2016 • Yoon Kim, Alexander M. Rush
We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).
Ranked #1 on Machine Translation on IWSLT2015 Thai-English
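Sequence-level knowledge distillation trains the student on the teacher's beam-search outputs in place of the gold targets. A minimal sketch of building the distilled corpus; `generate` stands in for any teacher beam-search wrapper and is not a real toolkit API:

```python
def distill_dataset(generate, sources, beam_size=5):
    """Pair each source with the teacher's beam output; the student is then
    trained on these pairs with ordinary cross-entropy."""
    return [(src, generate(src, beam_size)) for src in sources]

# Toy stand-in for a teacher's beam search, for illustration only:
pairs = distill_dataset(lambda src, k: src.upper(), ["guten tag", "danke"])
```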
1 code implementation • 23 Jun 2016 • Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush
In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics.
6 code implementations • EMNLP 2016 • Sam Wiseman, Alexander M. Rush
In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores.
Ranked #13 on Machine Translation on IWSLT2015 German-English
1 code implementation • EMNLP 2016 • Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber
Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence.
no code implementations • WS 2016 • Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber
We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.
1 code implementation • NAACL 2016 • Sam Wiseman, Alexander M. Rush, Stuart M. Shieber
There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters.
Ranked #26 on Coreference Resolution on OntoNotes
4 code implementations • EMNLP 2015 • Alexander M. Rush, Sumit Chopra, Jason Weston
Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build.
Ranked #1 on Extractive Text Summarization on DUC 2004 Task 1
14 code implementations • 26 Aug 2015 • Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush
We describe a simple neural language model that relies only on character-level inputs.
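A minimal sketch of the character-level input pipeline: a CNN over character embeddings with max-over-time pooling yields a word representation (sizes illustrative; the paper's highway layers and LSTM are elided):

```python
import torch
import torch.nn as nn

char_vocab, char_dim, num_filters, max_word_len = 60, 15, 100, 12

char_embed = nn.Embedding(char_vocab, char_dim)
conv = nn.Conv1d(char_dim, num_filters, kernel_size=3, padding=1)

chars = torch.randint(0, char_vocab, (1, max_word_len))  # one word as char ids
x = char_embed(chars).transpose(1, 2)                    # (batch, char_dim, word_len)
word_vec = torch.relu(conv(x)).max(dim=2).values         # max-over-time pooling -> (1, 100)
```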
20 code implementations • 19 Feb 2015 • Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov
One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.
no code implementations • 23 Jan 2014 • Alexander M. Rush, Michael Collins
Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP).