no code implementations • 10 Aug 2022 • Brian Lester, Joshua Yurtsever, Siamak Shakeri, Noah Constant
Parameter-efficient methods are able to use a single frozen pre-trained large language model (LLM) to perform many tasks by learning task-specific soft prompts that modulate model behavior when concatenated to the input text.
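A minimal sketch of the soft-prompt idea described above: a small matrix of learned embeddings is prepended to the embedded input of an otherwise frozen model, and only that matrix is trained. The sizes and names below are illustrative, not taken from the paper's code.

```python
# Sketch of prompt tuning: the soft prompt is the only trainable parameter;
# the embedding table (and the rest of the model) stays frozen.
import torch

vocab_size, d_model, prompt_len = 32000, 512, 20

embed = torch.nn.Embedding(vocab_size, d_model)
embed.weight.requires_grad_(False)                        # frozen input embeddings
soft_prompt = torch.nn.Parameter(torch.randn(prompt_len, d_model) * 0.5)  # trainable

def prepend_prompt(input_ids: torch.Tensor) -> torch.Tensor:
    """Concatenate the soft prompt to the embedded input, per example."""
    x = embed(input_ids)                                  # [batch, seq, d_model]
    prompt = soft_prompt.unsqueeze(0).expand(x.size(0), -1, -1)
    return torch.cat([prompt, x], dim=1)                  # [batch, prompt_len + seq, d_model]

batch = torch.randint(0, vocab_size, (2, 8))
print(prepend_prompt(batch).shape)                        # torch.Size([2, 28, 512])
```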
1 code implementation • 25 May 2022 • Tu Vu, Aditya Barua, Brian Lester, Daniel Cer, Mohit Iyyer, Noah Constant
In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study.
3 code implementations • 31 Mar 2022 • Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, Kathleen Kenealy, Jonathan H. Clark, Stephan Lee, Dan Garrette, James Lee-Thorp, Colin Raffel, Noam Shazeer, Marvin Ritter, Maarten Bosma, Alexandre Passos, Jeremy Maitin-Shepard, Noah Fiedel, Mark Omernick, Brennan Saeta, Ryan Sepassi, Alexander Spiridonov, Joshua Newlan, Andrea Gesmundo
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves.
no code implementations • ACL 2022 • Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer
Finally, we propose an efficient retrieval approach that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
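A hedged sketch of that retrieval idea, assuming each task's learned prompt is pooled into a single task embedding (here by averaging its prompt-token vectors) and source tasks are ranked by cosine similarity to the target task; the task names and shapes are made up for illustration.

```python
# Treat each learned prompt as a task embedding and rank source tasks by
# cosine similarity to the target task's embedding.
import numpy as np

def task_embedding(prompt: np.ndarray) -> np.ndarray:
    """prompt: [prompt_len, d_model] -> a single task vector."""
    return prompt.mean(axis=0)

def rank_source_tasks(target_prompt, source_prompts):
    t = task_embedding(target_prompt)
    scores = {}
    for name, p in source_prompts.items():
        s = task_embedding(p)
        scores[name] = float(np.dot(t, s) / (np.linalg.norm(t) * np.linalg.norm(s)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rng = np.random.default_rng(0)
sources = {"mnli": rng.normal(size=(100, 512)), "squad": rng.normal(size=(100, 512))}
print(rank_source_tasks(rng.normal(size=(100, 512)), sources))
```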
3 code implementations • ICLR 2022 • Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.
Ranked #1 on Question Answering on StoryCloze
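An illustrative sketch of the instruction-tuning setup: each task is described with a natural-language instruction, and examples are rendered into (instruction, target) text pairs for finetuning. The template wording below is invented for illustration, not taken from the paper.

```python
# Render a natural language inference example into an instruction-style
# (input, target) pair for finetuning.
def render_nli_example(premise, hypothesis, label):
    instruction = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes, no, or maybe."
    )
    return instruction, label

prompt, target = render_nli_example(
    "A dog is running in the park.", "An animal is outside.", "yes"
)
print(prompt)
print("->", target)
```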
1 code implementation • NAACL 2021 • Brian Lester, Sagnik Ray Choudhury, Rashmi Prasad, Srinivas Bangalore
Complex natural language understanding modules in dialog systems have a richer understanding of user utterances, and thus are critical in providing a better user experience.
7 code implementations • EMNLP 2021 • Brian Lester, Rami Al-Rfou, Noah Constant
More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Brian Lester, Daniel Pressel, Amy Hemmeter, Sagnik Ray Choudhury, Srinivas Bangalore
Current state-of-the-art models for named entity recognition (NER) are neural models with a conditional random field (CRF) as the final layer.
1 code implementation • 9 Oct 2020 • Brian Lester
After a model assigns labels to each token, these prefixes are used to group the tokens into spans.
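A minimal sketch of that span-grouping step, assuming BIO-style prefixes; it illustrates the general technique rather than the library's own API.

```python
# Group token-level tags with B-/I- prefixes into (type, start, end) spans,
# where end is exclusive. Tokens tagged "O" or with an inconsistent I- prefix
# close the current span.
def bio_to_spans(tags):
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, start, i))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue
        else:
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))
# [('PER', 0, 2), ('LOC', 3, 4)]
```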
1 code implementation • 30 Sep 2020 • Brian Lester, Daniel Pressel, Amy Hemmeter, Sagnik Ray Choudhury, Srinivas Bangalore
Most state-of-the-art models in natural language processing (NLP) are neural models built on top of large, pre-trained, contextual language models that generate representations of words in context and are fine-tuned for the task at hand.
1 code implementation • 29 Sep 2020 • Brian Lester
Two competing file formats have become the de facto standards for distributing pre-trained word embeddings.
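For context, the two de facto standards are typically a plain-text format (one word per line followed by its vector components, as popularized by GloVe) and the word2vec binary format. Below is a sketch of a reader for the text format; the file path in the usage comment is an assumption.

```python
# Read a plain-text embedding file: each line is a word followed by its
# space-separated vector components.
import numpy as np

def load_text_embeddings(path):
    vocab, vectors = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vocab.append(parts[0])
            vectors.append([float(x) for x in parts[1:]])
    return vocab, np.array(vectors, dtype=np.float32)

# Usage (assuming a GloVe-style file exists at this path):
# words, vecs = load_text_embeddings("glove.6B.50d.txt")
# print(len(words), vecs.shape)
```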
1 code implementation • 5 Jan 2020 • Brian Lester, Daniel Pressel, Amy Hemmeter, Sagnik Ray Choudhury
The CRF layer is used to facilitate global coherence between labels, and the contextual embeddings provide a better representation of words in context.
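A minimal numpy sketch of how a CRF layer provides that global coherence at decode time: Viterbi search combines per-token emission scores with learned label-to-label transition scores, so implausible tag sequences are penalized. The scores below are random placeholders, not a trained model.

```python
# Viterbi decoding over emission and transition scores.
import numpy as np

def viterbi(emissions, transitions):
    """emissions: [seq_len, n_labels]; transitions: [n_labels, n_labels] (from, to)."""
    seq_len, n_labels = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_labels), dtype=int)
    for t in range(1, seq_len):
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(5, 4)), rng.normal(size=(4, 4))))
```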
no code implementations • NAACL 2019 • Ishan Jindal, Daniel Pressel, Brian Lester, Matthew Nokleby
In this paper, we propose an approach to training deep networks that is robust to label noise.
1 code implementation • WS 2018 • Daniel Pressel, Sagnik Ray Choudhury, Brian Lester, Yanjie Zhao, Matt Barta
We introduce Baseline: a library for reproducible deep learning research and fast model development for NLP.