no code implementations • 6 Oct 2023 • Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontañón, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.
no code implementations • 28 Aug 2023 • Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie
Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world.
no code implementations • 17 Jun 2023 • Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie
Memory-augmentation is a powerful approach for efficiently incorporating external information into language models, but leads to reduced performance relative to retrieving text.
3 code implementations • 22 May 2023 • Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference.
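To make the entry above concrete, here is a minimal PyTorch sketch of multi-query attention; all shapes and weight names are illustrative, not taken from the paper's code. Every query head computes its own queries, but all heads share a single key and value projection, which is what shrinks the key-value cache during decoding.

```python
import torch

def multi_query_attention(x, w_q, w_k, w_v):
    """Multi-query attention: many query heads, one shared key/value head.

    x:   (batch, seq, d_model)
    w_q: (d_model, n_heads, d_head)  -- one projection per query head
    w_k: (d_model, d_head)           -- single shared key projection
    w_v: (d_model, d_head)           -- single shared value projection
    """
    q = torch.einsum("bsd,dhk->bhsk", x, w_q)          # (b, heads, seq, d_head)
    k = torch.einsum("bsd,dk->bsk", x, w_k)            # (b, seq, d_head): one head
    v = torch.einsum("bsd,dk->bsk", x, w_v)            # only this k/v needs caching
    scores = torch.einsum("bhsk,btk->bhst", q, k) / q.shape[-1] ** 0.5
    return torch.einsum("bhst,btk->bhsk", scores.softmax(dim=-1), v)

x = torch.randn(2, 5, 64)                              # toy batch
out = multi_query_attention(
    x, torch.randn(64, 8, 16), torch.randn(64, 16), torch.randn(64, 16))
print(out.shape)                                       # torch.Size([2, 8, 5, 16])
```

The grouped-query attention this paper studies sits between multi-head and multi-query attention: each key-value head is shared by a group of query heads rather than by all of them.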
no code implementations • 17 Mar 2023 • Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai
Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token.
Ranked #1 on Long-range modeling on SCROLLS
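A back-of-the-envelope sketch of the cost claim in the entry above, with illustrative hyperparameters (not the paper's): the quadratic attention-score term dominates at long lengths, but the per-token projection and feedforward terms remain large, which is the second expense the abstract points to.

```python
# Illustrative hyperparameters, not the paper's; counts are rough matmul FLOPs.
d_model, d_ff, n = 1024, 4096, 16384      # hidden size, feedforward size, tokens

attn_proj = 4 * n * d_model * d_model     # Q, K, V, output projections: linear in n
attn_scores = 2 * n * n * d_model         # QK^T + weighted sum: quadratic in n
ffn = 2 * n * d_model * d_ff              # two feedforward matmuls: linear in n

for name, flops in [("attention projections", attn_proj),
                    ("attention scores", attn_scores),
                    ("feedforward", ffn)]:
    print(f"{name:>22}: {flops:.2e}")
# Even at 16K tokens, the linear feedforward term is within ~4x of the
# quadratic term, so shrinking attention alone cannot make long inputs cheap.
```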
no code implementations • 25 Jan 2023 • Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William Cohen
Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks.
no code implementations • 21 Dec 2022 • Luke Vilnis, Zach Fisher, Bhargav Kanagal, Patrick Murray, Sumit Sanghai
Large language models have ushered in a golden age of semantic parsing.
no code implementations • 15 Dec 2022 • Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen
Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state of the art on many knowledge-intensive NLP tasks.
Ranked #3 on Question Answering on WebQuestions
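Since the two entries above both build on Fusion-in-Decoder, a minimal sketch of its structure may help; module sizes and tensor shapes here are toy values, and the real model is T5-based rather than built from raw nn.Transformer layers. Each retrieved passage is encoded independently alongside the question, and the decoder "fuses" them by cross-attending over the concatenated encoder outputs.

```python
import torch
import torch.nn as nn

d = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)

question = torch.randn(1, 16, d)                       # already-embedded question
passages = [torch.randn(1, 100, d) for _ in range(4)]  # four retrieved passages

# Encode each (question + passage) pair independently...
encoded = [encoder(torch.cat([question, p], dim=1)) for p in passages]
# ...then fuse in the decoder: one cross-attention over all passages at once.
memory = torch.cat(encoded, dim=1)
target = torch.randn(1, 8, d)                          # embedded decoder input
print(decoder(target, memory).shape)                   # torch.Size([1, 8, 64])
```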
1 code implementation • 18 Oct 2022 • Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Passos, Sumit Sanghai
Decoding methods for large language models often trade off between diversity of outputs and parallelism of computation.
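A toy illustration of the trade-off named above, not the paper's method: drawing N continuations independently is embarrassingly parallel, but a peaked next-token distribution makes many of those draws collide, while beam-style decoding avoids duplicates at the cost of coupled, sequential computation.

```python
import torch

torch.manual_seed(0)
logits = torch.tensor([4.0, 1.0, 0.5, 0.1])   # a peaked next-token distribution
samples = torch.multinomial(logits.softmax(-1), 16, replacement=True)
print(samples.tolist())                        # mostly token 0: parallel but redundant
print(f"unique: {samples.unique().numel()} / 16")
```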
no code implementations • COLING 2022 • Yury Zemlyanskiy, Michiel de Jong, Joshua Ainslie, Panupong Pasupat, Peter Shaw, Linlu Qiu, Sumit Sanghai, Fei Sha
The approach first generates a preliminary prediction, then retrieves exemplars whose outputs are similar to it; these retrieved exemplars are used to generate a final prediction.
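Schematically, the loop looks like the sketch below, where `model` and `exemplar_index` are hypothetical stand-ins for a seq2seq parser and a training-set index searchable by output similarity.

```python
# Hypothetical interfaces for illustration only; not the paper's code.
def generate_and_retrieve(model, exemplar_index, x, k=4):
    draft = model.generate(x)                      # preliminary prediction
    exemplars = exemplar_index.nearest(draft, k)   # exemplars with similar outputs
    context = "\n".join(f"{ex.input} -> {ex.output}" for ex in exemplars)
    return model.generate(f"{context}\n{x}")       # final, exemplar-conditioned pass
```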
1 code implementation • 16 Dec 2021 • Li Yang, Qifan Wang, Zac Yu, Anand Kulkarni, Sumit Sanghai, Bin Shu, Jon Elsas, Bhargav Kanagal
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information.
Ranked #1 on Attribute Value Extraction on MAVE
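An invented example of the task as defined above: given free-form product text and an attribute of interest, the system must surface that attribute's value.

```python
# Made-up product record illustrating the task: the raw text plus the
# attribute-value pairs an extractor should recover from it.
product_text = "ACME Trail Runner Men's Running Shoe, Lightweight Mesh, Blue, Size 10"
target_attributes = ["Color", "Material", "Size"]
expected_values = {"Color": "Blue", "Material": "Mesh", "Size": "10"}
```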
no code implementations • 2 Sep 2021 • Gurmeet Manku, James Lee-Thorp, Bhargav Kanagal, Joshua Ainslie, Jingchen Feng, Zach Pearson, Ebenezer Anjorin, Sudeep Gandhe, Ilya Eckstein, Jim Rosswog, Sumit Sanghai, Michael Pohl, Larry Adams, D. Sivakumar
The dialog understanding system consists of a deep-learned Contextual Language Understanding module, which interprets user utterances, and a primarily rules-based Dialog-State Tracker (DST), which updates the dialog state and formulates search requests intended for the fulfillment engine.
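A toy flavor of the rules-based state update described above; the slot names and the interpretation format are invented for illustration, and the real tracker is far richer.

```python
def update_dialog_state(state: dict, interpretation: dict) -> dict:
    """Merge the understanding module's slot-value output into the dialog state."""
    new_state = dict(state)
    for slot, value in interpretation.get("slots", {}).items():
        if value == "RESET":
            new_state.pop(slot, None)   # user cleared a constraint
        else:
            new_state[slot] = value     # later turns override earlier ones
    return new_state

state = update_dialog_state({}, {"slots": {"category": "sofas", "color": "gray"}})
state = update_dialog_state(state, {"slots": {"color": "blue"}})
print(state)  # {'category': 'sofas', 'color': 'blue'} -> drives the search request
```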
no code implementations • ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 • Qifan Wang, Li Yang, Bhargav Kanagal, Sumit Sanghai, D. Sivakumar, Bin Shu, Zac Yu, Jon Elsas
In particular, we build a question answering model which treats each attribute as a question and identifies the answer span corresponding to the attribute value in the product context.
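The framing can be tried with any off-the-shelf extractive QA model as a stand-in (this is not the paper's model; the checkpoint below is a generic SQuAD-tuned reader): pose the attribute as the question and the product text as the context.

```python
# Hypothetical stand-in reader: any extractive QA checkpoint works for the demo.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")
context = "ACME Trail Runner Men's Running Shoe, Lightweight Mesh, Blue, Size 10"
answer = qa(question="What is the color?", context=context)
print(answer["answer"])   # the predicted value span, e.g. "Blue"
```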
2 code implementations • EMNLP 2020 • Joshua Ainslie, Santiago Ontañón, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks.
Ranked #3 on Question Answering on ConditionalQA