Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire document corpus within a single Transformer.
In this work, we introduce DSI++, a continual learning challenge for DSI in which the model must incrementally index new documents while still answering queries about both previously and newly indexed documents.
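A minimal sketch of this text-to-docid framing, assuming string document identifiers, a T5-style seq2seq model from Hugging Face, and illustrative "index:"/"retrieve:" prefixes and example data that are not taken from the paper. Indexing and retrieval become the same kind of training example, and continual indexing (DSI++) amounts to continuing the loop on newly arriving (document, docid) pairs.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Indexing examples map document text to its identifier; retrieval examples
# map a query to the same identifier (examples below are illustrative).
examples = [
    ("index: transformers use self-attention over tokens", "doc_017"),
    ("retrieve: what mechanism do transformers use?", "doc_017"),
]

for source, docid in examples:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(docid, return_tensors="pt").input_ids
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference time, the model generates a docid directly from the query;
# no external index is consulted.
query = tokenizer("retrieve: what mechanism do transformers use?", return_tensors="pt")
predicted = model.generate(query.input_ids, max_length=8)
print(tokenizer.decode(predicted[0], skip_special_tokens=True))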
In the face of the devastating COVID-19 pandemic, vaccines are one of the crucial lines of defense against mass infection.
Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.
Prompt-Tuning is a new paradigm for fine-tuning pre-trained language models in a parameter-efficient way.
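A minimal sketch of the idea, assuming a generic pre-trained model whose input embeddings can be accessed; the names `pretrained_lm` and `embed`, and the prompt length and dimensions, are illustrative placeholders rather than the paper's setup. Only the soft prompt parameters are trained; the language model stays frozen.

import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, d_model: int):
        super().__init__()
        # Only these prompt embeddings receive gradients.
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        batch = token_embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the learned prompt to the regular token embeddings.
        return torch.cat([prompt, token_embeddings], dim=1)

# Usage sketch: freeze the pre-trained model, optimize only the prompt.
# `pretrained_lm` and `embed` are hypothetical stand-ins for a real model
# and its input embedding layer.
# for p in pretrained_lm.parameters():
#     p.requires_grad_(False)
# soft_prompt = SoftPrompt(prompt_len=20, d_model=768)
# optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=3e-3)
# inputs_embeds = soft_prompt(embed(input_ids))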
As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles.
1 code implementation • 14 Feb 2022 • Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.
3 code implementations • Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.
In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
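A rough sketch of learning the tokenization with gradients, in the spirit of Charformer's gradient-based subword tokenization; the block sizes, mean pooling, and downsampling rate here are simplified assumptions, not the paper's exact formulation. Candidate blocks of several sizes are scored per byte position, softly mixed, and the result is downsampled into a shorter "subword" sequence for the Transformer stack.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedSubwordPooling(nn.Module):
    def __init__(self, d_model: int, block_sizes=(1, 2, 4), downsample: int = 2):
        super().__init__()
        self.block_sizes = block_sizes
        self.downsample = downsample
        self.scorer = nn.Linear(d_model, 1)  # scores each candidate block

    def forward(self, byte_embeddings: torch.Tensor) -> torch.Tensor:
        # byte_embeddings: (batch, length, d_model); length assumed divisible
        # by every block size for simplicity.
        candidates = []
        for size in self.block_sizes:
            # Mean-pool bytes into blocks of this size, then broadcast the
            # block representation back to every position it covers.
            pooled = F.avg_pool1d(byte_embeddings.transpose(1, 2),
                                  kernel_size=size, stride=size)
            upsampled = pooled.repeat_interleave(size, dim=2).transpose(1, 2)
            candidates.append(upsampled)
        stacked = torch.stack(candidates, dim=2)           # (B, L, num_sizes, D)
        weights = F.softmax(self.scorer(stacked), dim=2)   # soft choice of block size
        mixed = (weights * stacked).sum(dim=2)             # (B, L, D)
        # Downsample to a shorter sequence before the Transformer layers.
        return F.avg_pool1d(mixed.transpose(1, 2),
                            kernel_size=self.downsample,
                            stride=self.downsample).transpose(1, 2)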
Ranked #3 on Paraphrase Identification on Quora Question Pairs
In the context of language models, are convolutional models competitive with Transformers when pre-trained?
In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.
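An illustrative sketch of attending over the whole network rather than a single layer, assuming hidden states from every layer are available; the layer handling and where the output feeds back in are simplified and not OmniNet's exact architecture.

import torch
import torch.nn as nn

class OmnidirectionalAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query_states, all_layer_states):
        # all_layer_states: list of (batch, length, d_model), one per layer.
        # Concatenate every layer's tokens into one key/value sequence so each
        # query token can attend to representations from the entire network.
        memory = torch.cat(all_layer_states, dim=1)
        out, _ = self.attn(query_states, memory, memory)
        return out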
Ranked #1 on Machine Translation on WMT2017 Russian-English
We propose novel modifications to the algorithms and new imaging architectures that significantly reduce the computation time.