After instruction tuning on Retro, InstructRetro demonstrates significant improvement over its instruction-tuned GPT counterpart across a wide range of zero-shot tasks.
Perhaps surprisingly, we find that an LLM with a 4K context window using simple retrieval augmentation at generation can achieve performance comparable to a fine-tuned LLM with a 16K context window extended via positional interpolation on long-context tasks, while requiring much less computation.
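To make the retrieval-augmentation-at-generation setup concrete, the following is a minimal sketch, not the paper's actual pipeline: the toy hashing embedder, the chunk store, and the top-k value are all illustrative assumptions standing in for a real dense retriever and a 4K-context LLM. The key point it shows is that retrieved passages are simply prepended to the prompt of a fixed-context model.

```python
# Minimal sketch of retrieval augmentation at generation time.
# Assumptions (not from the paper): a toy hashing embedder stands in for a
# real dense retriever; the resulting prompt would be fed to any 4K-context LLM.
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedder (placeholder for a dense retriever)."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks by cosine similarity to the query."""
    q = embed(query)
    scores = [float(q @ embed(c)) for c in chunks]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

def augmented_prompt(query: str, chunks: list[str]) -> str:
    """Prepend retrieved evidence so a short-context LLM can use long documents."""
    context = "\n\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Usage: long documents are chunked offline; only the top-k chunks enter the prompt.
chunks = [
    "Positional interpolation rescales position indices to extend context.",
    "Retrieval augmentation prepends relevant chunks to the prompt.",
    "Unrelated filler text about something else entirely.",
]
print(augmented_prompt("How does retrieval augmentation work?", chunks))
```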
Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval?
In this paper, we show how to significantly accelerate the training of large transformer models by reducing activation recomputation.
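As a rough illustration of the underlying idea, here is a minimal PyTorch sketch of selective activation recomputation: only the memory-dominant attention activations are recomputed in the backward pass, while the MLP keeps its stored activations. This is an assumed simplification for exposition, not the paper's Megatron-LM implementation, which applies recomputation at a finer granularity within the attention block.

```python
# Sketch of selective activation recomputation (hedged illustration):
# checkpoint only the attention sub-block, so its activations are freed after
# the forward pass and recomputed during backward; the MLP pays no recompute cost.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class SelectiveRecomputeBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def _attn(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention activations are recomputed in backward (checkpointed).
        x = x + checkpoint(self._attn, self.ln1(x), use_reentrant=False)
        # MLP activations are stored as usual (no recomputation).
        x = x + self.mlp(self.ln2(x))
        return x

block = SelectiveRecomputeBlock()
x = torch.randn(2, 128, 512, requires_grad=True)
block(x).sum().backward()  # attention re-runs here; the MLP does not
```

The design trade-off is that recomputation exchanges memory for FLOPs, so applying it only where activations are large and cheap to recompute recovers most of the memory savings at a small fraction of the recompute cost.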
In this paper, we present SONNC, a compiler for neural networks that uses static analysis to generate optimized parallel code.