1 code implementation • 29 Nov 2024 • Bowen Peng, Jeffrey Quesnelle, Diederik P. Kingma
Training large neural networks typically requires sharing gradients between accelerators through specialized high-speed interconnects.
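For context on the gradient-sharing bottleneck this abstract opens with, here is a minimal sketch of conventional data-parallel gradient synchronization, assuming PyTorch's `torch.distributed`; the helper name `sync_gradients` is illustrative and not from the paper.

```python
import torch
import torch.distributed as dist

def sync_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all data-parallel workers.

    This is the conventional synchronization step that relies on a
    high-bandwidth interconnect: the full gradient of every parameter
    is all-reduced between accelerators on each optimizer step.
    """
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```

In a typical loop this would run between `loss.backward()` and `optimizer.step()` on every rank, which is why the volume and frequency of gradient exchange dominate interconnect requirements.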
no code implementations • 15 Aug 2024 • Ryan Teknium, Jeffrey Quesnelle, Chen Guang
Instruct (or "chat") tuned models have become the primary way in which most people interact with large language models.
7 code implementations • 31 Aug 2023 • Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models.
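As a reference point for the mechanism the abstract names, below is a minimal sketch of the standard rotary position embedding computation in the common "rotate-half" convention; it assumes PyTorch, and the function name `rotary_embed` is illustrative rather than taken from the paper's implementation.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pair i is rotated by the angle m * theta_i, where m is the
    token position and theta_i = base ** (-2i / dim).  Because rotations
    compose, the dot product of two rotated query/key vectors depends
    only on the relative offset between their positions.
    """
    seq_len, dim = x.shape
    half = dim // 2
    pos = torch.arange(seq_len, dtype=x.dtype).unsqueeze(1)           # (seq_len, 1)
    theta = base ** (-2.0 * torch.arange(half, dtype=x.dtype) / dim)  # (half,)
    angles = pos * theta                                              # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Applying this rotation to queries and keys before attention is what encodes position; context-extension methods such as the one described here work by rescaling the positions or frequencies fed into this computation.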