Search Results for author: Jeffrey Quesnelle

Found 3 papers, 2 papers with code

DeMo: Decoupled Momentum Optimization

1 code implementation • 29 Nov 2024 • Bowen Peng, Jeffrey Quesnelle, Diederik P. Kingma

Training large neural networks typically requires sharing gradients between accelerators through specialized high-speed interconnects.


Hermes 3 Technical Report

no code implementations • 15 Aug 2024 • Ryan Teknium, Jeffrey Quesnelle, Chen Guang

Instruct (or "chat") tuned models have become the primary way in which most people interact with large language models.

YaRN: Efficient Context Window Extension of Large Language Models

7 code implementations • 31 Aug 2023 • Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models.
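For context on what RoPE does (this illustrates standard rotary embeddings, not the YaRN extension itself): each position rotates pairs of embedding dimensions by a position-dependent angle, so attention scores depend on relative offsets. A minimal NumPy sketch, assuming the common "rotate-half" convention; the function name and base value are illustrative, not from the paper.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim), dim even.

    Pairs dimension i with dimension i + dim/2 and rotates each pair by
    an angle positions[t] * base**(-i / (dim/2)).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies, decreasing geometrically with dimension.
    freqs = base ** (-np.arange(half) / half)          # (half,)
    angles = positions[:, None] * freqs[None, :]       # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied independently to each (x1_i, x2_i) pair.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

At position 0 all angles are zero, so the embedding passes through unchanged; at other positions the rotation encodes where the token sits in the sequence.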

