Search Results for author: Nathan Sarrazin

Found 1 papers, 1 papers with code

Zephyr: Direct Distillation of LM Alignment

1 code implementation • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
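The dDPO step described above optimizes the standard DPO objective on teacher-ranked preference pairs. As a minimal sketch (not the authors' implementation), the per-pair loss can be written as the negative log-sigmoid of a scaled log-probability margin between the policy and a frozen reference model; the function name and example log-probabilities below are illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is a summed log-probability of a full response under the
    policy or the frozen reference model; beta scales the implicit KL
    penalty toward the reference.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: small when the policy prefers
    # the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probs where the policy already favors the chosen response:
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

With a positive margin the loss falls below log 2 (the value at zero margin), so gradient descent pushes the policy to widen its preference for the chosen response relative to the reference.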

Tasks: 2D Cyclist Detection • Language Modelling
