1 code implementation • 13 Mar 2024 • Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
We fit scaling laws that extrapolate in both the number of model parameters and the ratio of training tokens to parameters.
1 code implementation • 19 Feb 2024 • Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model.
no code implementations • 14 Nov 2023 • Sedrick Keh, Justin T. Chiu, Daniel Fried
When a model is trying to gather information in an interactive setting, it benefits from asking informative questions.