Search Results for author: Maksym Zhuravinskyi

Found 4 papers, 0 papers with code

Stable Code Technical Report

no code implementations • 1 Apr 2024 • Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

Stable Code Instruct also exhibits state-of-the-art performance on the MT-Bench coding tasks and on Multi-PL completion compared to other instruction tuned models.

Code Completion Language Modelling +2

Teaching Large Language Models to Reason with Reinforcement Learning

no code implementations • 7 Mar 2024 • Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Surprisingly, we find the sample complexity of Expert Iteration is similar to that of PPO, requiring at most on the order of $10^6$ samples to converge from a pretrained checkpoint.

reinforcement-learning

Stable LM 2 1.6B Technical Report

no code implementations • 27 Feb 2024 • Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparaju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme

We introduce StableLM 2 1.6B, the first in a new generation of our language model series.

Language Modelling

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

no code implementations • 13 Feb 2024 • Alex Havrilla, Sharath Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu

Outcome-based Reward Models (ORMs), trained to predict correctness of the final answer indicating when to refine, offer one convenient solution for deciding when to refine.

GSM8K Math
