Search Results for author: Arian Hosseini

Found 11 papers, 6 with code

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

1 code implementation · 23 Oct 2024 · Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville

However, asynchronous training relies on an underexplored regime, online but off-policy RLHF: learning on samples from previous iterations of our model.

Tasks: Instruction Following · Language Modelling · +1 more
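
The excerpt above names the core idea: generation and learning are decoupled, so the learner consumes samples produced by an earlier snapshot of the policy. Below is a minimal toy sketch of that online-but-off-policy loop; ToyPolicy, reward_fn, and the refresh interval are illustrative stand-ins, not the paper's implementation.

```python
import copy
import random

class ToyPolicy:
    def __init__(self, weights=0.0):
        self.weights = weights

    def generate(self, prompt):
        # Stand-in for LLM sampling: tag the sample with the weights used.
        return (prompt, f"response(w={self.weights:.2f})")

def reward_fn(sample):
    return random.random()  # stand-in for a learned reward model

policy = ToyPolicy()
snapshot = copy.deepcopy(policy)          # generator runs on stale weights
for step in range(8):
    # Generation (would run concurrently on separate hardware in practice).
    batch = [snapshot.generate(f"prompt-{step}-{i}") for i in range(4)]
    # Off-policy update: the *current* policy learns from samples drawn
    # from an older one (the real setting corrects for this mismatch,
    # e.g. with PPO-style importance ratios).
    policy.weights += 0.1 * sum(reward_fn(s) for s in batch) / len(batch)
    if step % 4 == 3:                     # periodically refresh the generator
        snapshot = copy.deepcopy(policy)
```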

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

no code implementations · 29 Aug 2024 · Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi

Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs.

Tasks: Diversity · Knowledge Distillation · +1 more
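
The trade-off behind compute-matched sampling is simple arithmetic: at a fixed FLOPs budget, a weaker but cheaper model can produce proportionally more samples. A back-of-the-envelope sketch, using illustrative model sizes and the common ~2N-FLOPs-per-token approximation (not the paper's exact configuration):

```python
# Back-of-the-envelope compute matching. Assumes sampling cost scales
# roughly linearly with parameter count (~2N FLOPs per generated token);
# the model sizes are hypothetical, not the paper's exact setup.
STRONG_PARAMS = 27e9   # hypothetical strong model
WEAK_PARAMS = 9e9      # hypothetical weaker, cheaper model

def flops_per_token(n_params):
    return 2 * n_params

budget = 1_000 * flops_per_token(STRONG_PARAMS)    # budget: 1000 strong-model tokens
weak_tokens = budget / flops_per_token(WEAK_PARAMS)
print(weak_tokens)  # 3000.0 -> three weak-model samples per strong-model sample
```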

Generative Verifiers: Reward Modeling as Next-Token Prediction

no code implementations · 27 Aug 2024 · Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

We demonstrate that GenRM outperforms discriminative verifiers, DPO verifiers, and LLM-as-a-Judge, resulting in a 16-40% improvement in the number of problems solved with Best-of-N on algorithmic and math reasoning tasks.

Tasks: Math · Prediction · +1 more
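
Casting reward modeling as next-token prediction means the verifier's score is just the probability it assigns to a token like "Yes". A minimal sketch of that scoring step, using gpt2 and an ad-hoc prompt template as placeholders (the paper's actual models and templates differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # placeholder verifier LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

def genrm_score(question, solution):
    # Hypothetical prompt template; the score is the LM's next-token
    # probability of "Yes", normalized over the Yes/No pair.
    prompt = f"Q: {question}\nA: {solution}\nIs the answer correct? "
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]               # next-token distribution
    yes = tok(" Yes", add_special_tokens=False).input_ids[0]
    no = tok(" No", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits[[yes, no]], dim=-1)[0].item()

print(genrm_score("2+2?", "4"))
```

Because the score comes from ordinary decoding, the generative formulation also lets the verifier reason in text (chain-of-thought) before emitting its Yes/No answer.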

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation · 24 Mar 2024 · Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

Tasks: reinforcement-learning

V-STaR: Training Verifiers for Self-Taught Reasoners

no code implementations · 9 Feb 2024 · Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

Common self-improvement approaches for large language models (LLMs), such as STaR, iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability.

Tasks: Code Generation · Math
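
V-STaR's twist on the STaR loop is to keep the incorrect self-generated solutions too, as negatives for training a verifier (via DPO in the paper). A runnable toy sketch of one round; sample_solutions and is_correct are stand-ins for LLM sampling and answer checking:

```python
import random

def sample_solutions(model, problem, n):
    # Toy stand-in for LLM sampling: sometimes right, sometimes off by one.
    return [problem + random.choice([0, 1]) for _ in range(n)]

def is_correct(problem, solution):
    return solution == problem  # toy check against a gold answer

def star_round(model, problems):
    sft_data, pref_pairs = [], []
    for p in problems:
        sols = sample_solutions(model, p, n=4)
        good = [s for s in sols if is_correct(p, s)]
        bad = [s for s in sols if not is_correct(p, s)]
        sft_data += [(p, s) for s in good]                    # STaR: fine-tune the generator
        pref_pairs += [(p, g, b) for g in good for b in bad]  # V-STaR: verifier preference data
    return sft_data, pref_pairs

sft, prefs = star_round(model=None, problems=[1, 2, 3])
```

The verifier trained on those preference pairs is then used at inference to rank Best-of-N candidate solutions from the generator.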

On the Compositional Generalization Gap of In-Context Learning

no code implementations · 15 Nov 2022 · Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.

Tasks: In-Context Learning · Semantic Parsing

Understanding by Understanding Not: Modeling Negation in Language Models

1 code implementation · NAACL 2021 · Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus.

Tasks: Language Modeling · Language Modelling · +1 more
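
The proposed objective combines standard likelihood training with an unlikelihood term on negated generic sentences, i.e. minimizing log(1 - p) for their continuations. A minimal sketch of the per-token losses, with toy shapes and data (not the paper's training code):

```python
import torch
import torch.nn.functional as F

def lm_loss(logits, targets, negated):
    # logits: (T, V); targets: (T,); negated: whether the sequence is a
    # negated generic sentence drawn from the corpus.
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(1, targets.unsqueeze(1)).squeeze(1)
    if negated:
        # Unlikelihood term: push the probability of these tokens down.
        p = token_logp.exp().clamp(max=1 - 1e-6)
        return -torch.log1p(-p).mean()      # -mean(log(1 - p))
    return -token_logp.mean()               # standard NLL

logits = torch.randn(5, 100)
targets = torch.randint(0, 100, (5,))
print(lm_loss(logits, targets, negated=False), lm_loss(logits, targets, negated=True))
```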

Ordered Memory

1 code implementation · NeurIPS 2019 · Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory.

Tasks: ListOps
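
The excerpt sketches the key trick: turn an attention distribution over memory slots into a monotone gate via a cumulative sum, so slots above the attended position are erased and overwritten together. A rough sketch under that reading (the real model's gating and cell update are more elaborate):

```python
import torch

# Toy dimensions; the actual slot count and width are model choices.
slots, dim = 6, 8
memory = torch.randn(slots, dim)
new_value = torch.randn(dim)

attn = torch.softmax(torch.randn(slots), dim=-1)  # where to write
erase_gate = torch.cumsum(attn, dim=-1)           # ~0 below the slot, ~1 above it
memory = (1 - erase_gate).unsqueeze(1) * memory \
       + erase_gate.unsqueeze(1) * new_value      # keep lower slots, rewrite upper ones
```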

Learning to Understand Goal Specifications by Modelling Reward

1 code implementation · ICLR 2019 · Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.

Tasks: Deep Reinforcement Learning
