Search Results for author: Arian Hosseini

Found 7 papers, 5 papers with code

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning
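
PPO-based RLHF pipelines of this kind shape the reward by combining the reward-model score with a per-token KL penalty that keeps the policy close to the frozen SFT reference model. A minimal PyTorch sketch of that shaping (tensor names and the kl_coef value are illustrative, not taken from the paper's code):

    import torch

    def shaped_rewards(rm_score, logprobs, ref_logprobs, kl_coef=0.05):
        """Combine the reward-model score with a per-token KL penalty
        against the frozen reference (SFT) policy, as in PPO-based RLHF.

        rm_score:     scalar reward for each full response, shape (batch,)
        logprobs:     per-token log-probs under the current policy, (batch, T)
        ref_logprobs: per-token log-probs under the reference policy, (batch, T)
        """
        kl = logprobs - ref_logprobs      # per-token KL estimate
        rewards = -kl_coef * kl           # penalize drift from the reference
        rewards[:, -1] += rm_score        # RM score lands on the final token
        return rewards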

V-STaR: Training Verifiers for Self-Taught Reasoners

no code implementations • 9 Feb 2024 • Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability.

Code Generation • Math
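
The loop these self-improvement approaches share is easy to state: sample candidate solutions, keep those that check out, fine-tune on the survivors. A sketch under assumed helpers (generate, is_correct, and finetune are hypothetical placeholders supplied by the caller, not the paper's API):

    def star_iteration(model, problems, generate, is_correct, finetune, k=8):
        """One STaR-style self-improvement round: sample k solutions per
        problem, keep the correct ones, fine-tune on them. V-STaR
        additionally trains a verifier on both correct and incorrect
        samples, which is not shown here."""
        train_set = []
        for problem in problems:
            for solution in generate(model, problem, num_samples=k):
                if is_correct(problem, solution):  # e.g. final-answer check or unit tests
                    train_set.append((problem, solution))
        return finetune(model, train_set)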

On the Compositional Generalization Gap of In-Context Learning

no code implementations • 15 Nov 2022 • Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.

In-Context Learning • Semantic Parsing
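
The gap in question is simply the difference between in-distribution and out-of-distribution scores on the same task; a trivial sketch (illustrative, not the paper's evaluation code):

    def generalization_gap(model, id_set, ood_set, metric):
        """Difference between in-distribution and out-of-distribution
        performance; `metric` might be exact-match accuracy for a
        semantic parsing task."""
        return metric(model, id_set) - metric(model, ood_set)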

Understanding by Understanding Not: Modeling Negation in Language Models

1 code implementation • NAACL 2021 • Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus.

Language Modelling • Negation
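
An unlikelihood term of this flavor pushes probability mass away from tokens that the negated sentence rules out, i.e. a -log(1 - p) penalty. A minimal PyTorch sketch (tensor shapes and names are assumptions, not the paper's exact formulation):

    import torch
    import torch.nn.functional as F

    def unlikelihood_loss(logits, negative_targets, eps=1e-8):
        """Penalize probability assigned to continuations that a negated
        generic sentence rules out: loss = -log(1 - p(token)).

        logits:           (batch, T, vocab) language-model outputs
        negative_targets: (batch, T) token ids that should become unlikely
        """
        probs = F.softmax(logits, dim=-1)
        p_neg = probs.gather(-1, negative_targets.unsqueeze(-1)).squeeze(-1)
        return -torch.log(1.0 - p_neg + eps).mean()

In training, a term like this would be added with some weight to the usual likelihood loss on ordinary text.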

Ordered Memory

1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.

ListOps
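
The gating idea can be sketched as follows: take a softmax attention distribution over memory slots, and use its running cumulative sum as a soft mask that erases and rewrites every slot from the attended position upward, stack-fashion. A loose PyTorch sketch of that gate (not the paper's full memory cell):

    import torch

    def write_erase(memory, attn, new_value):
        """memory:    (batch, num_slots, d) memory slots
        attn:      (batch, num_slots) softmax attention over slots
        new_value: (batch, d) content to write

        The cumulative probability rises monotonically across slots,
        giving a soft "this slot and everything above it" mask that
        controls erasing and writing."""
        gate = torch.cumsum(attn, dim=-1).unsqueeze(-1)  # (batch, num_slots, 1)
        return (1.0 - gate) * memory + gate * new_value.unsqueeze(1)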

Learning to Understand Goal Specifications by Modelling Reward

1 code implementation • ICLR 2019 • Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.
