Search Results for author: Arian Hosseini

Found 7 papers, 5 papers with code

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning
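
PPO-based RLHF pipelines of this kind shape the reward by combining the reward-model score with a per-token KL penalty that keeps the policy close to the frozen SFT reference model. A minimal PyTorch sketch of that shaping (tensor names and the kl_coef value are illustrative, not taken from the paper's code):

    import torch

    def shaped_rewards(rm_score, logprobs, ref_logprobs, kl_coef=0.05):
        """Combine the reward-model score with a per-token KL penalty
        against the frozen reference (SFT) policy, as in PPO-based RLHF.

        rm_score:     scalar reward for each full response, shape (batch,)
        logprobs:     per-token log-probs under the current policy, (batch, T)
        ref_logprobs: per-token log-probs under the reference policy, (batch, T)
        """
        kl = logprobs - ref_logprobs      # per-token KL estimate
        rewards = -kl_coef * kl           # penalize drift from the reference
        rewards[:, -1] += rm_score        # RM score lands on the final token
        return rewards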

V-STaR: Training Verifiers for Self-Taught Reasoners

no code implementations • 9 Feb 2024 • Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability.

Code Generation • Math
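
The loop these self-improvement approaches share is easy to state: sample candidate solutions, keep those that check out, fine-tune on the survivors. A sketch under assumed helpers (generate, is_correct, and finetune are hypothetical placeholders supplied by the caller, not the paper's API):

    def star_iteration(model, problems, generate, is_correct, finetune, k=8):
        """One STaR-style self-improvement round: sample k solutions per
        problem, keep the correct ones, fine-tune on them. V-STaR
        additionally trains a verifier on both correct and incorrect
        samples, which is not shown here."""
        train_set = []
        for problem in problems:
            for solution in generate(model, problem, num_samples=k):
                if is_correct(problem, solution):  # e.g. final-answer check or unit tests
                    train_set.append((problem, solution))
        return finetune(model, train_set)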

On the Compositional Generalization Gap of In-Context Learning

no code implementations • 15 Nov 2022 • Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.

In-Context Learning • Semantic Parsing
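
The gap in question is simply the difference between in-distribution and out-of-distribution scores on the same task; a trivial sketch (illustrative, not the paper's evaluation code):

    def generalization_gap(model, id_set, ood_set, metric):
        """Difference between in-distribution and out-of-distribution
        performance; `metric` might be exact-match accuracy for a
        semantic parsing task."""
        return metric(model, id_set) - metric(model, ood_set)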

Understanding by Understanding Not: Modeling Negation in Language Models

1 code implementation • NAACL 2021 • Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus.

Language Modelling • Negation
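
An unlikelihood term of this flavor pushes probability mass away from tokens that the negated sentence rules out, i.e. a -log(1 - p) penalty. A minimal PyTorch sketch (tensor shapes and names are assumptions, not the paper's exact formulation):

    import torch
    import torch.nn.functional as F

    def unlikelihood_loss(logits, negative_targets, eps=1e-8):
        """Penalize probability assigned to continuations that a negated
        generic sentence rules out: loss = -log(1 - p(token)).

        logits:           (batch, T, vocab) language-model outputs
        negative_targets: (batch, T) token ids that should become unlikely
        """
        probs = F.softmax(logits, dim=-1)
        p_neg = probs.gather(-1, negative_targets.unsqueeze(-1)).squeeze(-1)
        return -torch.log(1.0 - p_neg + eps).mean()

In training, a term like this would be added with some weight to the usual likelihood loss on ordinary text.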

Ordered Memory

1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.

ListOps
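
The gating idea can be sketched as follows: take a softmax attention distribution over memory slots, and use its running cumulative sum as a soft mask that erases and rewrites every slot from the attended position upward, stack-fashion. A loose PyTorch sketch of that gate (not the paper's full memory cell):

    import torch

    def write_erase(memory, attn, new_value):
        """memory:    (batch, num_slots, d) memory slots
        attn:      (batch, num_slots) softmax attention over slots
        new_value: (batch, d) content to write

        The cumulative probability rises monotonically across slots,
        giving a soft "this slot and everything above it" mask that
        controls erasing and writing."""
        gate = torch.cumsum(attn, dim=-1).unsqueeze(-1)  # (batch, num_slots, 1)
        return (1.0 - gate) * memory + gate * new_value.unsqueeze(1)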

Learning to Understand Goal Specifications by Modelling Reward

1 code implementation • ICLR 2019 • Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.
