Search Results for author: Shengyi Huang

Found 12 papers, 11 papers with code

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning

Paper
Code

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

1 code implementation • 5 Feb 2024 • Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, Chang Ye, Zichen Liu, Lucas N. Alegre, Alexander Nikulin, Xiao Hu, Tianlin Liu, Jongwook Choi, Brent Yi

As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone.

reinforcement-learning Reinforcement Learning (RL)

162

Paper
Code

Zephyr: Direct Distillation of LM Alignment

1 code implementation • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.

2D Cyclist Detection Language Modelling

3,700

Paper
Code

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

1 code implementation • 29 Sep 2023 • Shengyi Huang, Jiayi Weng, Rujikorn Charakorn, Min Lin, Zhongwen Xu, Santiago Ontañón

Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time.

reinforcement-learning

Paper
Code

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

3 code implementations • 21 Jun 2022 • Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, Shuicheng Yan

EnvPool is open-sourced at https://github.com/sail-sg/envpool.

reinforcement-learning Reinforcement Learning (RL)

4,377

Paper
Code

A2C is a special case of PPO

1 code implementation • 18 May 2022 • Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

2 code implementations • 16 Nov 2021 • Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga

CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning algorithms.

Benchmarking reinforcement-learning +2

4,377

Paper
Code

Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning

4 code implementations • 21 May 2021 • Shengyi Huang, Santiago Ontañón, Chris Bamford, Lukasz Grela

In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that could defeat professional players in StarCraft II.

reinforcement-learning Reinforcement Learning (RL) +2

209

Paper
Code

Griddly: A platform for AI research in games

no code implementations • 12 Nov 2020 • Chris Bamford, Shengyi Huang, Simon Lucas

In recent years, there have been immense breakthroughs in Game AI research, particularly with Reinforcement Learning (RL).

Reinforcement Learning (RL)

Paper

Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

2 code implementations • 5 Oct 2020 • Shengyi Huang, Santiago Ontañón

Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward.

Real-Time Strategy Games Reinforcement Learning (RL)

209

Paper
Code

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

2 code implementations • 25 Jun 2020 • Shengyi Huang, Santiago Ontañón

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games.

Real-Time Strategy Games valid

420

Paper
Code

Comparing Observation and Action Representations for Deep Reinforcement Learning in μRTS

3 code implementations • 26 Oct 2019 • Shengyi Huang, Santiago Ontañón

This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games.

reinforcement-learning Reinforcement Learning (RL)

209

Paper
Code
