Search Results for author: Shengyi Huang

Found 12 papers, 11 papers with code

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation · 24 Mar 2024 · Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning

Zephyr: Direct Distillation of LM Alignment

1 code implementation · 25 Oct 2023 · Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.

2D Cyclist Detection Language Modelling

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

1 code implementation · 29 Sep 2023 · Shengyi Huang, Jiayi Weng, Rujikorn Charakorn, Min Lin, Zhongwen Xu, Santiago Ontañón

Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time.

reinforcement-learning

A2C is a special case of PPO

1 code implementation · 18 May 2022 · Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years.

reinforcement-learning Reinforcement Learning (RL)

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

2 code implementations · 16 Nov 2021 · Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga

CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning algorithms.

Benchmarking reinforcement-learning

Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning

4 code implementations · 21 May 2021 · Shengyi Huang, Santiago Ontañón, Chris Bamford, Lukasz Grela

In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that could defeat professional players in StarCraft II.

reinforcement-learning Reinforcement Learning (RL)

Griddly: A platform for AI research in games

no code implementations · 12 Nov 2020 · Chris Bamford, Shengyi Huang, Simon Lucas

In recent years, there have been immense breakthroughs in Game AI research, particularly with Reinforcement Learning (RL).

Reinforcement Learning (RL)

Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

2 code implementations · 5 Oct 2020 · Shengyi Huang, Santiago Ontañón

Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward.

Real-Time Strategy Games Reinforcement Learning (RL)

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

2 code implementations · 25 Jun 2020 · Shengyi Huang, Santiago Ontañón

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games.

Real-Time Strategy Games valid

Comparing Observation and Action Representations for Deep Reinforcement Learning in μRTS

3 code implementations · 26 Oct 2019 · Shengyi Huang, Santiago Ontañón

This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games.

reinforcement-learning Reinforcement Learning (RL)
