Search Results for author: Chuheng Zhang

Found 21 papers, 7 papers with code

Policy Filtration in RLHF to Fine-Tune LLM for Code Generation

1 code implementation • 11 Sep 2024 • Wei Shen, Chuheng Zhang

Reinforcement learning from human feedback (RLHF) is one of the key techniques that helps large language models (LLMs) to follow instructions and provide helpful and harmless responses.

Code Generation HumanEval

Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL

1 code implementation • 20 Jul 2024 • Yunseon Choi, Sangmin Bae, Seonghyun Ban, Minchan Jeong, Chuheng Zhang, Lei Song, Li Zhao, Jiang Bian, Kee-Eung Kim

With the advent of foundation models, prompt tuning has positioned itself as an important technique for directing model behaviors and eliciting desired responses.

Few-Shot Text Classification Q-Learning +5

Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

no code implementations • 17 Apr 2024 • Guangran Cheng, Chuheng Zhang, Wenzhe Cai, Li Zhao, Changyin Sun, Jiang Bian

While large language models (LLMs) are successful in completing various language processing tasks, they easily fail to interact with the physical world by generating control sequences properly.

ARO: Large Language Model Supervised Robotics Text2Skill Autonomous Learning

no code implementations • 23 Mar 2024 • YiWen Chen, Yuyao Ye, Ziyi Chen, Chuheng Zhang, Marcelo H. Ang

Robotics learning highly relies on human expertise and efforts, such as demonstrations, design of reward functions in reinforcement learning, performance evaluation using human feedback, etc.

Language Modelling Large Language Model

Pre-Trained Large Language Models for Industrial Control

no code implementations • 6 Aug 2023 • Lei Song, Chuheng Zhang, Li Zhao, Jiang Bian

2) How well can GPT-4 generalize to different scenarios for HVAC control?

A Versatile Multi-Agent Reinforcement Learning Benchmark for Inventory Management

1 code implementation • 13 Jun 2023 • Xianliang Yang, Zhihao Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Jiang Bian

Multi-agent reinforcement learning (MARL) models multiple agents that interact and learn within a shared environment.

Autonomous Driving Management +3

Towards Generalizable Reinforcement Learning for Trade Execution

no code implementations • 12 May 2023 • Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao

To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms.

Offline RL reinforcement-learning +2

RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

no code implementations • 3 Mar 2023 • Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang

Inspired by the recent success of sequence modeling in RL and the use of masked language model for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains the encoder combined with transformer blocks to predict the masked states or actions in a trajectory.

Data Augmentation Language Modelling +4

Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management

no code implementations • 15 Dec 2022 • Yuandong Ding, Mingxiao Feng, Guozi Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Houqiang Li, Yan Jin, Jiang Bian

In this paper, we consider the inventory management (IM) problem where we need to make replenishment decisions for a large number of stock keeping units (SKUs) to balance their supply and demand.

Management Multi-agent Reinforcement Learning +3

A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS

no code implementations • 5 Dec 2022 • Wei Shen, Xiaonan He, Chuheng Zhang, Xuyun Zhang, Jian Xie

Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system.

Spoken Dialogue Systems

TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets

1 code implementation • 5 Dec 2022 • Yuanying Cai, Chuheng Zhang, Li Zhao, Wei Shen, Xuyun Zhang, Lei Song, Jiang Bian, Tao Qin, TieYan Liu

There are two challenges for this setting: 1) The optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal changes on different states due to the variation of the action coverage induced by different behavior policies.

D4RL Offline RL +2

Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks

no code implementations • 2 Apr 2022 • Ze Wang, Guogang Liao, Xiaowen Shi, Xiaoxu Wu, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

With the recent prevalence of reinforcement learning (RL), there has been tremendous interest in utilizing RL for ads allocation in recommendation platforms (e.g., e-commerce and news feed sites).

Contrastive Learning Reinforcement Learning (RL)

Hybrid Transfer in Deep Reinforcement Learning for Ads Allocation

no code implementations • 2 Apr 2022 • Ze Wang, Guogang Liao, Xiaowen Shi, Xiaoxu Wu, Chuheng Zhang, Bingqi Zhu, Yongkang Wang, Xingxing Wang, Dong Wang

Ads allocation, which involves allocating ads and organic items to limited slots in feed with the purpose of maximizing platform revenue, has become a research hotspot.

Deep Reinforcement Learning reinforcement-learning +1

Deep Page-Level Interest Network in Reinforcement Learning for Ads Allocation

no code implementations • 1 Apr 2022 • Guogang Liao, Xiaowen Shi, Ze Wang, Xiaoxu Wu, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

A mixed list of ads and organic items is usually displayed in feed and how to allocate the limited slots to maximize the overall revenue is a key problem.

Click-Through Rate Prediction reinforcement-learning +1

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

1 code implementation • 9 Sep 2021 • Guogang Liao, Ze Wang, Xiaoxu Wu, Xiaowen Shi, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang

Our model results in higher revenue and better user experience than state-of-the-art baselines in offline experiments.

Inductive Matrix Completion Using Graph Autoencoder

2 code implementations • 25 Aug 2021 • Wei Shen, Chuheng Zhang, Yun Tian, Liang Zeng, Xiaonan He, Wanchun Dou, Xiaolong Xu

However, without node content (i.e., side information) for training, the user (or item) specific representation cannot be learned in the inductive setting, that is, a model trained on one group of users (or items) cannot adapt to new users (or items).

Graph Neural Network Matrix Completion

Return-Based Contrastive Representation Learning for Reinforcement Learning

no code implementations • ICLR 2021 • Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Jian Li, Nenghai Yu, Tie-Yan Liu

Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL).

Atari Games Deep Reinforcement Learning +3

DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

1 code implementation • 3 Oct 2020 • Chuheng Zhang, Yuanqi Li, Xi Chen, Yifei Jin, Pingzhong Tang, Jian Li

Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction, due to their superior capacity to extract complex non-linear patterns.

BIG-bench Machine Learning feature selection

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

no code implementations • 11 Jun 2020 • Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment.

Q-Learning Reinforcement Learning (RL)

Policy Search by Target Distribution Learning for Continuous Control

no code implementations • 27 May 2019 • Chuheng Zhang, Yuanqi Li, Jian Li

We observe that several existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic (even in some very simple environments), leading to an unstable training process.

Continuous Control +3
