Search Results for author: Zheqing Zhu

Found 13 papers, 2 papers with code

An Empirical Study of Deep Reinforcement Learning in Continuing Tasks

1 code implementation · 12 Jan 2025 · Yi Wan, Dmytro Korenkevych, Zheqing Zhu

In reinforcement learning (RL), continuing tasks refer to tasks where the agent-environment interaction is ongoing and cannot be broken down into episodes.

Deep Reinforcement Learning, MuJoCo

Epinet for Content Cold Start

no code implementations · 20 Nov 2024 · Hong Jun Jeon, Songbin Liu, Yuantong Li, Jie Lyu, Hunter Song, Ji Liu, Peng Wu, Zheqing Zhu

The exploding popularity of online content and its user base poses an ever more challenging matching problem for modern recommendation systems.

Recommendation Systems, Thompson Sampling

Uncertainty of Joint Neural Contextual Bandit

no code implementations · 4 Jun 2024 · Hongbo Guo, Zheqing Zhu

However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm.

Recommendation Systems

Offline Reinforcement Learning for Optimizing Production Bidding Policies

no code implementations · 13 Oct 2023 · Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu

We use a hybrid agent architecture that combines arbitrary base policies with deep neural networks, where only the optimized base policy parameters are eventually deployed, and the neural network part is discarded after training.

Reinforcement Learning

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

no code implementations · 11 Oct 2023 · Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy

Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.

Multi-Armed Bandits

Scalable Neural Contextual Bandit for Recommender Systems

no code implementations · 26 Jun 2023 · Zheqing Zhu, Benjamin Van Roy

In two distinct large-scale experiments with real-world tasks, Epistemic Neural Recommendation (ENR) significantly boosts click-through rates and user ratings, by at least 9% and 6% respectively, compared to state-of-the-art neural contextual bandit algorithms.

Recommendation Systems, Thompson Sampling

IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

no code implementations · 1 Jun 2023 · Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau

Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset.

D4RL, Model-based Reinforcement Learning

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

no code implementations · 23 May 2023 · Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu

Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior.

Recommendation Systems, Reinforcement Learning

Deep Exploration for Recommendation Systems

no code implementations · 26 Sep 2021 · Zheqing Zhu, Benjamin Van Roy

Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback.

Recommendation Systems, Thompson Sampling

Multi-Agent Safe Planning with Gaussian Processes

no code implementations · 10 Aug 2020 · Zheqing Zhu, Erdem Biyik, Dorsa Sadigh

Multi-agent safe systems have become an increasingly important area of study now that multiple AI-powered systems can readily operate together.

Gaussian Processes
