1 code implementation • 12 Jan 2025 • Yi Wan, Dmytro Korenkevych, Zheqing Zhu
In reinforcement learning (RL), continuing tasks refer to tasks where the agent-environment interaction is ongoing and cannot be broken down into episodes.
no code implementations • 20 Nov 2024 • Hong Jun Jeon, Songbin Liu, Yuantong Li, Jie Lyu, Hunter Song, Ji Liu, Peng Wu, Zheqing Zhu
The exploding popularity of online content and its user base poses an ever more challenging matching problem for modern recommendation systems.
no code implementations • 1 Oct 2024 • Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel R. Jiang, Yonathan Efroni
We study the problem of learning an approximate equilibrium in the offline multi-agent reinforcement learning (MARL) setting.
no code implementations • 4 Jun 2024 • Hongbo Guo, Zheqing Zhu
However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm.
1 code implementation • 6 Dec 2023 • Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
Reinforcement learning (RL) is a versatile framework for optimizing long-term goals.
no code implementations • 13 Oct 2023 • Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu
We use a hybrid agent architecture that combines arbitrary base policies with deep neural networks, where only the optimized base policy parameters are eventually deployed, and the neural network part is discarded after training.
no code implementations • 11 Oct 2023 • Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.
no code implementations • 26 Jun 2023 • Zheqing Zhu, Benjamin Van Roy
In two distinct large-scale experiments with real-world tasks, ENR significantly boosts click-through rates and user ratings by at least 9% and 6%, respectively, compared to state-of-the-art neural contextual bandit algorithms.
no code implementations • 1 Jun 2023 • Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau
Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset.
no code implementations • 23 May 2023 • Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu
Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior.
no code implementations • 5 Apr 2023 • Hongbo Guo, Ruben Naeff, Alex Nikulkov, Zheqing Zhu
Bandit learning has been an increasingly popular design choice for recommender systems.
no code implementations • 26 Sep 2021 • Zheqing Zhu, Benjamin Van Roy
Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback.
no code implementations • 10 Aug 2020 • Zheqing Zhu, Erdem Biyik, Dorsa Sadigh
Multi-agent safe systems have become an increasingly important area of study as we can now easily have multiple AI-powered systems operating together.