no code implementations • 5 Aug 2023 • Hao Wang, Jianxun Lian, Mingqi Wu, Haoxuan Li, Jiajun Fan, Wanyue Xu, Chaozhuo Li, Xing Xie
Sequential user modeling, a critical task in personalized recommender systems, focuses on predicting the next item a user would prefer, requiring a deep understanding of user behavior sequences.
no code implementations • 9 May 2023 • Jiajun Fan, Yuzheng Zhuang, Yuecheng Liu, Jianye Hao, Bin Wang, Jiangcheng Zhu, Hao Wang, Shu-Tao Xia
The exploration problem is one of the main challenges in deep reinforcement learning (RL).
Ranked #1 on Atari Games on Atari-57
no code implementations • 20 Oct 2022 • Hao Wang, Zhichao Chen, Jiajun Fan, Yuxin Huang, Weiming Liu, Xinggao Liu
As a basic research problem for building effective recommender systems, post-click conversion rate (CVR) estimation has long been plagued by sample selection bias and data sparsity issues.
no code implementations • 7 Jun 2022 • Jiajun Fan, Changnan Xiao
Then, we cast these two problems as a training data distribution optimization problem, namely obtaining the desired training data within limited interactions, and address them concurrently via (i) explicit modeling and control of the capacity and diversity of the behavior policy and (ii) finer-grained, adaptive control of the selective/sampling distribution of the behavior policy using monotonic data distribution optimization.
Ranked #1 on Atari Games on atari game
no code implementations • 8 Dec 2021 • Jiajun Fan
From Deep Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance in the Arcade Learning Environment (ALE).
no code implementations • 11 Jun 2021 • Jiajun Fan, Changnan Xiao, Yue Huang
Deep Q-Network (DQN) first opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and it noted that the distribution of the acquired data changes during the training process.
Ranked #1 on Atari Games on Atari 2600 Freeway
no code implementations • 1 Jun 2021 • Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng
We find that value-based reinforcement learning methods with an ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem.
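For readers unfamiliar with the mechanism, the sketch below shows the standard ε-greedy rule over a vector of Q-values. It is a textbook illustration, not the paper's code: the function names are hypothetical, ties are broken toward the lowest action index by assumption, and relating its closed-form action distribution to the paper's "Closed-form Diversity" is only one plausible reading. With n actions, every action receives probability at least ε/n, and the greedy action receives ε/n + (1 − ε).

```python
import random
from typing import List, Sequence


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Closed-form action distribution of an epsilon-greedy policy (hypothetical helper)."""
    n = len(q_values)
    # Greedy action; ties resolve to the lowest index (an assumption of this sketch).
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n          # uniform exploration mass for every action
    probs[greedy] += 1.0 - epsilon     # remaining mass goes to the greedy action
    return probs


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """Sample one action: explore uniformly with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0, 0.0]
    print(epsilon_greedy_probs(q, 0.5))  # [0.125, 0.625, 0.125, 0.125]
    rng = random.Random(0)
    print([epsilon_greedy_action(q, 0.5, rng) for _ in range(10)])
```

Because the distribution depends only on ε and the argmax of Q, the amount of exploration can be read off directly without sampling, which is what makes the mechanism easy to analyze.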
no code implementations • 9 May 2021 • Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, Haiyan Yin
We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI).
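Generalized Policy Iteration (GPI) alternates between evaluating the current policy and improving it greedily with respect to that evaluation. The sketch below shows classic tabular policy iteration, one instance of GPI, on a hypothetical four-state chain MDP. Note that it assumes a known transition model, unlike the model-free setting the paper studies, and none of the names or the MDP come from the paper.

```python
from typing import List, Tuple

# Hypothetical deterministic MDP: transitions[s][a] = (next_state, reward).
Transitions = List[List[Tuple[int, float]]]


def policy_evaluation(transitions: Transitions, policy: List[int],
                      gamma: float, tol: float = 1e-10) -> List[float]:
    """Iteratively compute V^pi for a fixed deterministic policy (in-place sweeps)."""
    v = [0.0] * len(transitions)
    while True:
        delta = 0.0
        for s, acts in enumerate(transitions):
            s2, r = acts[policy[s]]
            new_v = r + gamma * v[s2]
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def greedy_improvement(transitions: Transitions, v: List[float],
                       gamma: float) -> List[int]:
    """Pick, in each state, the action maximizing one-step lookahead r + gamma * V(s')."""
    return [max(range(len(acts)), key=lambda a: acts[a][1] + gamma * v[acts[a][0]])
            for acts in transitions]


def generalized_policy_iteration(transitions: Transitions, gamma: float = 0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(transitions)
    while True:
        v = policy_evaluation(transitions, policy, gamma)
        new_policy = greedy_improvement(transitions, v, gamma)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Chain 0-1-2-3: action 0 moves left, action 1 moves right; reaching state 3
# pays reward 1, and state 3 is absorbing with zero reward.
CHAIN: Transitions = [
    [(0, 0.0), (1, 0.0)],
    [(0, 0.0), (2, 0.0)],
    [(1, 0.0), (3, 1.0)],
    [(3, 0.0), (3, 0.0)],
]

if __name__ == "__main__":
    pi, v = generalized_policy_iteration(CHAIN)
    print(pi, [round(x, 4) for x in v])  # [1, 1, 1, 0] [0.81, 0.9, 1.0, 0.0]
```

In a model-free method the exact evaluation step would be replaced by an estimate learned from sampled transitions, but the same evaluate-then-improve loop structure remains.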
no code implementations • 13 Nov 2020 • Jiajun Fan, He Ba, Xian Guo, Jianye Hao
Extensive experiments demonstrate that Critic PI2 achieves a new state of the art in a range of challenging continuous domains.