1 code implementation • 27 Oct 2024 • Jing Zhang, Linjiajie Fang, Kexin Shi, Wenjia Wang, Bing-Yi Jing
A learned policy may take actions beyond the behavior policy's knowledge; these are referred to as Out-of-Distribution (OOD) actions.
1 code implementation • 31 May 2024 • Linjiajie Fang, Ruoxue Liu, Jing Zhang, Wenjia Wang, Bing-Yi Jing
In this paper, we propose Diffusion Actor-Critic (DAC), which formulates Kullback-Leibler (KL)-constrained policy iteration as a diffusion noise regression problem, enabling target policies to be represented directly as diffusion models.
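A KL-constrained improvement step has the well-known closed-form target π*(a|s) ∝ μ(a|s) exp(Q(s,a)/η), where μ is the behavior policy and η the KL temperature. Because a noise-prediction network approximates ε ≈ −σ_t ∇ log p_t, and ∇_a log π* = ∇_a log μ + ∇_a Q / η, the target's noise is the behavior noise shifted by −(σ_t/η)∇_a Q. The sketch below illustrates that idea only, assuming ∇_a Q is evaluated at the noisy action; `target_noise` and `noise_regression_loss` are hypothetical names, and this is not necessarily the exact objective used by DAC.

```python
def target_noise(eps_behavior, grad_q, sigma_t, eta):
    """Hypothetical noise-regression target for pi*(a|s) ∝ mu(a|s) * exp(Q(s,a)/eta).

    Uses eps ≈ -sigma_t * score. Since score_pi* = score_mu + grad_Q / eta,
    the target noise is eps_mu - sigma_t * grad_Q / eta.
    Assumption: grad_q is the Q-gradient evaluated at the noisy action.
    """
    return [e - sigma_t * g / eta for e, g in zip(eps_behavior, grad_q)]


def noise_regression_loss(eps_pred, eps_target):
    """Mean squared error between predicted and target noise vectors."""
    return sum((p - t) ** 2 for p, t in zip(eps_pred, eps_target)) / len(eps_pred)
```

As a 1-D sanity check, take μ = N(0, 1) and Q(a) = −(a − 2)²/2 with η = 1: then π* = N(1, 1/2) with score −2a + 2, and `target_noise([sigma * a], [2 - a], sigma, 1.0)` returns −σ times that score.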
no code implementations • 28 Mar 2024 • Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, Bing-Yi Jing
In implicit collaborative filtering, hard negative mining techniques have been developed to accelerate and improve the training of recommendation models.
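For readers unfamiliar with hard negative mining, one common baseline strategy is dynamic negative sampling (DNS): draw a few candidate items the user has not interacted with, then keep the one the current model scores highest. The sketch below shows that generic strategy with dot-product scores; it is not the method proposed in this paper, and `dns_hard_negative` and its parameters are hypothetical.

```python
import random


def dns_hard_negative(user_emb, item_embs, positives, num_candidates, rng):
    """Dynamic negative sampling (generic sketch, hypothetical helper).

    Uniformly samples up to num_candidates items outside the user's
    positives, then returns the candidate with the highest dot-product
    score, i.e. the "hardest" negative under the current model.
    """
    pool = [i for i in range(len(item_embs)) if i not in positives]
    candidates = rng.sample(pool, min(num_candidates, len(pool)))

    def score(i):
        return sum(u * v for u, v in zip(user_emb, item_embs[i]))

    return max(candidates, key=score)
```

The returned item would typically serve as the negative in a pairwise loss such as BPR; raising `num_candidates` makes the negatives harder at the cost of more scoring work.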