We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space.
In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of the conditional diffusion model, yielding a loss that seeks optimal actions close to the behavior policy.
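As a rough illustration of the combined objective described above, the sketch below adds an action-value term to a denoising loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `diffusion_q_loss`, the weight `eta`, the mean-squared-error form of the denoising term, and the flat-list inputs are all hypothetical choices made for illustration.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def diffusion_q_loss(
    noise_pred: Sequence[float],
    noise_true: Sequence[float],
    q_values: Sequence[float],
    eta: float = 1.0,
) -> float:
    """Hypothetical combined loss: denoising term minus weighted action-values.

    noise_pred / noise_true: predicted and sampled noise of the conditional
        diffusion model (assumed MSE denoising objective), flattened.
    q_values: action-values of actions generated by the policy, as scored
        by the learned action-value function.
    eta: assumed trade-off weight between the two terms.
    """
    # Denoising term: keeps generated actions close to the behavior policy.
    behavior_term = _mean([(p - t) ** 2 for p, t in zip(noise_pred, noise_true)])
    # Action-value term: minimizing the loss maximizes the mean action-value.
    value_term = _mean(q_values)
    return behavior_term - eta * value_term


if __name__ == "__main__":
    loss = diffusion_q_loss([1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], [2.0, 4.0], eta=0.5)
    print(loss)
```

With `eta = 0` the sketch reduces to the denoising term alone, so the policy only imitates the behavior data; larger values of `eta` put more weight on actions with high action-values.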
In this work, we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong.
In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective.
Listwise ranking losses have been widely studied in recommender systems.
Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions.