In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of a conditional diffusion model, yielding a loss that seeks optimal actions close to the behavior policy.
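
One way to picture such a combined objective is the minimal sketch below. It assumes the diffusion part is a standard noise-prediction mean squared error and that the action-value term enters with a weight `eta`. The function names, the weight `eta`, and the exact form of both terms are illustrative assumptions and are not taken from the source.

```python
def diffusion_loss(pred_noise, true_noise):
    """Mean squared error between predicted and true noise.

    Assumption: the conditional diffusion model is trained with a
    noise-prediction objective, which keeps sampled actions close to
    the behavior policy that generated the data.
    """
    n = len(true_noise)
    return sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / n


def combined_loss(pred_noise, true_noise, q_values, eta=1.0):
    """Diffusion loss minus a weighted mean action-value.

    `q_values` are the learned action-values of the actions sampled
    from the diffusion policy. Subtracting their mean means that
    minimizing the loss maximizes action-values. `eta` (hypothetical
    name) trades off staying near the data against seeking high value.
    """
    behavior_term = diffusion_loss(pred_noise, true_noise)
    value_term = sum(q_values) / len(q_values)
    return behavior_term - eta * value_term


# Toy numbers only, to show how the two terms interact.
low_value = combined_loss([0.0, 1.0], [0.0, 0.0], [2.0, 4.0], eta=0.5)
high_value = combined_loss([0.0, 1.0], [0.0, 0.0], [4.0, 6.0], eta=0.5)
```

With the same denoising error, the batch whose sampled actions have higher action-values receives the lower loss, which is the behavior the sentence above describes. In a real training loop both terms would be differentiable functions of the diffusion model's parameters.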
In this work, we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong.
In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective.
Listwise ranking losses have been widely studied in recommender systems.
Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions.