no code implementations • 22 Nov 2023 • Zhengqi Wu, Renyuan Xu
In this paper, we consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP).
Reinforcement Learning (RL)