no code implementations • 26 Jan 2021 • Tobias Joppen, Johannes Fürnkranz
In this paper, we examine MCTS, a popular algorithm for solving MDPs, highlight a recurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem.
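The ordinal idea can be illustrated with a bandit-style selection rule that uses rewards only through comparisons rather than averages. The following is a hedged sketch of that principle, not the paper's actual algorithm; the function name and the Borda-style value estimate are assumptions made for illustration.

```python
import math

def ordinal_select(history, c=1.4):
    """Select an arm using only the ordinal ranking of observed rewards.
    `history` maps arm -> list of observed rewards; rewards are only
    ever *compared*, never summed or averaged (illustrative sketch)."""
    all_rewards = [r for rs in history.values() for r in rs]
    total = sum(len(rs) for rs in history.values())
    best, best_score = None, -float("inf")
    for arm, rs in history.items():
        if not rs:
            return arm  # play unvisited arms first
        # Borda-style value: probability that a random observation from
        # this arm beats a random observation from the pooled history.
        wins = sum(1 for r in rs for o in all_rewards if r > o)
        ties = sum(1 for r in rs for o in all_rewards if r == o)
        value = (wins + 0.5 * ties) / (len(rs) * len(all_rewards))
        # UCB-style exploration bonus on top of the ordinal value.
        score = value + c * math.sqrt(math.log(total) / len(rs))
        if score > best_score:
            best, best_score = arm, score
    return best
```

Because only the ordering of rewards matters here, the rule is invariant to any monotone rescaling of the reward signal, which is the property an ordinal treatment buys.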
no code implementations • 31 May 2019 • Tobias Joppen, Tilman Strübig, Johannes Fürnkranz
In this paper, we present a simple and cheap ordinal bucketing algorithm that approximately generates $q$-quantiles from an incremental data stream.
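A streaming quantile estimator of this general flavor can be sketched with a fixed set of buckets whose boundaries are seeded from the first items of the stream. This is a minimal illustrative sketch under that assumption, not the paper's bucketing algorithm; the class and method names are invented for the example.

```python
import bisect

class StreamingQuantiles:
    """Approximate q-quantiles over an incremental data stream using a
    fixed number of buckets (illustrative sketch)."""

    def __init__(self, num_buckets=100):
        self.num_buckets = num_buckets
        self.boundaries = []  # sorted bucket boundary values
        self.counts = []      # item counts per bucket

    def insert(self, x):
        if len(self.boundaries) < self.num_buckets:
            # Seed boundaries from the first items of the stream.
            i = bisect.bisect_left(self.boundaries, x)
            self.boundaries.insert(i, x)
            self.counts.insert(i, 1)
        else:
            # Count the item in the bucket whose boundary covers it;
            # items above all boundaries fall into the last bucket.
            i = min(bisect.bisect_left(self.boundaries, x),
                    self.num_buckets - 1)
            self.counts[i] += 1

    def quantile(self, q):
        """Return the bucket boundary where the cumulative count
        first reaches a fraction q of all items seen."""
        target = q * sum(self.counts)
        cum = 0
        for b, c in zip(self.boundaries, self.counts):
            cum += c
            if cum >= target:
                return b
        return self.boundaries[-1]
```

The estimate is off by at most one bucket's mass, and memory stays constant in the stream length, which is the point of bucketing over storing and sorting all observations.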
1 code implementation • 6 May 2019 • Alexander Zap, Tobias Joppen, Johannes Fürnkranz
Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties.
2 code implementations • 14 Jan 2019 • Tobias Joppen, Johannes Fürnkranz
In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions.
no code implementations • 17 Jul 2018 • Tobias Joppen, Christian Wirth, Johannes Fürnkranz
To deal with such cases, the experimenter has to supply an additional numeric feedback signal in the form of a heuristic, which intrinsically guides the agent.