1 Mar 2023 • Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane
In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays.
13 Nov 2022 • Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane
In this paper, we introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called the Beta-spread property.