no code implementations • 18 Nov 2022 • Hideitsu Hino, Shinto Eguchi
In this paper, the measure of disagreement is defined by the Bregman divergence, which includes the Kullback-Leibler divergence as a special case, and by the dual $\gamma$-power divergence.
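As a rough illustration of how the Kullback-Leibler divergence arises as an instance of the Bregman divergence, the sketch below evaluates $D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q), p - q \rangle$ with the negative entropy $\phi(p) = \sum_i p_i \log p_i$ as generator. The function names and the two example distributions are assumptions made for this illustration, not taken from the paper, and the dual $\gamma$-power divergence is not covered here.

```python
import math

def bregman_divergence(p, q, phi, grad_phi):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

def neg_entropy(x):
    # Generator phi(x) = sum_i x_i log x_i (assumes all x_i > 0).
    return sum(xi * math.log(xi) for xi in x)

def grad_neg_entropy(x):
    # Gradient of the negative entropy: log x_i + 1.
    return [math.log(xi) + 1.0 for xi in x]

def kl_divergence(p, q):
    # KL(p || q) for strictly positive discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical example distributions (each sums to 1).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

d_bregman = bregman_divergence(p, q, neg_entropy, grad_neg_entropy)
d_kl = kl_divergence(p, q)
print(d_bregman, d_kl)  # the two values agree up to floating-point error
```

The two values coincide only because both `p` and `q` sum to one; for unnormalized inputs this generator gives the generalized KL divergence, which carries an extra $\sum_i (q_i - p_i)$ term.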
no code implementations • 16 Nov 2022 • Shinto Eguchi
In a standard framework of reinforcement learning, a Q-function is defined as the conditional expectation of a reward given a state and an action for a single-stage situation.
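In the single-stage case, this definition reads $Q(s, a) = \mathbb{E}[R \mid S = s, A = a]$. A minimal sketch of one simple way to estimate it from data is to average the observed rewards for each state-action pair. The helper `estimate_q` and the toy samples are hypothetical and only illustrate the definition; they are not the method developed in the paper.

```python
from collections import defaultdict

def estimate_q(samples):
    """Empirical single-stage Q-function: mean reward per (state, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in samples:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical (state, action, reward) observations.
samples = [
    ("s0", "a0", 1.0),
    ("s0", "a0", 3.0),
    ("s0", "a1", 0.0),
]

q_hat = estimate_q(samples)
print(q_hat[("s0", "a0")])  # mean of 1.0 and 3.0
```

This tabular average is adequate only when states and actions are discrete and each pair is observed; continuous settings would need a regression model for the conditional expectation instead.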