no code implementations • ICML 2020 • Umer Siddique, Paul Weng, Matthieu Zimmer
During this analysis, we derive a new result in the standard RL setting that is of independent interest: a novel bound on the approximation error, with respect to the optimal average reward, of a policy that is optimal for the discounted reward.
no code implementations • 16 Jun 2023 • Umer Siddique, Abhinav Sinha, Yongcan Cao
Toward this objective, we design a new fairness-induced preference-based reinforcement learning (FPbRL) method.
3 code implementations • 17 Dec 2020 • Matthieu Zimmer, Claire Glanois, Umer Siddique, Paul Weng
As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account the two aspects of fairness.
1 code implementation • 18 Aug 2020 • Umer Siddique, Paul Weng, Matthieu Zimmer
Since learning with discounted rewards is generally easier, this discussion further justifies finding a fair policy for the average reward by learning a fair policy for the discounted reward.